From d577ec75b4555c8def9412bde9c53f8182ce3365 Mon Sep 17 00:00:00 2001
From: Raul Akhmetshin
Date: Wed, 27 Sep 2023 14:13:19 +0300
Subject: [PATCH 1/4] NEWS: Added 1.15.0 section.
---
 NEWS | 66 +++++++++++++++++------------------------------------------
 1 file changed, 18 insertions(+), 48 deletions(-)

diff --git a/NEWS b/NEWS
index 13006f2ee90..a26b27d6410 100644
--- a/NEWS
+++ b/NEWS
@@ -11,53 +11,7 @@
 ### Features:
 ### Bugfixes:
 
-## 1.15.0-rc6 (September 20, 2023)
-### Bugfixes:
-#### UCP
-* Fixed assertion when sending from noncontig GPU buffer to managed buffer.
-
-## 1.15.0-rc5 (September 12, 2023)
-### Bugfixes:
-#### UCP
-* Fixed the data race on endpoint configurations.
-
-## 1.15.0-rc4 (August 30, 2023)
-### Bugfixes:
-#### RDMA CORE (IB, ROCE, etc.)
-* Fixed dma-buf based memory region registration
-* Fixed memory handle data corruption when PCIe relaxed ordering is enabled
-#### UCS
-* Fixed lane selection, adding bandwidth estimation for Sapphire Rapids family
-
-## 1.15.0-rc3 (August 8, 2023)
-### Bugfixes:
-#### UCP
-* Fixed endpoint reconfiguration issues because of assymetrical selection
-#### UCT
-* Check dmabuf kernel support in ROCm memory domain
-#### UCM
-* Fixed conditional jump patching
-#### Tools
- * Fixed memory access flags in perftest
-
-## 1.15.0-rc2 (July 27, 2023)
-### Features:
-#### RDMA CORE (IB, ROCE, etc.)
-* Implemented is_reachable_v2 for IB interfaces
-#### Build
-* Enabled build with binutils 2.40
-* Added versioned dependency to switch between packages with the same names
-
-### Bugfixes:
-#### UCP
-* Fixed endpoint reconfiguration error due to wrong locality detection
-#### RDMA CORE (IB, ROCE, etc.)
-* Fixed performance degradation when indirect atomic key is not supported by the hardware
-* Fixed remote access error to strict-order key because of wrong offset
-#### GPU (CUDA, ROCM)
-* Fixed CUDA IPC performance degradation after libnuma removal
-
-## 1.15.0-rc1 (May 10, 2023)
+## 1.15.0 (September 27, 2023)
 ### Features:
 #### UCP
 * Added 2-stage pipeline protocol in the new protocol infrastructure
@@ -75,6 +29,7 @@
 * Added base implementation of is_reachable_v2 API using intra/inter flag
 * Introduced MD capability for non-blocking registration memory types
 #### RDMA CORE (IB, ROCE, etc.)
+* Implemented is_reachable_v2 for IB interfaces
 * Added option to control CQE zipping per CQ RX/TX direction
 * Added option to specify how DCI selects port under RoCE LAG
 * Added hw_dcs to the list of policies to select DCI by an endpoint
@@ -104,12 +59,17 @@
 * Added user-side memcpy option for AM benchmarks in ucx_perftest
 * Added wireshark LUA dissectors for some UCX protocols
 #### Build
+* Enabled build with binutils 2.40
+* Added versioned dependency to switch between packages with the same names
 * Added a separate xpmem deb subpackage
 * Added aarch64 support to the binary distribution pipeline
 * Removed dependency on libnuma
-
 ### Bugfixes:
 #### UCP
+* Fixed assertion when sending from noncontig GPU buffer to managed buffer
+* Fixed the data race on endpoint configurations
+* Fixed endpoint reconfiguration issues because of assymetrical selection
+* Fixed endpoint reconfiguration error due to wrong locality detection
 * Fixed crash during connection manager cleanup
 * Fixed rkey index calculation for rendezvous protocol
 * Fixed rcache dump function
@@ -123,20 +83,29 @@
 * Fixed CPU/device atomics selection in the new protocol infrastructure
 * Multiple fixes in the new protocol infrastructure information output
 #### UCT
+* Check dmabuf kernel support in ROCm memory domain
 * Fixed exported memh packing
 * Fixed an error in checking return status of multi-threaded memory registration function
 #### RDMA CORE (IB, ROCE, etc.)
+* Fixed dma-buf based memory region registration
+* Fixed memory handle data corruption when PCIe relaxed ordering is enabled
+* Fixed performance degradation when indirect atomic key is not supported by the hardware
+* Fixed remote access error to strict-order key because of wrong offset
 * Added check for UAR support to memory domain opening
 * Fixed updating port counters for devx qp
 * Fixed ibv_create_cq error message on node without Infiniband
 * Fixed performance degradation due to using 2 paths on NDR400 by default
 * Removed unnecessary async lock which otherwise would block UD progress
+#### GPU (CUDA, ROCM)
+* Fixed CUDA IPC performance degradation after libnuma removal
 #### UCS
+* Fixed lane selection, adding bandwidth estimation for Sapphire Rapids family
 * Fixed displaying wrong environment variable suggestions
 * Fixed VFS warning output
 * Fixed SEGV in ucs_debug_backtrace_next(), upon previous SEGV handling, due to ENOMEM situation
 * Fixed memory corruption when using UCX_MPOOL_FIFO=y
 #### UCM
+* Fixed conditional jump patching
 * Fixed mremap() override
 #### GPU (CUDA, ROCM)
 * Fixed usage of dmabuf when the buffer is not page-aligned
@@ -148,6 +117,7 @@
 #### Tests
 * Fixed wrong usage of ep_close in examples
 #### Tools
+* Fixed memory access flags in perftest
 * Removed support for librte from perf
 * Fixed worker flush deadlock when using multiple workers in ucx_perftest
 #### Build

From 89399fd9a950e3a3b4864f4c22016b117f25b234 Mon Sep 17 00:00:00 2001
From: Raul Akhmetshin
Date: Wed, 27 Sep 2023 18:24:30 +0300
Subject: [PATCH 2/4] NEWS: Addressed review comments.
---
 NEWS | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/NEWS b/NEWS
index a26b27d6410..f0a01a98306 100644
--- a/NEWS
+++ b/NEWS
@@ -29,7 +29,7 @@
 * Added base implementation of is_reachable_v2 API using intra/inter flag
 * Introduced MD capability for non-blocking registration memory types
 #### RDMA CORE (IB, ROCE, etc.)
-* Implemented is_reachable_v2 for IB interfaces
+* Added format is_reachable_v2 for IB interfaces
 * Added option to control CQE zipping per CQ RX/TX direction
 * Added option to specify how DCI selects port under RoCE LAG
 * Added hw_dcs to the list of policies to select DCI by an endpoint
@@ -59,16 +59,16 @@
 * Added user-side memcpy option for AM benchmarks in ucx_perftest
 * Added wireshark LUA dissectors for some UCX protocols
 #### Build
-* Enabled build with binutils 2.40
+* Added support for binutils 2.40
 * Added versioned dependency to switch between packages with the same names
 * Added a separate xpmem deb subpackage
 * Added aarch64 support to the binary distribution pipeline
 * Removed dependency on libnuma
 ### Bugfixes:
 #### UCP
-* Fixed assertion when sending from noncontig GPU buffer to managed buffer
-* Fixed the data race on endpoint configurations
-* Fixed endpoint reconfiguration issues because of assymetrical selection
+* Fixed assertion when sending from non-contiguous GPU buffer to managed buffer
+* Fixed the race condition on endpoint configurations
+* Fixed endpoint reconfiguration issues due to assymetrical selection
 * Fixed endpoint reconfiguration error due to wrong locality detection
 * Fixed crash during connection manager cleanup
 * Fixed rkey index calculation for rendezvous protocol
@@ -83,23 +83,23 @@
 * Fixed CPU/device atomics selection in the new protocol infrastructure
 * Multiple fixes in the new protocol infrastructure information output
 #### UCT
-* Check dmabuf kernel support in ROCm memory domain
+* Added check for dmabuf kernel support in ROCm memory domain
 * Fixed exported memh packing
 * Fixed an error in checking return status of multi-threaded memory registration function
 #### RDMA CORE (IB, ROCE, etc.)
 * Fixed dma-buf based memory region registration
 * Fixed memory handle data corruption when PCIe relaxed ordering is enabled
 * Fixed performance degradation when indirect atomic key is not supported by the hardware
-* Fixed remote access error to strict-order key because of wrong offset
+* Fixed remote access error to strict-order keys because of wrong offset
 * Added check for UAR support to memory domain opening
 * Fixed updating port counters for devx qp
 * Fixed ibv_create_cq error message on node without Infiniband
 * Fixed performance degradation due to using 2 paths on NDR400 by default
 * Removed unnecessary async lock which otherwise would block UD progress
 #### GPU (CUDA, ROCM)
-* Fixed CUDA IPC performance degradation after libnuma removal
+* Fixed CUDA IPC performance degradation due to libnuma removal
 #### UCS
-* Fixed lane selection, adding bandwidth estimation for Sapphire Rapids family
+* Fixed lane selection and added bandwidth estimation for Sapphire Rapids family
 * Fixed displaying wrong environment variable suggestions
 * Fixed VFS warning output
 * Fixed SEGV in ucs_debug_backtrace_next(), upon previous SEGV handling, due to ENOMEM situation

From fbfad7a5764b61a9dfc36ff328d78477c092d0df Mon Sep 17 00:00:00 2001
From: Raul Akhmetshin
Date: Thu, 28 Sep 2023 12:22:31 +0300
Subject: [PATCH 3/4] NEWS: Addressed review comments.
---
 NEWS | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/NEWS b/NEWS
index f0a01a98306..6e1b16669dd 100644
--- a/NEWS
+++ b/NEWS
@@ -29,7 +29,7 @@
 * Added base implementation of is_reachable_v2 API using intra/inter flag
 * Introduced MD capability for non-blocking registration memory types
 #### RDMA CORE (IB, ROCE, etc.)
-* Added format is_reachable_v2 for IB interfaces
+* Added implementation of is_reachable_v2 routine to IB interface
 * Added option to control CQE zipping per CQ RX/TX direction
 * Added option to specify how DCI selects port under RoCE LAG
 * Added hw_dcs to the list of policies to select DCI by an endpoint
@@ -68,7 +68,7 @@
 #### UCP
 * Fixed assertion when sending from non-contiguous GPU buffer to managed buffer
 * Fixed the race condition on endpoint configurations
-* Fixed endpoint reconfiguration issues due to assymetrical selection
+* Fixed endpoint reconfiguration issues due to asymmetrical selection
 * Fixed endpoint reconfiguration error due to wrong locality detection
 * Fixed crash during connection manager cleanup
 * Fixed rkey index calculation for rendezvous protocol

From 583791e1e23efa9a0ce74140d7d966781124f89c Mon Sep 17 00:00:00 2001
From: Raul Akhmetshin
Date: Thu, 28 Sep 2023 12:22:52 +0300
Subject: [PATCH 4/4] NEWS: Updated date.
---
 NEWS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index 6e1b16669dd..86a3025f6fc 100644
--- a/NEWS
+++ b/NEWS
@@ -11,7 +11,7 @@
 ### Features:
 ### Bugfixes:
 
-## 1.15.0 (September 27, 2023)
+## 1.15.0 (September 28, 2023)
 ### Features:
 #### UCP
 * Added 2-stage pipeline protocol in the new protocol infrastructure