diff --git a/NEWS b/NEWS
index 13006f2ee90..86a3025f6fc 100644
--- a/NEWS
+++ b/NEWS
@@ -11,53 +11,7 @@
 ### Features:
 ### Bugfixes:
 
-## 1.15.0-rc6 (September 20, 2023)
-### Bugfixes:
-#### UCP
-* Fixed assertion when sending from noncontig GPU buffer to managed buffer.
-
-## 1.15.0-rc5 (September 12, 2023)
-### Bugfixes:
-#### UCP
-* Fixed the data race on endpoint configurations.
-
-## 1.15.0-rc4 (August 30, 2023)
-### Bugfixes:
-#### RDMA CORE (IB, ROCE, etc.)
-* Fixed dma-buf based memory region registration
-* Fixed memory handle data corruption when PCIe relaxed ordering is enabled
-#### UCS
-* Fixed lane selection, adding bandwidth estimation for Sapphire Rapids family
-
-## 1.15.0-rc3 (August 8, 2023)
-### Bugfixes:
-#### UCP
-* Fixed endpoint reconfiguration issues because of assymetrical selection
-#### UCT
-* Check dmabuf kernel support in ROCm memory domain
-#### UCM
-* Fixed conditional jump patching
-#### Tools
- * Fixed memory access flags in perftest
-
-## 1.15.0-rc2 (July 27, 2023)
-### Features:
-#### RDMA CORE (IB, ROCE, etc.)
-* Implemented is_reachable_v2 for IB interfaces
-#### Build
-* Enabled build with binutils 2.40
-* Added versioned dependency to switch between packages with the same names
-
-### Bugfixes:
-#### UCP
-* Fixed endpoint reconfiguration error due to wrong locality detection
-#### RDMA CORE (IB, ROCE, etc.)
-* Fixed performance degradation when indirect atomic key is not supported by the hardware
-* Fixed remote access error to strict-order key because of wrong offset
-#### GPU (CUDA, ROCM)
-* Fixed CUDA IPC performance degradation after libnuma removal
-
-## 1.15.0-rc1 (May 10, 2023)
+## 1.15.0 (September 28, 2023)
 ### Features:
 #### UCP
 * Added 2-stage pipeline protocol in the new protocol infrastructure
@@ -75,6 +29,7 @@
 * Added base implementation of is_reachable_v2 API using intra/inter flag
 * Introduced MD capability for non-blocking registration memory types
 #### RDMA CORE (IB, ROCE, etc.)
+* Added implementation of is_reachable_v2 routine to IB interface
 * Added option to control CQE zipping per CQ RX/TX direction
 * Added option to specify how DCI selects port under RoCE LAG
 * Added hw_dcs to the list of policies to select DCI by an endpoint
@@ -104,12 +59,17 @@
 * Added user-side memcpy option for AM benchmarks in ucx_perftest
 * Added wireshark LUA dissectors for some UCX protocols
 #### Build
+* Added support for binutils 2.40
+* Added versioned dependency to switch between packages with the same names
 * Added a separate xpmem deb subpackage
 * Added aarch64 support to the binary distribution pipeline
 * Removed dependency on libnuma
-
 ### Bugfixes:
 #### UCP
+* Fixed assertion when sending from non-contiguous GPU buffer to managed buffer
+* Fixed the race condition on endpoint configurations
+* Fixed endpoint reconfiguration issues due to asymmetrical selection
+* Fixed endpoint reconfiguration error due to wrong locality detection
 * Fixed crash during connection manager cleanup
 * Fixed rkey index calculation for rendezvous protocol
 * Fixed rcache dump function
@@ -123,20 +83,29 @@
 * Fixed CPU/device atomics selection in the new protocol infrastructure
 * Multiple fixes in the new protocol infrastructure information output
 #### UCT
+* Added check for dmabuf kernel support in ROCm memory domain
 * Fixed exported memh packing
 * Fixed an error in checking return status of multi-threaded memory registration function
 #### RDMA CORE (IB, ROCE, etc.)
+* Fixed dma-buf based memory region registration
+* Fixed memory handle data corruption when PCIe relaxed ordering is enabled
+* Fixed performance degradation when indirect atomic key is not supported by the hardware
+* Fixed remote access error to strict-order keys because of wrong offset
 * Added check for UAR support to memory domain opening
 * Fixed updating port counters for devx qp
 * Fixed ibv_create_cq error message on node without Infiniband
 * Fixed performance degradation due to using 2 paths on NDR400 by default
 * Removed unnecessary async lock which otherwise would block UD progress
+#### GPU (CUDA, ROCM)
+* Fixed CUDA IPC performance degradation due to libnuma removal
 #### UCS
+* Fixed lane selection and added bandwidth estimation for Sapphire Rapids family
 * Fixed displaying wrong environment variable suggestions
 * Fixed VFS warning output
 * Fixed SEGV in ucs_debug_backtrace_next(), upon previous SEGV handling, due to ENOMEM situation
 * Fixed memory corruption when using UCX_MPOOL_FIFO=y
 #### UCM
+* Fixed conditional jump patching
 * Fixed mremap() override
 #### GPU (CUDA, ROCM)
 * Fixed usage of dmabuf when the buffer is not page-aligned
@@ -148,6 +117,7 @@
 #### Tests
 * Fixed wrong usage of ep_close in examples
 #### Tools
+* Fixed memory access flags in perftest
 * Removed support for librte from perf
 * Fixed worker flush deadlock when using multiple workers in ucx_perftest
 #### Build