RELEASE: Updated NEWS for 1.17 #9957

Merged 1 commit on Jun 18, 2024
92 changes: 92 additions & 0 deletions NEWS
@@ -11,6 +11,98 @@
### Features:
### Bugfixes:

## 1.17.0 (June 13, 2024)
### Features:
#### UCP
* Improved the accuracy of rendezvous protocol performance estimation
* Enabled short protocol for non-host memory types on empty messages
* Improved the accuracy of performance estimation for empty messages by removing non-relevant overheads
* Added RMA_ZCOPY_MAX_SEG_SIZE configuration parameter to allow modifying segment size for RMA-ZCOPY protocols
* Added support for separate intra/inter-node rendezvous thresholds
* Added support for minimal fragment size in rendezvous protocol
* Added support for resetting request during send operation
* Added UCX_PROTO_OVERHEAD configuration variable to allow setting protocol overheads
* Improved performance for combined Active Message/RMA scenarios by separating them to different lanes
* Added support for device staging buffers in pipeline protocols
* Enabled on-demand paging by default on NVIDIA Grace platforms
#### RDMA CORE (IB, ROCE, etc.)
* Added UCX_REVERSE_SL environment variable to configure the reverse SL for DC transport (defaults to the value of UCX_IB_SL)
* Added support for GID auto-detection in Floating LID based routing
* Added support for multithreaded KSM registration of unaligned buffers
* Added IB_SEND_OVERHEAD and MM_[SEND|RECV]_OVERHEAD configuration variables
#### GPU (CUDA, ROCM)
* Added support for oneAPI Level-Zero library for Intel GPUs
#### UCS
* Added support for rcache dynamic region alignment
* Added dynamic bitmap data structure
* Added support for advanced key-value parsing for UCX configuration
* Added piecewise linear function data structure
* Added support for allocating dynamic arrays on stack
#### Tools
* Added support for device memory allocation in UCX perftest
* Added a script for squashing commits after PR approval
* Added support for DPU cross-gvmi daemon in UCX perftest
#### Java
* Added support for EP local socket address API in JUCX
#### Build
* Added address sanitizer support
* Added a helper shell script to run static checks
#### AZP
* Replaced Valgrind tests with address sanitizer tool
* Added Ubuntu 22.04 docker image testing
#### Configuration
* Added support for filtering configuration sections by platform type
* Added configuration file with section for Grace Hopper
### Bugfixes:
#### UCP
* Fixed crash due to incorrect lane selection when active message is disabled
* Fixed RMA lane selection issue due to wrong bandwidth calculation
* Fixed rendezvous protocol information in protocol details table
* Fixed endpoint reconfiguration issue due to wrong bandwidth calculation
* Fixed Active Message handlers issue due to out-of-order registration
* Fixed registration of memh events for imported memory key
* Fixed sockaddr unreachable destination error handling
* Fixed uninitialized memory issue in new protocols infrastructure
* Fixed race condition when using strong fence by flushing all endpoints
* Fixed incorrect RMA message size on immediate completion with no datatype
* Fixed incorrect performance estimation due to fp8 pack/unpack issue
* Fixed remote access error when rcache memory is not registered with atomic access
* Fixed assertion failure when rcache fails during memh allocation
* Fixed atomic device selection issue
* Fixed worker interface deactivation while still in use by endpoints
* Fixed wire compatibility issue due to mismatched lane selection
#### RDMA CORE (IB, ROCE, etc.)
* Disabled device memory if atomics are not available
* Fixed indirect keys creation for MT registered memory
* Fixed KSM start address value when creating export key
* Fixed DCI pool index to support maximum of 16 pools
* Fixed atomic rkey issue when using imported memory
* Fixed crash due to unsupported SRQ capability
#### GPU (CUDA, ROCM)
* Removed unused environment variable RCACHE_ADDR_ALIGN from ROCm transport
* Fixed usage of CUDA device 0 when no context is active
* Removed error handling support from CUDA IPC transport
* Fixed allocation of unaligned CUDA memory
#### Shared Memory
* Fixed occasional crash when shm_unlink fails during interface initialization
#### UCS
* Fixed system device distance calculation for devices on different PCIe root
* Fixed support for large size arrays in ucs_array
* Fixed synchronization issue in rcache
* Fixed uninitialized variable access in rcache
#### Tests
* Fixed test failures when GPU is present but disabled
* Fixed Active Message hanging issue in ucp_client_server
* Fixed potential crash due to redundant munmap call in ucp mmap tests
* Fixed a crash when running CUDA gtest under Valgrind
* Fixed UD endpoint timeout issue under Valgrind
#### Java
* Fixed failures in Java tests by waiting for send requests completion
* Fixed JVM segfault in Java tests when gdrcopy driver is not loaded
* Fixed Go build and Go test failures
#### Packaging
* Disabled Go bindings in Debian package

## 1.16.0 (April 15, 2024)
### Features:
#### UCP