RELEASE: Updated NEWS for 1.17
shasson5 committed Jun 13, 2024
1 parent 70bf3cb commit ca89195
Showing 1 changed file with 92 additions and 0 deletions.
92 changes: 92 additions & 0 deletions NEWS
@@ -11,6 +11,98 @@
### Features:
### Bugfixes:

## 1.17.0 (June 13, 2024)
### Features:
#### UCP
* Improved the accuracy of rendezvous protocol performance estimation
* Enabled short protocol for non-host memory types on empty messages
* Improved the accuracy of performance estimation for empty messages by removing irrelevant overheads
* Added RMA_ZCOPY_MAX_SEG_SIZE configuration parameter to allow modifying segment size for RMA-ZCOPY protocols
* Added support for separate intra/inter-node rendezvous thresholds
* Added support for minimal fragment size in rendezvous protocol
* Added support for resetting request during send operation
* Added UCX_PROTO_OVERHEAD configuration variable to allow setting protocol overheads
* Improved performance for combined Active Message/RMA scenarios by separating them into different lanes
* Added support for device staging buffers in pipeline protocols
* Enabled on-demand paging for NVIDIA Grace platforms by default
#### RDMA CORE (IB, ROCE, etc.)
* Introduced the UCX_REVERSE_SL environment variable to configure the reverse SL for DC transport; by default, it uses the value of UCX_IB_SL
* Added support for GID auto-detection in Floating LID based routing
* Added support for multithreaded KSM registration of unaligned buffers
* Added IB_SEND_OVERHEAD and MM_[SEND|RECV]_OVERHEAD configuration variables
#### GPU (CUDA, ROCM)
* Added support for oneAPI Level-Zero library for Intel GPUs
#### UCS
* Added support for rcache dynamic region alignment
* Added dynamic bitmap data structure
* Added support for advanced key-value parsing for UCX configuration
* Added piecewise linear function data structure
* Added support for allocating dynamic arrays on stack
#### Tools
* Added support for device memory allocation in UCX perftest
* Added a script for squashing commits after PR approval
* Added support for DPU cross-gvmi daemon in UCX perftest
#### Java
* Added support for EP local socket address API in JUCX
#### Build
* Added address sanitizer support
* Added a helper shell script to run static checks
#### AZP
* Replaced Valgrind tests with address sanitizer tool
* Added Ubuntu 22.04 docker image testing
#### Configuration
* Added support for filtering configuration sections by platform type
* Added configuration file with section for Grace Hopper
### Bugfixes:
#### UCP
* Fixed crash due to incorrect lane selection when active message is disabled
* Fixed RMA lane selection issue due to wrong bandwidth calculation
* Fixed rendezvous protocol information in protocol details table
* Fixed endpoint reconfiguration issue due to wrong bandwidth calculation
* Fixed Active Message handler issue due to out-of-order registration
* Fixed registration of memh events for imported memory key
* Fixed sockaddr unreachable destination error handling
* Fixed uninitialized memory issue in new protocols infrastructure
* Fixed race condition when using strong fence by flushing all endpoints
* Fixed incorrect RMA message size on immediate completion with no datatype
* Fixed incorrect performance estimation due to fp8 pack/unpack issue
* Fixed remote access error when rcache memory is not registered with atomic access
* Fixed assertion failure when rcache fails during memh allocation
* Fixed atomic device selection issue
* Fixed worker interface deactivation while still in use by endpoints
* Fixed wire compatibility issue due to mismatched lane selection
#### RDMA CORE (IB, ROCE, etc.)
* Disabled device memory if atomics are not available
* Fixed indirect key creation for MT registered memory
* Fixed KSM start address value when creating export key
* Fixed DCI pool index to support maximum of 16 pools
* Fixed atomic rkey issue when using imported memory
* Fixed crash due to unsupported SRQ capability
#### GPU (CUDA, ROCM)
* Removed unused environment variable RCACHE_ADDR_ALIGN from ROCm transport
* Fixed usage of CUDA device 0 when no context is active
* Removed error handling support from CUDA IPC transport
* Fixed allocation of unaligned CUDA memory
#### Shared Memory
* Fixed occasional crash when shm_unlink fails during interface initialization
#### UCS
* Fixed system device distance calculation for devices on different PCIe root
* Fixed support for large size arrays in ucs_array
* Fixed synchronization issue in rcache
* Fixed uninitialized variable access in rcache
#### Tests
* Fixed test failures when GPU is present but disabled
* Fixed Active Message hanging issue in ucp_client_server
* Fixed potential crash due to redundant munmap call in ucp mmap tests
* Fixed a crash when running CUDA gtest under Valgrind
* Fixed UD endpoint timeout issue under Valgrind
#### Java
* Fixed failures in Java tests by waiting for send requests to complete
* Fixed JVM segfault in Java tests when gdrcopy driver is not loaded
* Fixed Go build and Go test failures
#### Packaging
* Disabled Go bindings in Debian package

## 1.16.0 (April 15, 2024)
### Features:
#### UCP