NEWS: News update before release #6233

Merged 1 commit on Feb 2, 2021
160 changes: 152 additions & 8 deletions NEWS
@@ -1,18 +1,162 @@
#
## Copyright (C) Mellanox Technologies Ltd. 2001-2020. ALL RIGHTS RESERVED.
## Copyright (C) Mellanox Technologies Ltd. 2001-2021. ALL RIGHTS RESERVED.
## Copyright (C) UT-Battelle, LLC. 2014-2019. ALL RIGHTS RESERVED.
## Copyright (C) ARM Ltd. 2017-2020. ALL RIGHTS RESERVED.
## Copyright (C) ARM Ltd. 2017-2021. ALL RIGHTS RESERVED.
##
## See file LICENSE for terms.
##
#

## Current
### Features: TBD
#### UCX Core
- Added ucp_tag_msg_recv_nbx routine.
#### UCX Java (API Preview) TBD
### Bugfixes: TBD
## 1.10.0-rc2 (February 2, 2021)
### Features:
#### Core
* Added support for Nvidia HPC SDK
* Added support for latest PGI and Clang
* Added support for ROCM-3.7+ (warning generated if older version detected)
#### Architecture
* Added Arm SVE memcpy()
* Redesigned Arm WFE support
* Improved clear_cache performance for Arm
* Added architecture detection for Zhaoxin CPU
#### CI
* Added release builds on CUDA 11
* Enabled performance validation in gtest
#### UCP
* Added locality awareness to the transport selection logic for GPU devices
* Added put/offload/short and put/offload/zcopy protocols
* Added receive message nbx routine
* Reworked AM implementation and API, which adds support for RNDV semantics
* Added support for multi-lane connection manager over TCP
* Added support for printing AM tls with info log level
* Implemented flush and destroy for UCT EPs on UCP worker
* Reduced UCP request size
* Added support for keepalive protocol
* Added support for multi-fragment protocol
* Added implementation for protocol progress for eager, bcopy, and multicopy
* Improved protocol selection logic
* Added new protocols for UCP get operation
* Added bcopy protocols with support for GPU memory
* Added RNDV protocol implementation for GPU devices (CUDA, ROCm)
* Set SOCKADDR_CM_ENABLE=y by default
* Added support for fast-path short with new tag protocols
* Added a new parameter to control the CM listener's backlog
* Added support for sending AM RTS over short message protocol
* Added support for shared memory multi-lane when CM is used
#### UCT
* Added API for keepalive_timeout value
* Added uct_completion.status
* Allowed transports to access multiple mem_types
* Removed status arg from uct_completion_callback_t
* Restructured uct_mem_alloc/uct_md_mem_alloc to use mem_type
* Updated documentation for uct_listener_params
* Lowered the log level for certain network errors
* Added cuda_copy wakeup feature
* Added wakeup support for shared memory
#### UCS
* Added "inf" and "auto" values to time units
* Added on-stack constructors for array and string buffer
* Added ucs_ptr_map_t data structure
* Added bool CSWAP
* Improved logging
* Added optimization for namespace processing
* Fixes for connection matching functionality
#### RDMA CORE (IB, ROCE, etc.)
* Added support for auto detection of adaptive routing settings
* Added an option to poll TX CQ every progress iteration
* Added local and remote addresses to the reject error message
* Added support for UAR allocation with non-cacheable memory type
* Added support for multiple flush cancel without completion
* Added async events callback support
* Added detection for ConnectX-6, ConnectX-7 and BlueField-1/2 devices
* Added support for connection matching for UD
* Added a check for AM ordering
#### Java (preview)
* Added support for a different javadoc executable path for different java versions
* Added UCS memory type constants
* Added support for building on Java 10+
* Added support for io-vector datatype
#### Tests
* Added CI for CUDA 11
* Added test_ucp_sockaddr_protocols.stream_short
* Reimplemented tests using NBX API
* Added flush(cancel) test
* Added memory_wait mode to perftest
* Added support for clang 10
* Refactored RMA and atomic tests, added memtype support
* Added test for uct_md_mem_query()
* Added request interrupt support
* Added support for connection manager fallbacks
* Added new ucp request test checking for leaks from the ptr_map
#### Documentation
* Added glossaries

### Bugfixes:
#### Portability
* Fixes in print functions to use format macros such as PRIx64
#### Continuous Integration
* Fixes in Github release flow
* Fixes in docker image
#### Packaging
* Removed deb package dependencies
* Fixes in SPEC to make the RPM relocatable
#### Documentation
* Fixes in documentation for ucp_am_recv_data_nbx
* Fixes in quick start example
* Fixes in installation instructions
#### Tests
* Fixes for failures under valgrind runtime
* Fixes in mmap tests for 0-length RMA
* Fixes in definition of LAST_WQE wait timeout
* Fixes in ROCm for mem_buffer test
* Fixes in test name printing format
* Fixes in tcp_sockcm test
#### UCP
* Fixes in worker cleanup flow
#### CUDA
* Fixes in managed memory support
#### RDMA CORE (IB, ROCE, etc.)
* Fixes in assert definitions
* Fixes in printing an error about invalid AM Bcopy length for UD
* Fixes for thread safety support
* Fixes to get ROCE device name according to GID
* Fixes for SL selection
* Fixes in creating STRICT_ORDER key
* Fixes addressing performance degradation in UD transport due to excess async events
#### UGNI
* Fixes in disable logic in config
* Fixes for clang 11 warnings
#### Java
* Fixes in build dependencies
* Fixes in constructing UcpRequest object on error
* Fixes in exception handling on endpoint closure request
* Fixes for segfault in UcpErrorHandler
#### UCP
* Fixes in datatype support for get_zcopy RNDV
* Fixes in connection manager disconnect
* Fixes in assert definitions
* Fixes in completion flow for failed EP
* Fixes in flush error handling flow
* Fixes in latency calculations for wireup protocol
* Fixes in offload completion with inlined data
* Fixes in unpacking flow
* Fixes in error handling for various protocols
#### UCT
* Fixes in flush TX
* Fixes in checks for enabling GPU Direct RDMA
#### UCS
* Fixes for crashes on incorrect value set in config
* Fixes in ptr_array
* Fixes in maximal size for ucs_snprintf_safe()
* Fixes in compilation warnings
* Fixes in ucs_aarch64_dsb(_op) definition
#### TCP
* Fixes in default route interface confirmation flow
* Fixes in PUT protocol
* Fixes in max connection limit and improved error reporting
#### UCM
* Fixes for crash on prevent unload
* Fixes in libucm_rocm
* Fixes for a few race conditions

## 1.9.0 (September 19, 2020)
### Features: