Releases: yosefe/ucx

v1.9.0-pre41

09 May 19:15
60314d3
Pre-release
  • Add support for static build (see https://openucx.readthedocs.io/en/master/faq.html#build-user-application-with-ucx)
  • Add support for different number of RoCE LAG paths on server and client
  • Add support for user-provided memory handle #247
  • Add support for setting local bind address on client endpoint #250
  • Add support for building debug RPM #221
  • Support setting UCT level parameters from UCP API #209
  • Set UCX_SOCKADDR_CM_ENABLE=y by default
  • Disable DevX objects by default #185 #178
  • Fixes in keepalive protocol #243 #205 #196 #184
  • Fixes in rendezvous cancel protocol #233 #230 #227
  • Fix hang in case of device fatal error #231
  • Fix simultaneous disconnect #200
  • Fix low bandwidth with multi-rail eager #194
  • Fix uct_rdmacm_cm_cqs hash key #237
  • Multiple fixes in io_demo test application
  • Logging and memory tracker improvements

v1.9.0-pre40

05 Aug 18:09
644cdae
Pre-release

Features:

Bugfixes:

  • ea0288b Fix string buffer grow
  • 13d124c Fix error code for rdma_<establish|accept> failure
  • ac26299 Fix error handling in ucp_stream_am_handler
  • 0aef7b3 Fix RC scatter-to-cqe configuration
  • a864933, 472a471 Remove ucp_ep_flush progress callback when endpoint is closed
  • de3247c, 529eb4f Track memory usage by application
  • 271e82a Do not progress rendezvous operation if endpoint already failed
  • 41500a7 Move keepalive to a separate progress callback
  • df2dead Do not enable KA on CM lane
  • 6aeb722 Do not access UCP endpoint after it's destroyed or its error callback has been invoked

v1.9.0-pre39

23 May 13:11
7af459f
Pre-release

Features:

  • Support random path for RoCE high availability, enabled by UCX_IB_ROCE_RANDOM_PATH=y #127
  • IODEMO: Non-blocking connect/disconnect #128 #139

Bugfixes:

  • Fix error handling for AM_ZCOPY #131
  • Fix premature keepalive on unconnected endpoint #133
  • Fix TX_INLINE_RESP=0 behavior on DevX #136
  • Fix a race condition when connecting to sockaddr #137
  • IODEMO: Fix compilation on RH 7.2 #129
  • IODEMO: Fix unknown conn_id #141

v1.9.0-pre38

20 Jan 15:18
311cdd0
Pre-release

Bugfixes:

  • Fix post_recv called without buffers to post #117
  • Reduce log level of simultaneous error during disconnect #119
  • Fix RoCE LAG detection on MLNX_OFED 5.x #124
  • Process registration cache invalidation queue during progress, to release stale regions #123
  • Fix assertion checks in am_zcopy flow #125
  • Add API to return registration cache information and counters #126

v1.9.0-pre37

16 Dec 11:49
07709df
Pre-release

Features:

  • Add non-blocking resource cleanup by a background process, enabled by UCX_IB_CLEANUP_THREAD=y
  • Make rdma_cm address/route resolve timeout configurable, instead of always using 1 second. Example: UCX_RDMACM_TIMEOUT=10s
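
UCX reads its UCX_* settings from the process environment, so the two options above can be applied by a launcher script. The following is a minimal sketch, not part of UCX: the helper name `ucx_env` and the application path `./my_ucx_app` are hypothetical, and only the variable names and values come from the notes above.

```python
import os


def ucx_env(overrides, base=None):
    """Return a copy of the environment with UCX_* settings applied.

    `ucx_env` is a hypothetical helper for illustration, not a UCX API.
    """
    env = dict(os.environ if base is None else base)
    for name, value in overrides.items():
        if not name.startswith("UCX_"):
            raise ValueError(f"not a UCX variable: {name}")
        env[name] = str(value)
    return env


# Values taken from the release notes above.
env = ucx_env({
    "UCX_IB_CLEANUP_THREAD": "y",   # background resource cleanup
    "UCX_RDMACM_TIMEOUT": "10s",    # rdma_cm address/route resolve timeout
})

# Pass `env` to the process that runs the UCX application, for example:
#   subprocess.run(["./my_ucx_app"], env=env)
# where ./my_ucx_app is a placeholder for any UCX-based program.
```

Building a copy of the environment, rather than mutating `os.environ`, keeps the settings scoped to the launched application.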

Bugfixes:

  • Fix heap corruption caused by ucm_set_event_handler() in multi-threaded application
  • Fix RoCE LAG detection: take GID index from iface to find associated netdev, instead of always checking gid 0
  • Disable backtrace by default to avoid deadlock with malloc/free
  • Fix leak of listener requests during ucp_listener_destroy()
  • Add lock on rc_iface->eps access, to fix a race condition between the main thread and pack_cb called from the progress thread
  • Use ibv_pd handle instead of device name as CQ hash key, for rdma_cm temporary QP

v1.9.0-pre36

26 Nov 21:49
c7669f9
Pre-release

Fixes for simultaneous disconnect (peer failure during ep_close):

  • Flush operation failed with an assertion because the endpoint changed its number of lanes (#90, #91)
  • An assertion that all pending operations were removed failed, since flush_internal exited prematurely (#92)
  • ep_close operation did not complete, because the lane was not accounted for when ep_flushed returned an error (#93)

v1.9.0-pre35

03 Nov 15:34
39586a2
Pre-release
  • Fix a potential race condition in which keepalive, sent from the main thread, tries to use a partially initialized RC endpoint that is still being created from the progress thread:
    • UCT/RC: Protect rc_iface->ep_list with a spinlock
    • UCT/RC: Initialize ep->connected=0 before keepalive could run
  • Fix race condition between installing malloc() hooks and IB async event thread, which can lead to segfault during application start
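
The first fix above combines two ideas: guard the shared endpoint list with a lock, and give the endpoint a safe `connected` value before any other thread can see it. The sketch below illustrates that pattern in Python under simplifying assumptions. It is not the UCX code: the class and method names are hypothetical, and `threading.Lock` stands in for the spinlock.

```python
import threading


class Endpoint:
    def __init__(self):
        # Set before the endpoint is published to the shared list, so a
        # concurrent keepalive pass never sees an uninitialized flag.
        self.connected = False


class Iface:
    """Hypothetical stand-in for an interface that owns an endpoint list."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for the spinlock
        self._ep_list = []

    def add_ep(self):
        ep = Endpoint()                # fully initialize first
        with self._lock:               # then publish under the lock
            self._ep_list.append(ep)
        return ep

    def keepalive_targets(self):
        # Snapshot under the lock; only connected endpoints are eligible.
        with self._lock:
            return [ep for ep in self._ep_list if ep.connected]
```

A keepalive pass that iterates over `keepalive_targets()` skips endpoints that were created but have not yet been connected.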

v1.9.0-pre34

27 Oct 11:55
aa41625
Pre-release
  • Reserve 1 CQ credit for qp-flush NOP, to avoid disabling CQ moderation
  • Fix memory leak in IO demo test
  • Simplify active connection counting in IO demo test
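
The idea of reserving one CQ credit for the qp-flush NOP can be pictured as a credit pool in which regular operations may never take the last credit. The following is a simplified model for illustration only. It does not reflect the actual UCX data structures or the interaction with CQ moderation, and all names in it are hypothetical.

```python
class CqCredits:
    """Toy credit pool that keeps `reserved` credits for flush operations."""

    def __init__(self, total, reserved=1):
        if total <= reserved:
            raise ValueError("total must exceed the reserved credits")
        self.available = total
        self.reserved = reserved

    def try_take(self, for_flush=False):
        # Regular operations must leave the reserved credits untouched;
        # a flush may consume them.
        floor = 0 if for_flush else self.reserved
        if self.available > floor:
            self.available -= 1
            return True
        return False

    def release(self):
        # Called when a completion returns a credit to the pool.
        self.available += 1
```

With three credits in total, two regular operations succeed, a third is refused, and a flush can still be posted.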

v1.9.0-pre33

23 Oct 09:32
c681d0d
Pre-release
  • Fix reordering when destroying RC endpoint, due to releasing CQ and RDMA_READ credits
  • Enhance logging in rdma_cm

v1.9-pre32

22 Oct 11:23
fc48291
Pre-release
  • Add limit for memory registration cache, configured by UCX_IB_RCACHE_MAX_REGIONS and UCX_IB_RCACHE_MAX_SIZE
  • Fix keepalive protocol: don't send when QP is in INIT state
  • Fix RPM build: missing io_demo installed file
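
A cache bounded by both a region count and a total size, as the two limits above suggest, can be sketched as follows. This is an illustration under assumptions, not the UCX registration cache: the eviction order (oldest first) and all names are hypothetical, and the notes above do not say how UCX chooses which regions to drop.

```python
from collections import OrderedDict


class BoundedRegionCache:
    """Toy cache limited by number of regions and total bytes."""

    def __init__(self, max_regions, max_size):
        self.max_regions = max_regions
        self.max_size = max_size
        self.total_size = 0
        self._regions = OrderedDict()  # key -> size in bytes, oldest first

    def add(self, key, size):
        """Insert a region and return the keys evicted to honor the limits."""
        if key in self._regions:
            self.total_size -= self._regions.pop(key)
        self._regions[key] = size
        self.total_size += size

        evicted = []
        while self._regions and (len(self._regions) > self.max_regions
                                 or self.total_size > self.max_size):
            old_key, old_size = self._regions.popitem(last=False)
            self.total_size -= old_size
            evicted.append(old_key)
        return evicted
```

Either limit alone can trigger eviction, which mirrors having separate settings for the maximum number of regions and the maximum total size.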