shamisp edited this page Oct 28, 2014 · 11 revisions

WIP:

  • Tag matching API for UCP
  • Implement RMA on UCP
  • Create PD independently, use it to create iface (needed: uGNI PoC)
  • Performance tests for UCP
  • SIDR connection establishment
  • UCP bootstrap - use one transport to bootstrap others.
  • Add worker API
  • Implement UCT AM callback which holds reference to the message.
  • When an operation cannot be initiated, UCT should return either NO_EP_RESOURCES or NO_IFACE_RESOURCES.
  • Add more allocators for TL buffers (huge pages, mmap, ...)
  • Rename uct_lkey_t to uct_mem_region_t
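
The two resource-exhaustion codes mentioned in the list above could be told apart roughly as follows. This is only a sketch: every type and function name here (`sketch_status_t`, `sketch_ep_send`, and so on) is hypothetical, and the reading that one code means "this endpoint is out of resources" while the other means "the shared interface is out of resources" is an assumption, not the actual UCT definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes, named after the two cases in the list. */
typedef enum {
    SKETCH_OK = 0,
    SKETCH_ERR_NO_EP_RESOURCES,    /* assumed: this endpoint cannot send now */
    SKETCH_ERR_NO_IFACE_RESOURCES  /* assumed: the shared interface is exhausted */
} sketch_status_t;

/* Hypothetical interface: send credits shared by all of its endpoints. */
typedef struct {
    unsigned tx_credits;
} sketch_iface_t;

/* Hypothetical endpoint: a private send window on top of the interface. */
typedef struct {
    sketch_iface_t *iface;
    unsigned        window;
} sketch_ep_t;

/* Try to start one send; report which resource is missing on failure. */
sketch_status_t sketch_ep_send(sketch_ep_t *ep)
{
    assert(ep != NULL && ep->iface != NULL);

    if (ep->iface->tx_credits == 0) {
        return SKETCH_ERR_NO_IFACE_RESOURCES;
    }
    if (ep->window == 0) {
        return SKETCH_ERR_NO_EP_RESOURCES;
    }

    /* Both resources are available: consume one of each. */
    --ep->iface->tx_credits;
    --ep->window;
    return SKETCH_OK;
}
```

With distinct codes, a caller could retry on another endpoint in the first case and wait for interface progress in the second.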

API features:

  • Flags for communication: solicited event, completion, ...
  • Advertise both the required alignment and the best-performance alignment for operations
  • Make sure communication can be initiated from callbacks.
  • Pass configuration to UCP_CONTEXT
  • Add timers support for async API

IB features:

  • RoCE
  • RRoCE (GID index)
  • Path Query (RDMA CM / IB CM)
  • LMC
  • Non-default P_Key index
  • SL

Design improvements:

  • Extract more IB common code
  • Move 'stats' library under 'tools'
  • Inconsistency in the atomic/get bcopy API: if the transport completes the operation immediately (e.g. mmap), it should still call the callback. That means callbacks are called from communication functions, which in turn means communication functions cannot be called from callbacks.
  • const correctness
  • Move 'perf' library under 'tools'

Usability/debug improvements:

  • In debug mode - check that EP is connected before sending
  • Log by categories/objects
  • All configuration variables should begin with UCX_
  • Support custom env prefix
  • Dump statistics to shared memory / unix socket.
  • Check for constant_tsc bit, and take CPU frequency from sysfs instead of procinfo.
  • Add doxygen
  • In ucx_perftest, use PMI/librte instead of MPI
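
The constant_tsc check from the list above might look something like the sketch below. The function names are hypothetical. It assumes a Linux system where `/proc/cpuinfo` has a `flags` line and where the cpufreq driver exposes `cpuinfo_max_freq` (in kHz) for cpu0; on machines without cpufreq the sysfs read simply fails.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `flag` appears as a whole whitespace-separated word in `line`. */
int sketch_has_cpu_flag(const char *line, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = line;

    assert(len > 0);
    while ((p = strstr(p, flag)) != NULL) {
        int starts_word = (p == line) || (p[-1] == ' ') || (p[-1] == '\t');
        char end = p[len];
        int ends_word = (end == '\0') || (end == ' ') ||
                        (end == '\t') || (end == '\n');
        if (starts_word && ends_word) {
            return 1;
        }
        p += len; /* partial match (e.g. inside a longer flag), keep looking */
    }
    return 0;
}

/* Scan /proc/cpuinfo: 1 if constant_tsc is listed, 0 if not, -1 on error. */
int sketch_cpu_has_constant_tsc(void)
{
    char line[8192];
    int found = 0;
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (f == NULL) {
        return -1;
    }
    while (fgets(line, sizeof(line), f) != NULL) {
        if (strncmp(line, "flags", 5) == 0 &&
            sketch_has_cpu_flag(line, "constant_tsc")) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}

/* Read the maximum frequency of cpu0 in kHz from sysfs, or -1 on failure. */
long sketch_cpu_max_khz(void)
{
    long khz = -1;
    FILE *f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq", "r");

    if (f == NULL) {
        return -1;
    }
    if (fscanf(f, "%ld", &khz) != 1) {
        khz = -1;
    }
    fclose(f);
    return khz;
}
```

Matching whole words matters here because other flags such as `nonstop_tsc` also contain `tsc` as a substring.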

Performance improvements:

  • Separate rx/tx progress
  • likely/unlikely
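
For the likely/unlikely item, a common approach is to wrap the GCC/Clang `__builtin_expect` builtin in a pair of macros and use them on fast-path checks. The macro and function names below are hypothetical; this is a sketch of the technique, not the project's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro names. __builtin_expect is a GCC/Clang builtin;
 * other compilers fall back to the plain expression. */
#if defined(__GNUC__) || defined(__clang__)
#  define sketch_likely(x)   __builtin_expect(!!(x), 1)
#  define sketch_unlikely(x) __builtin_expect(!!(x), 0)
#else
#  define sketch_likely(x)   (x)
#  define sketch_unlikely(x) (x)
#endif

/* Hypothetical fast-path helper: take `needed` send resources.
 * Returns 0 on success, -1 when too few are available. The shortage
 * branch is hinted as the rare one. */
int sketch_post_send(size_t *available, size_t needed)
{
    assert(available != NULL);

    if (sketch_unlikely(*available < needed)) {
        return -1;
    }
    *available -= needed;
    return 0;
}
```

The hint only affects code layout and branch ordering; the function behaves the same with or without it.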

Tests:

  • Bidirectional tests
  • Performance tests with multiple nodes (e.g. pairs, all2all)
  • Performance test should take expected performance from resource capabilities.
  • Count warnings during gtest, and fail the test if they happen
  • Print warning from perftest if not running with optimal performance:
    • not in release mode
  • RTE support in gtest - maybe not needed; uGNI supports loopback.
  • AM message rate/bandwidth
  • Check capability flags in tests
  • In p2p_test, define sender_entity and receiver_entity

RC:

  • Don't use descriptor in atomic add - pass a global /dev/null buffer.
  • Use scatter-to-CQ for atomic/get replies
  • Handle SRQ watermark event.
  • Remove RC EPs from the hash table when they are removed (refcount)
  • Get rid of RC iface counters; instead, keep an array with all EPs that have pending sends. This should make the flush operation faster.
  • Update callbacks API
  • Statistics for RC.
  • Configure all RC QP parameters.
  • Parameter checks in debug mode.
  • Log data packets in RC.
  • Allocate and fill descriptor only after making sure there are send resources.
  • Update bw/latency for transports.
  • Construct WQE with SSE.
  • Performance tests for GET and Atomics.
  • Inline sends with >1 WQEBB.
  • Use NOP for flush
  • Handle async events in IB and print full information
  • Separate parameter for send CQ size.
  • Check for send CQ resources.
  • Normalize transport names
  • GET support
  • Scatter-to-CQ for 64 bytes
  • Atomic operations
  • Avoid queuing 2 callbacks for some flows (atomic add, am zcopy)

Autoconf:

  • -libverbs is added to LIBS (global)
  • -libcm is added to LIBS
  • HAVE_TL_xx in automake can be on, even if HAVE_IB is off