More API updates for 2.0 #10317

9 changes: 5 additions & 4 deletions include/rdma/fabric.h
@@ -158,14 +158,15 @@ typedef struct fid *fid_t;
#define FI_MATCH_COMPLETE (1ULL << 31)

#define FI_PEER_TRANSFER (1ULL << 36)
#define FI_MR_DMABUF (1ULL << 40)
/* #define FI_MR_DMABUF (1ULL << 40) */
#define FI_AV_USER_ID (1ULL << 41)
#define FI_PEER (1ULL << 43)
/* #define FI_XPU_TRIGGER (1ULL << 44) */
#define FI_HMEM_HOST_ALLOC (1ULL << 45)
#define FI_HMEM_DEVICE_ONLY (1ULL << 46)

#define FI_TAGGED_DIRECTED_RECV (1ULL << 45)
#define FI_TAGGED_MULTI_RECV (1ULL << 46)
#define FI_HMEM (1ULL << 47)
/* #define FI_VARIABLE_MSG (1ULL << 48) */
#define FI_EXACT_DIRECTED_RECV (1ULL << 48)
#define FI_RMA_PMEM (1ULL << 49)
#define FI_SOURCE_ERR (1ULL << 50)
#define FI_LOCAL_COMM (1ULL << 51)
8 changes: 8 additions & 0 deletions include/rdma/fi_domain.h
@@ -122,6 +122,12 @@ struct fid_av {
* Tracks registered memory regions, primarily for remote access,
* but also for local access until we can remove that need.
*/

#define FI_MR_DMABUF (1ULL << 40)
#define FI_MR_SINGLE_USE (1ULL << 41)
#define FI_HMEM_HOST_ALLOC (1ULL << 45)
#define FI_HMEM_DEVICE_ONLY (1ULL << 46)

struct fid_mr {
struct fid fid;
void *mem_desc;
@@ -176,6 +182,8 @@ struct fi_mr_attr {
} device;
void *hmem_data;
size_t page_size;
const struct fid_mr *base_mr;
size_t sub_mr_cnt;
};

struct fi_mr_modify {
5 changes: 3 additions & 2 deletions man/fi_av.3.md
@@ -384,8 +384,9 @@ Upon successful insert with FI_AUTH_KEY flag, the returned fi_addr_t's will map
endpoint address against the specified authorization keys. These fi_addr_t's can be
used as the target for local data transfer operations.

If the endpoint supports `FI_DIRECTED_RECV`, these fi_addr_t's can be used to
restrict receive buffers to a specific endpoint address and authorization key.
If the endpoint supports `FI_DIRECTED_RECV` or `FI_TAGGED_DIRECTED_RECV`, these
fi_addr_t's can be used to restrict receive buffers to a specific endpoint address
and authorization key.

For address vectors configured with FI_AV_USER_ID, all subsequent target events
corresponding to the address being inserted will return FI_ADDR_NOTAVAIL until
1 change: 1 addition & 0 deletions man/fi_endpoint.3.md
@@ -1340,6 +1340,7 @@ capability bits from the fi_info structure will be used.
The following capabilities apply to the receive attributes: FI_MSG,
FI_RMA, FI_TAGGED, FI_ATOMIC, FI_REMOTE_READ, FI_REMOTE_WRITE, FI_RECV,
FI_HMEM, FI_TRIGGER, FI_RMA_PMEM, FI_DIRECTED_RECV,
FI_TAGGED_DIRECTED_RECV, FI_TAGGED_MULTI_RECV,
FI_MULTI_RECV, FI_SOURCE, FI_RMA_EVENT, FI_SOURCE_ERR, FI_COLLECTIVE,
and FI_XPU.

24 changes: 20 additions & 4 deletions man/fi_getinfo.3.md
@@ -290,6 +290,17 @@ additional optimizations.
capability is not set, then the src_addr parameter for msg and tagged
receive operations is ignored.

*FI_TAGGED_DIRECTED_RECV*
: Similar to FI_DIRECTED_RECV, but only applies to tagged receive
operations.

*FI_EXACT_DIRECTED_RECV*
: Similar to FI_DIRECTED_RECV, but requires the source address to be
exact, i.e., FI_ADDR_UNSPEC is not allowed. This capability can
be used alone, or in conjunction with FI_DIRECTED_RECV or
FI_TAGGED_DIRECTED_RECV as a modifier to disallow FI_ADDR_UNSPEC
being used as the source address.
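
One way to read how these three capability bits interact is sketched below. It is a self-contained illustration rather than provider code: the `classify_src_addr` helper and the `src_addr_rule` enum are hypothetical, the values of FI_TAGGED_DIRECTED_RECV and FI_EXACT_DIRECTED_RECV are copied from this diff, and FI_DIRECTED_RECV and FI_ADDR_UNSPEC are local stand-ins assumed to match the libfabric headers. The sketch also assumes that FI_EXACT_DIRECTED_RECV used alone covers both msg and tagged receives, which the description suggests but does not state outright.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins so the sketch builds without the libfabric headers. */
#define FI_TAGGED_DIRECTED_RECV (1ULL << 45)    /* value from this diff */
#define FI_EXACT_DIRECTED_RECV  (1ULL << 48)    /* value from this diff */
#define FI_DIRECTED_RECV        (1ULL << 59)    /* assumed to match fabric.h */
#define FI_ADDR_UNSPEC          ((uint64_t) -1) /* assumed to match fabric.h */

/* Hypothetical classification of how src_addr is treated on a receive. */
enum src_addr_rule {
    SRC_IGNORED,   /* no directed-receive capability applies */
    SRC_ANY,       /* FI_ADDR_UNSPEC accepted as a wildcard */
    SRC_MATCH_ONE, /* receive is restricted to src_addr */
    SRC_INVALID    /* FI_ADDR_UNSPEC given, but an exact address is required */
};

static enum src_addr_rule
classify_src_addr(uint64_t caps, bool tagged, uint64_t src_addr)
{
    bool exact = (caps & FI_EXACT_DIRECTED_RECV) != 0;
    bool tagged_only = (caps & FI_TAGGED_DIRECTED_RECV) != 0;
    /* Assumption: the exact bit on its own behaves like FI_DIRECTED_RECV. */
    bool all_recv = (caps & FI_DIRECTED_RECV) != 0 || (exact && !tagged_only);

    if (!all_recv && !(tagged && tagged_only))
        return SRC_IGNORED;
    if (src_addr == FI_ADDR_UNSPEC)
        return exact ? SRC_INVALID : SRC_ANY;
    return SRC_MATCH_ONE;
}
```

Under this reading, an endpoint with only FI_TAGGED_DIRECTED_RECV ignores src_addr on untagged receives, and adding FI_EXACT_DIRECTED_RECV turns a wildcard source on a tagged receive into an error.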

*FI_FENCE*
: Indicates that the endpoint supports the FI_FENCE flag on data
transfer operations. Support requires tracking that all previous
@@ -333,6 +344,10 @@ additional optimizations.
: Specifies that the endpoint must support the FI_MULTI_RECV flag when
posting receive buffers.

*FI_TAGGED_MULTI_RECV*
: Specifies that the endpoint must support the FI_MULTI_RECV flag when
posting tagged receive buffers.

*FI_NAMED_RX_CTX*
: Requests that endpoints which support multiple receive contexts
allow an initiator to target (or name) a specific receive context as
@@ -462,14 +477,15 @@ may optionally report non-selected secondary capabilities if doing so
would not compromise performance or security.

Primary capabilities: FI_MSG, FI_RMA, FI_TAGGED, FI_ATOMIC, FI_MULTICAST,
FI_NAMED_RX_CTX, FI_DIRECTED_RECV, FI_HMEM, FI_COLLECTIVE, FI_XPU,
FI_AV_USER_ID, FI_PEER
FI_NAMED_RX_CTX, FI_DIRECTED_RECV, FI_TAGGED_DIRECTED_RECV, FI_HMEM,
FI_COLLECTIVE, FI_XPU, FI_AV_USER_ID, FI_PEER

Primary modifiers: FI_READ, FI_WRITE, FI_RECV, FI_SEND,
FI_REMOTE_READ, FI_REMOTE_WRITE

Secondary capabilities: FI_MULTI_RECV, FI_SOURCE, FI_RMA_EVENT, FI_SHARED_AV,
FI_TRIGGER, FI_FENCE, FI_LOCAL_COMM, FI_REMOTE_COMM, FI_SOURCE_ERR, FI_RMA_PMEM.
Secondary capabilities: FI_MULTI_RECV, FI_TAGGED_MULTI_RECV, FI_SOURCE,
FI_RMA_EVENT, FI_SHARED_AV, FI_TRIGGER, FI_FENCE, FI_LOCAL_COMM,
FI_REMOTE_COMM, FI_SOURCE_ERR, FI_RMA_PMEM.

# MODE

45 changes: 43 additions & 2 deletions man/fi_mr.3.md
@@ -139,6 +139,14 @@ attributes (mr_mode field). Each mr_mode bit requires that an
application take specific steps in order to use memory buffers with
libfabric interfaces.

As a special case, a new memory region can be created from an existing
memory region. Such a new memory region is called a sub-MR, and the existing
memory region is called the base MR. Sub-MRs may be used to share hardware
resources, such as virtual-to-physical address translations and page pinning.
This can improve performance when creating and destroying sub-regions that
need different access rights. The base MR itself can also be a sub-MR,
allowing for a hierarchy of memory regions.

The following apply to memory registration.

*Default Memory Registration*
@@ -575,8 +583,8 @@ into calls as function parameters.
```c
struct fi_mr_attr {
union {
const struct iovec *mr_iov;
const struct fi_mr_dmabuf *dmabuf;
const struct iovec *mr_iov;
const struct fi_mr_dmabuf *dmabuf;
};
size_t iov_count;
uint64_t access;
@@ -595,6 +603,8 @@ struct fi_mr_attr {
} device;
void *hmem_data;
size_t page_size;
const struct fid_mr *base_mr;
size_t sub_mr_cnt;
};

struct fi_mr_auth_key {
@@ -810,6 +820,31 @@ or from the region.
Providers may choose to ignore page size. This will result in a provider
selected page size always being used.

## base_mr

If non-NULL, create a sub-MR from an existing memory region specified by
the base_mr field.

The sub-MR must be fully contained within the base MR; however, the sub-MR
has its own authorization keys and access rights. The following attributes
are inherited from the base MR, and as a result, are ignored when creating the
sub-MR:

iface, device, hmem_data, page_size
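
The containment requirement can be pictured with a small range check. The sketch below assumes the simplest case, where the base MR and the sub-MR each cover a single `struct iovec`; the `iov_within` helper is hypothetical and is not part of libfabric, and how a provider actually validates the range is not specified here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* True if the byte range of sub lies entirely inside the byte range of base. */
static bool iov_within(const struct iovec *base, const struct iovec *sub)
{
    uintptr_t b = (uintptr_t) base->iov_base;
    uintptr_t s = (uintptr_t) sub->iov_base;

    if (s < b || sub->iov_len > base->iov_len)
        return false;
    /* Compare offsets instead of end addresses so the sums cannot overflow. */
    return s - b <= base->iov_len - sub->iov_len;
}
```

An application could run a check like this before filling in `mr_iov` and `base_mr` for the sub-MR, so that an out-of-range request is caught before registration is attempted.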

The sub-MR should hold a reference to the base MR. A call to fi_close on the
base MR fails if any of its sub-MRs are still outstanding.
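
The reference rule, including the case where a sub-MR is itself a base MR, can be modeled with a toy reference count. Everything in this sketch is hypothetical: `struct toy_mr` is not `struct fid_mr`, and `-EBUSY` is an assumed error code, since the text does not say which error fi_close returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy object; not the libfabric struct fid_mr. */
struct toy_mr {
    struct toy_mr *base; /* NULL for a top-level region */
    size_t sub_mr_refs;  /* open sub-MRs created from this region */
    bool open;
};

static void toy_mr_open(struct toy_mr *mr, struct toy_mr *base)
{
    mr->base = base;
    mr->sub_mr_refs = 0;
    mr->open = true;
    if (base)
        base->sub_mr_refs++; /* the sub-MR holds a reference to its base */
}

/* Returns 0 on success, -EBUSY (assumed) while sub-MRs are outstanding. */
static int toy_mr_close(struct toy_mr *mr)
{
    if (!mr->open)
        return -EINVAL;
    if (mr->sub_mr_refs > 0)
        return -EBUSY;
    if (mr->base)
        mr->base->sub_mr_refs--; /* drop the reference on the base MR */
    mr->open = false;
    return 0;
}
```

In this model a hierarchy has to be closed from the leaves upward: closing a base MR only succeeds after every sub-MR created from it has been closed.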

The base_mr field must be NULL if the FI_MR_DMABUF flag is set.

## sub_mr_cnt

The number of sub-MRs expected to be created from the memory region. This
value is not a limit. Instead, it is a hint that allows the provider to apply
provider-specific optimizations for sub-MR creation. For example, the provider
may reserve access keys or pre-allocate fid_mr objects. The provider may
ignore this hint.

## fi_hmem_ze_device

Returns an hmem device identifier for a level zero <driver, device> tuple.
@@ -900,6 +935,12 @@ The follow flag may be specified to any memory registration call.
fi_mr_attr structure. This flag is only usable for domains opened with
FI_HMEM capability support.

*FI_MR_SINGLE_USE*
: This flag indicates that the memory region is only used for a single
operation. After the operation is complete, the key associated with the
memory region is automatically invalidated and can no longer be used for
remote access.
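
The single-use behavior can be pictured as a key that is valid for exactly one completed operation. The sketch below is a toy model under stated assumptions: `struct toy_key` and `toy_remote_access` are hypothetical, the flag value is copied from this diff, and `-EACCES` is an assumed error code for access through an invalidated key.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define FI_MR_SINGLE_USE (1ULL << 41) /* value from this diff */

/* Hypothetical stand-in for the key state a provider would track. */
struct toy_key {
    uint64_t flags; /* registration flags */
    bool valid;     /* false once the key has been invalidated */
};

/* Models one remote access that runs to completion against the key. */
static int toy_remote_access(struct toy_key *key)
{
    if (!key->valid)
        return -EACCES; /* assumed error code */
    if (key->flags & FI_MR_SINGLE_USE)
        key->valid = false; /* invalidated after the single operation */
    return 0;
}
```

A key registered without the flag keeps working across repeated accesses, while a single-use key rejects the second one.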

*FI_AUTH_KEY*
: Only valid with domains configured with FI_AV_AUTH_KEY. When used with
fi_mr_regattr, this flag denotes that the fi_mr_auth_key::src_addr field
5 changes: 5 additions & 0 deletions man/fi_msg.3.md
@@ -173,6 +173,11 @@ to write CQ entries for all successful completions. See the flags
discussion below for more details. The requested message size that
can be used with fi_inject is limited by inject_size.

If FI_HMEM is enabled and the provider requires the FI_MR_HMEM mr_mode,
the fi_inject call can only accept a buffer whose iface is FI_HMEM_SYSTEM.
This limitation applies to all the fi_\*inject\* calls and does not affect
how inject_size is reported.
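
An application-side guard for this restriction might look like the following sketch. The `inject_buf_allowed` helper is hypothetical, FI_HMEM uses the value shown in this diff, and FI_MR_HMEM and the two-entry `enum fi_hmem_iface` are local stand-ins assumed to match the libfabric headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins so the sketch builds without the libfabric headers. */
#define FI_HMEM    (1ULL << 47) /* value shown in this diff */
#define FI_MR_HMEM (1 << 10)    /* assumed to match fi_domain.h */

enum fi_hmem_iface {
    FI_HMEM_SYSTEM = 0, /* host memory */
    FI_HMEM_CUDA        /* one example of a device interface */
};

/* True if a buffer of the given iface may be passed to an inject call. */
static bool inject_buf_allowed(uint64_t caps, int mr_mode,
                               enum fi_hmem_iface iface)
{
    if ((caps & FI_HMEM) && (mr_mode & FI_MR_HMEM))
        return iface == FI_HMEM_SYSTEM;
    return true;
}
```

When the check fails, one option is to fall back to a non-inject send that supplies a memory descriptor for the device buffer.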

## fi_senddata

The send data call is similar to fi_send, but allows for the sending
24 changes: 21 additions & 3 deletions man/fi_tagged.3.md
@@ -264,6 +264,24 @@ and/or fi_tsendmsg.
local buffer and transfer out of that buffer. This flag can only
be used with messages smaller than inject_size.

*FI_MULTI_RECV*
: Applies to posted tagged receive operations when the FI_TAGGED_MULTI_RECV
capability is enabled. This flag allows the user to post a single
tagged receive buffer that will receive multiple incoming messages.
Received messages will be packed into the receive buffer until the
buffer has been consumed. Use of this flag may cause a single
posted receive operation to generate multiple events as messages are
placed into the buffer. The placement of received data into the
buffer may be subject to provider-specific alignment restrictions.

The buffer will be released by the provider when the available buffer
space falls below the specified minimum (see FI_OPT_MIN_MULTI_RECV).
Note that an entry to the associated receive completion queue will
always be generated when the buffer has been consumed, even if other
receive completions have been suppressed (i.e., the Rx context has been
configured for FI_SELECTIVE_COMPLETION). See the FI_MULTI_RECV
completion flag in [`fi_cq`(3)](fi_cq.3.html).
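
The packing and release behavior can be modeled with a toy buffer. This is a simplified sketch, not provider logic: `struct toy_multi_recv` and `toy_multi_recv_place` are hypothetical, `min_free` stands in for the FI_OPT_MIN_MULTI_RECV threshold, alignment restrictions are ignored, and a message that does not fit in the remaining space is simply not placed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one posted multi-receive buffer. */
struct toy_multi_recv {
    size_t size;     /* total bytes in the posted buffer */
    size_t used;     /* bytes consumed by messages received so far */
    size_t min_free; /* stand-in for the FI_OPT_MIN_MULTI_RECV threshold */
    bool released;   /* set once the buffer is handed back to the user */
};

/* Returns the offset where the message is placed, or -1 if it is not placed. */
static long toy_multi_recv_place(struct toy_multi_recv *b, size_t len)
{
    size_t off;

    if (b->released || len > b->size - b->used)
        return -1;
    off = b->used;
    b->used += len;
    /* Release the buffer once the free space drops below the minimum. */
    if (b->size - b->used < b->min_free)
        b->released = true;
    return (long) off;
}
```

Each successful placement corresponds to one receive event for the same posted buffer, and the placement that sets `released` is the one whose completion would carry the FI_MULTI_RECV completion flag.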

*FI_INJECT_COMPLETE*
: Applies to fi_tsendmsg. Indicates that a completion should be
generated when the source buffer(s) may be reused.
@@ -292,9 +310,9 @@ and/or fi_tsendmsg.

*FI_AUTH_KEY*
: Only valid with domains configured with FI_AV_AUTH_KEY and connectionless
endpoints configured with FI_DIRECTED_RECV. When used with fi_trecvmsg, this
flag denotes that the src_addr is an authorization key fi_addr_t instead of
an endpoint fi_addr_t.
endpoints configured with FI_DIRECTED_RECV or FI_TAGGED_DIRECTED_RECV. When
used with fi_trecvmsg, this flag denotes that the src_addr is an authorization
key fi_addr_t instead of an endpoint fi_addr_t.

The following flags may be used with fi_trecvmsg.
