Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More API updates for 2.0 #10317

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

More API updates for 2.0 #10317

wants to merge 7 commits into from

Conversation

j-xiong
Copy link
Contributor

@j-xiong j-xiong commented Aug 19, 2024

Reworked the dynamic MR commit and expand to include more pending 2.0 API changes:

  • core: Move flags only used for memory registration calls to fi_domain.h
  • core: Define flag for single use MR
  • core: Define capability bit for tagged multi receive
  • core: Define capability for tagged message only directed recv
  • core: Define capbility for directed receive without wildcard src_addr
  • core: Introduce Sub-MR
  • man: Clarify FI_HMEM support of inject calls

·

Copy link
Member

@shefty shefty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stopping review to move discussion to PR

man/fi_mr.3.md Outdated Show resolved Hide resolved
include/rdma/fabric.h Outdated Show resolved Hide resolved
include/rdma/fi_domain.h Outdated Show resolved Hide resolved
@shefty
Copy link
Member

shefty commented Aug 20, 2024

This change is kind of like memory windows, but not. It requires the provider support 64-bit keys (or smaller). And it's actually adding 2 concepts together.

FI_MR_SINGLE_USE is really independent and could apply to existing registrations. There's no way for a provider to indicate that it supports single use regions, outside of just trying the flag in a call and seeing if it works. I don't know if this should be a domain capability, but maybe...

The dynamic keys are basically trying to create a new MR object that references the same pinned pages. That's equivalent to a memory window. However, MWs are created by posting to a QP/EP, whereas, this is a MR/domain operation. In either case, the user should be given a fid_mr structure here, not a u64 key. That allows integration with the other fi_mr calls (map, unmap, bind, refresh, enable).

This is probably doable by adding fid_mr to the fi_mr_attr, uhm, somewhere. The user can just create/destroy the extra MRs through the existing calls, with the same capabilities/restrictions that the provider has for other MRs (user or provider selected keys).

I don't know if we would actually need a new capability in the latter case. A provider could always perform a second registration, in which case, saving on the page pinning is simply an optimization.

@j-xiong
Copy link
Contributor Author

j-xiong commented Aug 20, 2024

@shefty Thanks for the feedback. Yes, single use and dynamic key assignment did come as two separate issues but I combined them here with the assumption that single use could be simpler with the key instead of the entire MR. I agree that this mostly equivalent to memory windows. One consideration about the bulk key allocations is that it may have the advantage of avoiding kernel involvement when the key is assigned (in case the provider doesn't allow user selected keys). But how much can be gained from that is unclear. I am going to rework the patch, maybe along the line of getting closer to a MW-like approach.

@shefty
Copy link
Member

shefty commented Aug 21, 2024

Bulk key allocation could be defined as an attribute when the original MR is created. I agree that a single use MR is less useful than a single use window/key.

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
@j-xiong j-xiong changed the title core: Introduce MR with dynamic keys API updated for 2.0 Sep 30, 2024
@j-xiong
Copy link
Contributor Author

j-xiong commented Sep 30, 2024

PR updated. The first three commits cover the rework of the original dynamic MR proposal. Four more 2.0 API related commits are added. PR title updated to reflect the new scope.

include/rdma/fabric.h Outdated Show resolved Hide resolved
man/fi_mr.3.md Outdated Show resolved Hide resolved
man/fi_mr.3.md Outdated Show resolved Hide resolved
man/fi_mr.3.md Outdated Show resolved Hide resolved
man/fi_getinfo.3.md Outdated Show resolved Hide resolved
include/rdma/fi_collective.h Outdated Show resolved Hide resolved
@j-xiong
Copy link
Contributor Author

j-xiong commented Oct 1, 2024

PR updated to address comments.

man/fi_mr.3.md Outdated Show resolved Hide resolved
man/fi_mr.3.md Outdated Show resolved Hide resolved
man/fi_mr.3.md Outdated Show resolved Hide resolved
man/fi_mr.3.md Outdated Show resolved Hide resolved
man/fi_mr.3.md Show resolved Hide resolved
include/rdma/fabric.h Outdated Show resolved Hide resolved
@j-xiong
Copy link
Contributor Author

j-xiong commented Oct 8, 2024

@shefty Do you see any more changes needed?

man/fi_mr.3.md Outdated
## base_mr

If non-NULL, the base_mr field is used to specify the MR from which a
sub-MR is to be created.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You describe a sub-mr in the commit message, but it's not defined in the man pages. So a user doesn't understand what this is. From the viewpoint of the API, the user only knows of an MR. Note that it's possible to derive a new MR from a MR that itself is derived from another MR.

I would rework the text.

If non-NULL, creates a new memory region that is associated with a previously created region.
The new region must be fully contained within the existing region; however, new region has
its own access rights.  The following attributes are inherited by the new region, and as a result,
are ignored when creating the new region:

auth_key_size, auth_key, iface, device, hmem_data, page_size, reserved_key_count
(what attributes are actually inherited, versus which may be specified?).

Associated memory regions may be used to shared hardware resources, such as virtual to physical
address translations and page pinning.  This can improve performance when creating and destroying
sub-regions that need different access rights.

Use of this field requires API version X.Y.

blah blah

man/fi_mr.3.md Outdated

The base_mr field must be NULL if the FI_MR_DMABUF flag is set.

## reserved_key_count
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename. From the viewpoint of the application, this is the number of associated memory regions. The API also uses 'cnt' instead of 'count'.

sub_mr_cnt ?
assoc_mr_cnt ?
bound_mr_cnt ?
linked_mr_cnt ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly. One could reserve less keys and when used out, revert back to regular key allocation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is about the name. The application is creating an associated MR (not a key). What, if anything, the provider pre-allocates isn't defined. It could pre-allocate the fid_mr's. I excluded using the term 'max' as part of the name because it may or may not be a max.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, now I see your point. Since application does not need to know the pre-allocation details, it makes sense to make it a hint for intended usage instead.

man/fi_mr.3.md Outdated
memory region is automatically invalidated and can no longer be used for
remote access. This flag is intended to be used when registering a
sub-MR to get a single use key of the base MR. However, it can be used
with any memory region.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop text referring to sub-mr. It applies to any MR, no need to call out an intent.

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Currently FI_MULTI_RECV is effectively only defined for untagged message
only. Simply expanding the definition to tagged message would cause
difficulties in either provider support or discovery.

Define FI_TAGGED_MULTI_RECV to indicate that multi recv is supported in
tagged message as well. This is only used as a capability bit. The op
flag and cq flag continues to use FI_MULTI_RECV.

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
FI_DIRECTED_RECV covers both untagged and tagged message. However, the most
often used case is for tagged message. Having a saparate bit for tagged
message allows the provider to optimize non-tagged messsage implementation
while maintain support directed recv over tagged message.

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
The new bit FI_EXACT_DIRECTED_RECV is similar to FI_DIRECTED_RECV, but
requires exact source address. I.e., the wildcard address FI_ADDR_UNSPEC
is not allowed.

It can be used alone, or be used together with FI_DIRECTED_RECV or
FI_TAGGED_DIRECTED_RECV as a modifier.

Not allowing wildcard source address allows the provider to better
optmize the receive handling.

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Memory registration consists of two parts: map/pin the memory for local
access and export with a key for remote access. The first part is usually
heavyweight and requries kernel involvement. The second part is less
expensive and can be further separated into key allocation and key
assignment. Key allocation may needs kernel involvement, but key assignment
can be done in user space. Here sub-MR is introduced as a way to allow
separattion of the forementioned two parts, and key reservation is added
to further optimize sub-MR creation.

A sub-MR is created from an existing MR (the base MR).  It inherits the
memory mapping/pinning of the base MR but has its own access key.  The
address range exposed can be same as the the base MR or a subpart of that.
The access rights can be different, too.

Now the base MR can be created with a few extra keys reserved. These reserved
keys will be automatically used for sub-MR registration. This only applies
to FI_MR_PROV_KEY mode.

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
@j-xiong
Copy link
Contributor Author

j-xiong commented Oct 10, 2024

@shefty PR updated based on your feedback w/ some tweak. I added definition of sub-MR and bass MR to the beginning and kept the use of such terminology later.

man/fi_msg.3.md Outdated
@@ -173,6 +173,11 @@ to write CQ entries for all successful completions. See the flags
discussion below for more details. The requested message size that
can be used with fi_inject is limited by inject_size.

If FI_HMEM is enabled, the fi_inject call can only accept buffer with
iface equal to FI_HMEM_SYSTEM if the provider requires FI_MR_VIRT_ADDR
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean FI_MR_HMEM instead of FI_MR_VIRT_ADDR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Blame the co-pilot ...

Add text to clarify that only FI_HMEM_SYSTEM is allowed for inject calls
if FI_MR_HMEM is required.

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants