More API updates for 2.0 #10317

j-xiong · 2024-08-19T16:15:41Z

Reworked the dynamic MR commit and expand to include more pending 2.0 API changes:

core: Move flags only used for memory registration calls to fi_domain.h
core: Define flag for single use MR
core: Define capability bit for tagged multi receive
core: Define capability for tagged message only directed recv
core: Define capability for directed receive without wildcard src_addr
core: Introduce Sub-MR
man: Clarify FI_HMEM support of inject calls


shefty

Stopping review to move discussion to PR

man/fi_mr.3.md

include/rdma/fabric.h

include/rdma/fi_domain.h

shefty · 2024-08-20T22:05:52Z

This change is kind of like memory windows, but not. It requires the provider support 64-bit keys (or smaller). And it's actually adding 2 concepts together.

FI_MR_SINGLE_USE is really independent and could apply to existing registrations. There's no way for a provider to indicate that it supports single use regions, outside of just trying the flag in a call and seeing if it works. I don't know if this should be a domain capability, but maybe...

The dynamic keys are basically trying to create a new MR object that references the same pinned pages. That's equivalent to a memory window. However, MWs are created by posting to a QP/EP, whereas, this is a MR/domain operation. In either case, the user should be given a fid_mr structure here, not a u64 key. That allows integration with the other fi_mr calls (map, unmap, bind, refresh, enable).

This is probably doable by adding fid_mr to the fi_mr_attr, uhm, somewhere. The user can just create/destroy the extra MRs through the existing calls, with the same capabilities/restrictions that the provider has for other MRs (user or provider selected keys).

I don't know if we would actually need a new capability in the latter case. A provider could always perform a second registration, in which case, saving on the page pinning is simply an optimization.

j-xiong · 2024-08-20T22:53:41Z

@shefty Thanks for the feedback. Yes, single use and dynamic key assignment did come as two separate issues, but I combined them here on the assumption that single use could be simpler with the key instead of the entire MR. I agree that this is mostly equivalent to memory windows. One consideration about bulk key allocation is that it may avoid kernel involvement when the key is assigned (in case the provider doesn't allow user-selected keys). But how much can be gained from that is unclear. I am going to rework the patch, maybe along the lines of a more MW-like approach.

shefty · 2024-08-21T00:32:06Z

Bulk key allocation could be defined as an attribute when the original MR is created. I agree that a single use MR is less useful than a single use window/key.

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>

j-xiong · 2024-09-30T05:30:14Z

PR updated. The first three commits cover the rework of the original dynamic MR proposal. Four more 2.0 API related commits are added. PR title updated to reflect the new scope.

include/rdma/fabric.h

man/fi_mr.3.md

man/fi_getinfo.3.md

include/rdma/fi_collective.h

j-xiong · 2024-10-01T22:15:55Z

PR updated to address comments.

man/fi_mr.3.md

include/rdma/fabric.h

j-xiong · 2024-10-08T18:23:26Z

@shefty Do you see any more changes needed?

shefty · 2024-10-08T18:47:27Z

man/fi_mr.3.md

+## base_mr
+
+If non-NULL, the base_mr field is used to specify the MR from which a
+sub-MR is to be created.


You describe a sub-mr in the commit message, but it's not defined in the man pages. So a user doesn't understand what this is. From the viewpoint of the API, the user only knows of an MR. Note that it's possible to derive a new MR from a MR that itself is derived from another MR.

I would rework the text.

If non-NULL, creates a new memory region that is associated with a previously created region. The new region must be fully contained within the existing region; however, the new region has its own access rights. The following attributes are inherited by the new region, and as a result, are ignored when creating the new region: auth_key_size, auth_key, iface, device, hmem_data, page_size, reserved_key_count (what attributes are actually inherited, versus which may be specified?). Associated memory regions may be used to share hardware resources, such as virtual to physical address translations and page pinning. This can improve performance when creating and destroying sub-regions that need different access rights. Use of this field requires API version X.Y. blah blah
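The containment rule in the proposed text ("must be fully contained within the existing region") can be illustrated with a small range check. This is only a sketch: `struct demo_region` and `demo_region_contains` are hypothetical names made up for illustration, not libfabric types or calls, and how a provider actually validates the range is not specified in this PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a registered range; not a libfabric type. */
struct demo_region {
	uintptr_t addr;   /* start address of the region */
	size_t    len;    /* length in bytes */
	uint64_t  access; /* access bits; each region carries its own */
};

/* Return true when sub lies entirely inside base.
 * The comparison is arranged so that addr + len is never computed,
 * which avoids wrap-around on ranges near the top of the address space. */
static bool demo_region_contains(const struct demo_region *base,
				 const struct demo_region *sub)
{
	assert(base && sub);
	if (sub->addr < base->addr)
		return false;
	if (sub->len > base->len)
		return false;
	/* The offset of sub within base must leave room for sub->len bytes. */
	return (sub->addr - base->addr) <= (base->len - sub->len);
}
```

Note that the access bits play no part in the check, which matches the statement that the new region has its own access rights.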

shefty · 2024-10-08T19:01:26Z

man/fi_mr.3.md

+
+The base_mr field must be NULL if the FI_MR_DMABUF flag is set.
+
+## reserved_key_count


I would rename. From the viewpoint of the application, this is the number of associated memory regions. The API also uses 'cnt' instead of 'count'.

sub_mr_cnt ? assoc_mr_cnt ? bound_mr_cnt ? linked_mr_cnt ?

Not exactly. One could reserve fewer keys and, when they are used up, revert to regular key allocation.

The comment is about the name. The application is creating an associated MR (not a key). What, if anything, the provider pre-allocates isn't defined. It could pre-allocate the fid_mr's. I excluded using the term 'max' as part of the name because it may or may not be a max.

Thanks, now I see your point. Since the application does not need to know the pre-allocation details, it makes sense to make it a hint for intended usage instead.

shefty · 2024-10-08T19:02:47Z

man/fi_mr.3.md

+  memory region is automatically invalidated and can no longer be used for
+  remote access.  This flag is intended to be used when registering a
+  sub-MR to get a single use key of the base MR.  However, it can be used
+  with any memory region.


Drop text referring to sub-mr. It applies to any MR, no need to call out an intent.
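As a rough model of the single-use behavior under discussion, the sketch below tracks a key that becomes invalid after its first successful remote access. It assumes invalidation happens on the first access; `struct demo_key` and `demo_remote_access` are hypothetical names, not part of the libfabric API, and real providers would enforce this in the NIC or driver rather than in application code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a remotely accessible key; not a libfabric type. */
struct demo_key {
	uint64_t key;
	bool single_use; /* models a region registered with FI_MR_SINGLE_USE */
	bool valid;      /* false once the key can no longer be used */
};

/* Model one remote access through the key.
 * Returns true if the access is allowed. A single-use key is
 * invalidated by its first successful access. */
static bool demo_remote_access(struct demo_key *k)
{
	assert(k);
	if (!k->valid)
		return false;
	if (k->single_use)
		k->valid = false;
	return true;
}
```

The model applies to any key, whether or not it belongs to a sub-MR, in line with the review comment above.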

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>

Currently FI_MULTI_RECV is effectively defined only for untagged messages. Simply expanding the definition to tagged messages would cause difficulties in either provider support or discovery. Define FI_TAGGED_MULTI_RECV to indicate that multi recv is supported for tagged messages as well. This is only used as a capability bit. The op flag and cq flag continue to use FI_MULTI_RECV. Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
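One way an application might act on the separate capability bit is sketched below. The `DEMO_` bit positions are placeholders chosen for this example (the real values live in include/rdma/fabric.h), and `struct demo_info` is a hypothetical stand-in for the capabilities a provider reports. The sketch also assumes that untagged multi recv is governed by FI_MULTI_RECV alone and tagged multi recv by FI_TAGGED_MULTI_RECV alone, which the commit message suggests but does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions for this sketch only. */
#define DEMO_MULTI_RECV        (1ULL << 0) /* stands in for FI_MULTI_RECV */
#define DEMO_TAGGED_MULTI_RECV (1ULL << 1) /* stands in for FI_TAGGED_MULTI_RECV */

/* Hypothetical holder for the caps a provider reported. */
struct demo_info {
	uint64_t caps;
};

/* Decide whether a multi-receive buffer may be posted, given the reported
 * caps and whether the receive is tagged. */
static bool demo_multi_recv_ok(const struct demo_info *info, bool tagged)
{
	uint64_t need;

	assert(info);
	need = tagged ? DEMO_TAGGED_MULTI_RECV : DEMO_MULTI_RECV;
	return (info->caps & need) == need;
}
```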

FI_DIRECTED_RECV covers both untagged and tagged messages. However, the most common use case is tagged messages. Having a separate bit for tagged messages allows the provider to optimize the non-tagged message implementation while maintaining support for directed recv over tagged messages. Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>

The new bit FI_EXACT_DIRECTED_RECV is similar to FI_DIRECTED_RECV, but requires an exact source address, i.e., the wildcard address FI_ADDR_UNSPEC is not allowed. It can be used alone, or together with FI_DIRECTED_RECV or FI_TAGGED_DIRECTED_RECV as a modifier. Disallowing a wildcard source address lets the provider better optimize the receive handling. Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
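The restriction described in this commit message amounts to rejecting a wildcard source address when the exact bit is in effect. A minimal sketch, with the caveat that `DEMO_EXACT_DIRECTED_RECV` uses a placeholder bit position, `DEMO_ADDR_UNSPEC` stands in for FI_ADDR_UNSPEC, and `struct demo_rx` is a hypothetical receive descriptor rather than a libfabric type:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_ADDR_UNSPEC         ((uint64_t) -1) /* stands in for FI_ADDR_UNSPEC */
#define DEMO_EXACT_DIRECTED_RECV (1ULL << 0)     /* placeholder bit position */

/* Hypothetical description of a posted receive. */
struct demo_rx {
	uint64_t src_addr; /* source address the receive is directed at */
};

/* Return true if the receive's source address is acceptable under caps.
 * With the exact bit set, the wildcard address is not allowed. */
static bool demo_src_addr_ok(uint64_t caps, const struct demo_rx *rx)
{
	assert(rx);
	if (caps & DEMO_EXACT_DIRECTED_RECV)
		return rx->src_addr != DEMO_ADDR_UNSPEC;
	return true;
}
```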

Memory registration consists of two parts: mapping/pinning the memory for local access, and exporting it with a key for remote access. The first part is usually heavyweight and requires kernel involvement. The second part is less expensive and can be further separated into key allocation and key assignment. Key allocation may need kernel involvement, but key assignment can be done in user space. Here sub-MR is introduced as a way to separate the aforementioned two parts, and key reservation is added to further optimize sub-MR creation. A sub-MR is created from an existing MR (the base MR). It inherits the memory mapping/pinning of the base MR but has its own access key. The address range exposed can be the same as that of the base MR or a subrange of it. The access rights can be different, too. The base MR can now be created with a few extra keys reserved. These reserved keys are automatically used for sub-MR registration. This only applies to FI_MR_PROV_KEY mode. Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
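The key reservation idea can be modeled as a small pool attached to the base MR: sub-MR creation draws from the reserved keys first and falls back to regular allocation once they run out, as mentioned earlier in the review thread. The sketch below is purely illustrative. All `demo_` names are hypothetical, the key values are arbitrary, and the "slow" path merely stands in for whatever kernel-assisted allocation a provider would perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_RESERVED 8

/* Hypothetical base MR bookkeeping; not a libfabric type. */
struct demo_base_mr {
	uint64_t reserved[DEMO_MAX_RESERVED]; /* keys set aside at registration */
	size_t   reserved_cnt;                /* reserved keys still unused */
	uint64_t next_slow_key;               /* stands in for regular allocation */
	size_t   slow_allocs;                 /* number of fallback allocations */
};

/* Register the base MR and reserve `reserve` keys starting at first_key. */
static void demo_base_mr_init(struct demo_base_mr *mr, size_t reserve,
			      uint64_t first_key)
{
	assert(mr && reserve <= DEMO_MAX_RESERVED);
	for (size_t i = 0; i < reserve; i++)
		mr->reserved[i] = first_key + i;
	mr->reserved_cnt = reserve;
	mr->next_slow_key = first_key + DEMO_MAX_RESERVED;
	mr->slow_allocs = 0;
}

/* Assign a key to a new sub-MR: use a reserved key if one is left,
 * otherwise fall back to the regular (slower) allocation path. */
static uint64_t demo_sub_mr_key(struct demo_base_mr *mr)
{
	assert(mr);
	if (mr->reserved_cnt > 0)
		return mr->reserved[--mr->reserved_cnt];
	mr->slow_allocs++;
	return mr->next_slow_key++;
}
```

With two keys reserved, the first two sub-MRs are served from the pool and the third takes the fallback path, so the reserved count behaves as a hint rather than a hard limit.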

j-xiong · 2024-10-10T21:35:52Z

@shefty PR updated based on your feedback with some tweaks. I added definitions of sub-MR and base MR at the beginning and kept that terminology in the later text.

shefty · 2024-10-10T21:55:42Z

man/fi_msg.3.md

@@ -173,6 +173,11 @@ to write CQ entries for all successful completions.  See the flags
 discussion below for more details. The requested message size that
 can be used with fi_inject is limited by inject_size.

+If FI_HMEM is enabled, the fi_inject call can only accept buffer with
+iface equal to FI_HMEM_SYSTEM if the provider requires FI_MR_VIRT_ADDR


Did you mean FI_MR_HMEM instead of FI_MR_VIRT_ADDR?

Yes. Blame the co-pilot ...

Add text to clarify that only FI_HMEM_SYSTEM is allowed for inject calls if FI_MR_HMEM is required. Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
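The clarified rule can be expressed as a simple predicate. Since fi_inject takes no memory descriptor, the sketch assumes the application itself knows which interface the buffer belongs to. The enum values and `struct demo_domain` are hypothetical stand-ins; only the rule itself (system memory only when FI_MR_HMEM is required) comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for HMEM interface types. */
enum demo_hmem_iface {
	DEMO_HMEM_SYSTEM = 0, /* stands in for FI_HMEM_SYSTEM */
	DEMO_HMEM_DEVICE = 1  /* any non-system interface */
};

/* Hypothetical summary of the relevant domain settings. */
struct demo_domain {
	bool hmem_enabled;     /* models FI_HMEM being enabled */
	bool mr_hmem_required; /* models the provider requiring FI_MR_HMEM */
};

/* Return true if a buffer of the given interface may be passed to an
 * inject call under the rule described above. */
static bool demo_inject_buf_ok(const struct demo_domain *dom,
			       enum demo_hmem_iface iface)
{
	assert(dom);
	if (dom->hmem_enabled && dom->mr_hmem_required)
		return iface == DEMO_HMEM_SYSTEM;
	return true;
}
```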

j-xiong requested review from shefty, aingerson, shijin-aws and iziemba August 20, 2024 21:00

shefty requested changes Aug 20, 2024


j-xiong added the work in progress label Aug 20, 2024

j-xiong force-pushed the dynamic-mr branch from 215e321 to 3700a5b Compare September 30, 2024 05:18

core: Move flags only used for memory registration calls to fi_domain.h

f9f492f

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>

j-xiong force-pushed the dynamic-mr branch from 3700a5b to 6c08a44 Compare September 30, 2024 05:21

j-xiong changed the title ~~core: Introduce MR with dynamic keys~~ API updated for 2.0 Sep 30, 2024

j-xiong changed the title ~~API updated for 2.0~~ More API updates for 2.0 Sep 30, 2024

j-xiong removed the work in progress label Sep 30, 2024

shefty reviewed Sep 30, 2024


j-xiong force-pushed the dynamic-mr branch from 6c08a44 to 14cdbac Compare October 1, 2024 22:11

aingerson reviewed Oct 2, 2024


j-xiong force-pushed the dynamic-mr branch from 14cdbac to 9dd541a Compare October 3, 2024 01:48

shefty reviewed Oct 8, 2024


j-xiong force-pushed the dynamic-mr branch from 9dd541a to 9a6d5b2 Compare October 10, 2024 21:28

j-xiong added 2 commits October 10, 2024 14:32

core: Define flag for single use MR

378fb6d

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>

j-xiong added 3 commits October 10, 2024 14:32

j-xiong force-pushed the dynamic-mr branch from 9a6d5b2 to 84396a4 Compare October 10, 2024 21:33

shefty reviewed Oct 10, 2024


man: Clarify FI_HMEM support of inject calls

14269c8

Add text to clarify that only FI_HMEM_SYSTEM is allowed for inject calls if FI_MR_HMEM is required. Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>

j-xiong force-pushed the dynamic-mr branch from 84396a4 to 14269c8 Compare October 10, 2024 22:03
