-
Notifications
You must be signed in to change notification settings - Fork 376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for VA to PA changes #10062
Support for VA to PA changes #10062
Conversation
This PR is not ready to land. At this point, I am looking for feedback on the proposed changes. |
Clarify the behavior of FI_MR_HMEM being unset. Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Ready for round 2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks
Previously, the collective and local data transfer operations did not allow for apps to pass in a desc unless explicit memory registration is required. This change allows apps to always pass in a desc. Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Ready for round 3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a good solution for ODP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! changes look good to me
I'll have all comments address mid this week. |
One question I have is do we need to address whether a provider can support FI_MR_ALLOCATED = 0 by HMEM iface types? |
We could restrict FI_MR_ALLOCATED to memory allocated using system calls only. FI_MR_HMEM overrides FI_MR_LOCAL for HMEM. It could also override the FI_MR_ALLOCATED bit for HMEM. |
So FI_MR_HMEM = 1 and FI_MR_ALLOCATED = 0 results in VA <-> PA migration for FI_HMEM_SYSTEM only? The issue with this is that it now forces other FI_MR_HMEM behavior like explicit memory registration for local others, right? Thinking out loud... CUDA, ROCR, and ZE device drivers all support a invalidation mechanism. This opens the door for implementing ODP for even device memory. We could have a bitmask to denote which HMEM ifaces support VA <-> PA changes with FI_MR_ALLOCATED = 0? |
In general, to keep things reasonable for a user, if a provider can't support all features for all types of memory that an app can use, my vote is for the provider to fallback and disable a feature entirely. The permutations simply become untenable. I don't want apps to deal with: "I can support X with system memory. And that GPU is okay. But that GPU isn't; it's too old. And that NIC memory, I can do, but only if you touch the memory using system calls. And NVMe is okay, but only if it's off the PCI bus, not attached to CXL..." |
I agree the permutations become untenable. How about updating the FI_MR_ALLOCATED = 0 with something like the following: If FI_MR_ALLOCATED = 0 is and FI_HMEM is supported, the ability for the VA to PA mapping to change extends to to HMEM interfaces as well. If a provider cannot support VA to PA changing for a given HMEM iface, the provider should support a reasonable fallback or the operation should fail. |
I'm good with that update. |
FI_MR_ALLOCATED set documentation is updated to state that the VA to PA mapping must not change while memory is still registered. FI_MR_ALLOCATED unset documentation is added stating that the VA to PA mapping may change. This behavior can be used, for example, to support system memory to device memory page migration. Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Support fi_mr_refresh without FI_MR_ALLOCATED set. Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Page size allows applications to optionally notify the provider the page size to be used for an MR allocation. Typically, providers can select the optimal page size. In cases where a VA range has zero pages backing it, the provider may not know the optimal page size during registration. Rather than always use a less efficient page size, allow apps to specify the page size to be used. If page size is zero, provider will select the page size. If non-zero, page size must be page size support by OS. If a specific page size is specified for a memory region during creation, all pages later associated with the region must be of the given size. Attaching a memory page of a different size to a region may result in failed transfers to or from the region. Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Comments addressed and the following added: If FI_MR_ALLOCATED = 0 is and FI_HMEM is supported, the ability for the VA to PA mapping to change extends to to HMEM interfaces as well. If a provider cannot support VA to PA changing for a given HMEM iface, the provider should support a reasonable fallback or the operation should fail. |
The following series of commits updates documentation and defines new fields to enable use-cases which may result in VA to PA changes with memory being registered.