Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for VA to PA changes #10062

Merged
merged 5 commits into from
Jun 14, 2024
Merged

Support for VA to PA changes #10062

merged 5 commits into from
Jun 14, 2024

Conversation

iziemba
Copy link
Contributor

@iziemba iziemba commented Jun 4, 2024

The following series of commits updates documentation and defines new fields to enable use-cases which may result in VA to PA changes with memory being registered.

@iziemba iziemba requested review from shefty and j-xiong June 4, 2024 16:20
@iziemba
Copy link
Contributor Author

iziemba commented Jun 4, 2024

This PR is not ready to land. At this point, I am looking for feedback on the proposed changes.

include/rdma/fi_domain.h Outdated Show resolved Hide resolved
man/fi_mr.3.md Outdated Show resolved Hide resolved
man/fi_mr.3.md Outdated Show resolved Hide resolved
man/fi_mr.3.md Outdated Show resolved Hide resolved
man/fi_mr.3.md Outdated Show resolved Hide resolved
man/fi_mr.3.md Outdated Show resolved Hide resolved
man/fi_mr.3.md Outdated Show resolved Hide resolved
man/fi_mr.3.md Show resolved Hide resolved
man/fi_mr.3.md Outdated Show resolved Hide resolved
man/fi_mr.3.md Outdated Show resolved Hide resolved
man/fi_mr.3.md Outdated Show resolved Hide resolved
man/fi_mr.3.md Outdated Show resolved Hide resolved
man/fi_mr.3.md Outdated Show resolved Hide resolved
Clarify the behavior of FI_MR_HMEM being unset.

Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
@iziemba
Copy link
Contributor Author

iziemba commented Jun 4, 2024

Ready for round 2.

Copy link
Member

@shefty shefty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

man/fi_mr.3.md Show resolved Hide resolved
prov/cxi/src/cxip_mr.c Outdated Show resolved Hide resolved
Previously, the collective and local data transfer operations did not
allow for apps to pass in a desc unless explicit memory registration is
required. This change allows apps to always pass in a desc.

Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
@iziemba
Copy link
Contributor Author

iziemba commented Jun 6, 2024

Ready for round 3.

man/fi_mr.3.md Outdated Show resolved Hide resolved
Copy link
Contributor

@swelch swelch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good solution for ODP.

Copy link
Member

@shefty shefty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! changes look good to me

@iziemba
Copy link
Contributor Author

iziemba commented Jun 10, 2024

I'll have all comments address mid this week.

@iziemba
Copy link
Contributor Author

iziemba commented Jun 10, 2024

One question I have is do we need to address whether a provider can support FI_MR_ALLOCATED = 0 by HMEM iface types?

@shefty
Copy link
Member

shefty commented Jun 10, 2024

We could restrict FI_MR_ALLOCATED to memory allocated using system calls only. FI_MR_HMEM overrides FI_MR_LOCAL for HMEM. It could also override the FI_MR_ALLOCATED bit for HMEM.

@iziemba
Copy link
Contributor Author

iziemba commented Jun 10, 2024

So FI_MR_HMEM = 1 and FI_MR_ALLOCATED = 0 results in VA <-> PA migration for FI_HMEM_SYSTEM only? The issue with this is that it now forces other FI_MR_HMEM behavior like explicit memory registration for local others, right?

Thinking out loud... CUDA, ROCR, and ZE device drivers all support a invalidation mechanism. This opens the door for implementing ODP for even device memory. We could have a bitmask to denote which HMEM ifaces support VA <-> PA changes with FI_MR_ALLOCATED = 0?

@shefty
Copy link
Member

shefty commented Jun 10, 2024

In general, to keep things reasonable for a user, if a provider can't support all features for all types of memory that an app can use, my vote is for the provider to fallback and disable a feature entirely. The permutations simply become untenable.

I don't want apps to deal with: "I can support X with system memory. And that GPU is okay. But that GPU isn't; it's too old. And that NIC memory, I can do, but only if you touch the memory using system calls. And NVMe is okay, but only if it's off the PCI bus, not attached to CXL..."

@iziemba
Copy link
Contributor Author

iziemba commented Jun 10, 2024

I agree the permutations become untenable.

How about updating the FI_MR_ALLOCATED = 0 with something like the following:

If FI_MR_ALLOCATED = 0 is and FI_HMEM is supported, the ability for the VA to PA mapping to change extends to to HMEM interfaces as well. If a provider cannot support VA to PA changing for a given HMEM iface, the provider should support a reasonable fallback or the operation should fail.

@shefty
Copy link
Member

shefty commented Jun 10, 2024

I'm good with that update.

FI_MR_ALLOCATED set documentation is updated to state that the VA to PA
mapping must not change while memory is still registered.

FI_MR_ALLOCATED unset documentation is added stating that the VA to PA
mapping may change. This behavior can be used, for example, to support
system memory to device memory page migration.

Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Support fi_mr_refresh without FI_MR_ALLOCATED set.

Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Page size allows applications to optionally notify the provider the page
size to be used for an MR allocation. Typically, providers can select
the optimal page size. In cases where a VA range has zero pages backing
it, the provider may not know the optimal page size during registration.
Rather than always use a less efficient page size, allow apps to specify
the page size to be used. If page size is zero, provider will select the
page size.

If non-zero, page size must be page size support by OS. If a specific
page size is specified for a memory region during creation, all pages
later associated with the region must be of the given size. Attaching a
memory page of a different size to a region may result in failed
transfers to or from the region.

Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
@iziemba
Copy link
Contributor Author

iziemba commented Jun 12, 2024

Comments addressed and the following added:

If FI_MR_ALLOCATED = 0 is and FI_HMEM is supported, the ability for the VA to PA mapping to change extends to to HMEM interfaces as well. If a provider cannot support VA to PA changing for a given HMEM iface, the provider should support a reasonable fallback or the operation should fail.

@j-xiong j-xiong merged commit 03a2ea2 into ofiwg:main Jun 14, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants