UCT/GTEST: Fixed multiple tests for gdr_copy transport - v1.17.x #9853
What
This is a double commit of #9840
This is the fix for RM#3873368.
The issue is always reproducible on rock, when building and running UCX with the following modules:
With the `gdr_copy` MD configured, memory registration fails in several test suites:

The root cause is the same in every case: CUDA memory of arbitrary size is allocated and then registered with `uct_md_mem_reg` without any alignment. This does not work for the gdr_copy transport, which requires the registered memory to be aligned to GPU_PAGE_SIZE (64k in this case).

I fixed the memory registration in all of those places the same way it is done in the ucp_mm module: by aligning the range with the `ucs_align_ptr_range` API.
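To illustrate the alignment step, here is a minimal sketch of expanding an address range to GPU page boundaries before registration. The helper `align_ptr_range` is a hypothetical stand-in written for this example, not the actual `ucs_align_ptr_range` implementation, and the `GPU_PAGE_SIZE` value is assumed from the 64k figure mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64 KiB GPU page size, as stated in the description above. */
#define GPU_PAGE_SIZE ((size_t)65536)

/*
 * Hypothetical helper (illustration only): expand the range
 * [*address_p, *address_p + *length_p) outward so that both its start
 * and its end fall on a multiple of 'alignment' (a power of two).
 */
static void align_ptr_range(void **address_p, size_t *length_p,
                            size_t alignment)
{
    uintptr_t mask, start, end;

    /* The bit-mask arithmetic below requires a power-of-two alignment. */
    assert((alignment != 0) && ((alignment & (alignment - 1)) == 0));

    mask  = ~(uintptr_t)(alignment - 1);
    start = (uintptr_t)*address_p & mask;                 /* round down */
    end   = ((uintptr_t)*address_p + *length_p +
             (alignment - 1)) & mask;                     /* round up */

    *address_p = (void*)start;
    *length_p  = (size_t)(end - start);
}
```

In a test, the aligned address and length would be passed to the registration call instead of the original ones. The sketch does not check that the expanded range still lies inside the underlying allocation; that is assumed to hold for the caller.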