UCT/ROCM: bring ROCm fixes over to v1.13 branch #8235

Merged 3 commits on May 18, 2022

edgargabriel
Contributor

What

This PR brings three fixes from the master branch over to the v1.13 branch.
The three PRs are:

UCT/ROCM: increase max. number of hsa agents
(cherry picked from commit 7baac93)

UCT/ROCM: fix memory type detection
(cherry picked from commit 4a876b5)

UCT/ROCM/IPC: use remote_agent if available
(cherry picked from commit 79ea940)

Why ?

All three commits are required for correct execution of ROCm/UCX jobs.

How ?

UCT/ROCM: increase max. number of hsa agents
Also includes a minor reorganization of the uct_rocm_base_agents structure
to reduce the number of cache misses.

(cherry picked from commit 7baac93)
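
To illustrate why the maximum number of agents matters, here is a minimal sketch that assumes the agents are kept in a fixed-capacity table. Everything in it is hypothetical: `SKETCH_MAX_AGENTS`, its value, `sketch_agent_t`, and the field order are stand-ins and do not reproduce the actual layout of `uct_rocm_base_agents`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity; the real limit used by UCX may differ. */
#define SKETCH_MAX_AGENTS 64

/* Hypothetical stand-in for an HSA agent handle. */
typedef struct {
    uint64_t handle;
} sketch_agent_t;

/* Hypothetical agent table: the small counter is placed before the array,
 * so it sits next to the first entries in memory. */
typedef struct {
    int            num_agents;
    sketch_agent_t agents[SKETCH_MAX_AGENTS];
} sketch_agent_table_t;

/* Append an agent. Returns 0 on success, -1 if the table is already full. */
static int sketch_agent_table_add(sketch_agent_table_t *table,
                                  sketch_agent_t agent)
{
    if (table->num_agents >= SKETCH_MAX_AGENTS) {
        return -1;
    }

    table->agents[table->num_agents++] = agent;
    return 0;
}
```

With a table like this, any agent discovered after the capacity is reached is rejected, which is why a limit that is too small can cause problems on systems with many devices.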
UCT/ROCM: fix memory type detection
Fix the approach used to identify the ROCm memory type. ROCm memory is
currently reported as type HSA_EXT_POINTER_TYPE_HSA with a GPU as the owner agent.

(cherry picked from commit 4a876b5)
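
The detection rule described above can be sketched as a predicate over two pieces of pointer information. This is only an illustration under stated assumptions: the enums and struct below are hypothetical stand-ins (`SKETCH_PTR_HSA` plays the role of `HSA_EXT_POINTER_TYPE_HSA`), and in real code the values would come from the HSA runtime rather than being filled in by hand.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the pointer types reported by the runtime. */
typedef enum {
    SKETCH_PTR_UNKNOWN,
    SKETCH_PTR_HSA,     /* plays the role of HSA_EXT_POINTER_TYPE_HSA */
    SKETCH_PTR_LOCKED
} sketch_ptr_type_t;

/* Hypothetical stand-ins for the device type of the owner agent. */
typedef enum {
    SKETCH_DEV_CPU,
    SKETCH_DEV_GPU
} sketch_dev_type_t;

/* Hypothetical summary of what a pointer query returned. */
typedef struct {
    sketch_ptr_type_t ptr_type;
    sketch_dev_type_t owner_dev_type;
} sketch_ptr_info_t;

/* A pointer counts as ROCm memory only if both conditions hold:
 * the pointer type is the HSA type and the owner agent is a GPU. */
static bool sketch_is_rocm_memory(const sketch_ptr_info_t *info)
{
    return (info->ptr_type == SKETCH_PTR_HSA) &&
           (info->owner_dev_type == SKETCH_DEV_GPU);
}
```

Checking the owner agent in addition to the pointer type means that an HSA-type allocation owned by a CPU agent is not classified as ROCm memory in this sketch.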
UCT/ROCM/IPC: use remote_agent if available
Use the correct remote_agent handle if that information is available. Using the local_agent for both source and destination is only a fallback for when the remote_agent information is not available, e.g. when the user has limited the visibility of devices for some or all processes.

(cherry picked from commit 79ea940)
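
The fallback described above can be sketched as a small selection helper. The types and the function name are hypothetical and only illustrate the rule; a `NULL` pointer is assumed here to represent the case in which the remote_agent information is not available.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an HSA agent handle. */
typedef struct {
    uint64_t handle;
} sketch_agent_t;

/* Choose the agent to use for the remote side of an IPC copy.
 * remote_agent may be NULL when the remote device is not visible to
 * this process; in that case fall back to the local agent. */
static sketch_agent_t sketch_pick_remote_agent(sketch_agent_t local_agent,
                                               const sketch_agent_t *remote_agent)
{
    if (remote_agent != NULL) {
        return *remote_agent;
    }

    return local_agent;
}
```

When the helper falls back, the local agent ends up being used for both source and destination, which matches the backup behavior described in the commit message.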
@yosefe merged commit 5879c44 into openucx:v1.13.x on May 18, 2022