UCT/CUDA: Return unreachable from rkey unpack in case of error - v1.16.x #9717
Do we need the fix in master branch?
Yes, let's wait for @yosefe to approve.
src/uct/cuda/cuda_ipc/cuda_ipc_md.c
UCT_CUDA_IPC_GET_DEVICE(this_device);
UCT_CUDA_IPC_DEVICE_GET_COUNT(num_devices);
if ((CUDA_SUCCESS != cuCtxGetDevice(&this_device)) ||
    (CUDA_SUCCESS != cuDeviceGetCount(&num_devices))) {
I think we can keep the error message if num_devices query fails, since it should not happen at this point
fixed in place
03e8955 to 11511cb
11511cb to 9cdac44
What
Always return unreachable from uct_cuda_ipc_rkey_unpack to ignore the CUDA IPC key when it can't be used (e.g. the CUDA device is not set for this process).
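To illustrate the idea, here is a minimal sketch of the control flow, not the actual patch. Everything prefixed with `sketch_` is hypothetical: the status enum stands in for the UCX status codes, and the two stub functions stand in for `cuCtxGetDevice` and `cuDeviceGetCount` so the example builds without CUDA. The log line on a failed device-count query follows the review suggestion above; whether the merged code keeps it in this form is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for UCX status codes; not the real ucs_status_t. */
typedef enum {
    SKETCH_OK              = 0,
    SKETCH_ERR_UNREACHABLE = -1
} sketch_status_t;

/* Hypothetical stand-in for the CUDA driver result type. */
typedef enum {
    SKETCH_CUDA_SUCCESS = 0,
    SKETCH_CUDA_ERROR   = 1
} sketch_cuda_result_t;

/* Test knob: nonzero when the calling process has a CUDA device set. */
static int sketch_device_is_set = 0;

/* Stub playing the role of cuCtxGetDevice: fails when no device is set. */
static sketch_cuda_result_t sketch_ctx_get_device(int *device)
{
    if (!sketch_device_is_set) {
        return SKETCH_CUDA_ERROR;
    }
    *device = 0;
    return SKETCH_CUDA_SUCCESS;
}

/* Stub playing the role of cuDeviceGetCount: always reports one device. */
static sketch_cuda_result_t sketch_device_get_count(int *count)
{
    *count = 1;
    return SKETCH_CUDA_SUCCESS;
}

/* Sketch of the unpack path: any failure to query the local CUDA state
 * makes the remote key unusable, so report "unreachable" instead of a
 * hard error and let the caller ignore this key. */
sketch_status_t sketch_rkey_unpack(int *this_device, int *num_devices)
{
    assert((this_device != NULL) && (num_devices != NULL));

    /* No device set for this process: expected case, stay silent. */
    if (sketch_ctx_get_device(this_device) != SKETCH_CUDA_SUCCESS) {
        return SKETCH_ERR_UNREACHABLE;
    }

    /* Not expected to fail at this point, so keep a diagnostic. */
    if (sketch_device_get_count(num_devices) != SKETCH_CUDA_SUCCESS) {
        fprintf(stderr, "failed to query device count\n");
        return SKETCH_ERR_UNREACHABLE;
    }

    return SKETCH_OK;
}
```

The point of the pattern is that the caller can treat an unreachable result as "skip this key" rather than as a failure of the whole unpack operation.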