-
Notifications
You must be signed in to change notification settings - Fork 859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
smcuda: fixes when using enable-mca-dso #11443
Conversation
7b6c570
to
6ad92c2
Compare
rebasing on top of main to pull in pmix_fence usage patch |
@@ -79,6 +83,8 @@ int mca_btl_smcuda_accelerator_init(void) | |||
goto cleanup_and_error; | |||
} | |||
|
|||
opal_finalize_register_cleanup(mca_btl_smcuda_accelerator_fini); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you when you say this will fix the issue, but do you mind explaining why it needs to happen this way since I don't quite understand. The pml ob1 component does something similar and I'm wondering if it requires the same fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wckzhang does this comment block help answer your questions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think we need to do the same thing with ob1 then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't in theory this the case for all components that use the accelerator framework functionality?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wckzhang where would you want to make a change in the ob1 to use the opal cleanup callback mechanism?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@edgargabriel if you find your consumer of the accelerator framework is doing something in its component close method that involves calls into the accelerator framework then you are probably right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take it back about ob1 because it already does it in a fini function instead of component close
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@edgargabriel Yes, though the other components aren't doing any accelerator functionality during finalization. The reason ob1 and smcuda are susceptible to this is because they create event arrays for the purpose of their asynchronous progress engine which need to be cleaned up during finalization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about the rcache/grdma component? I think that one does something similar, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No as far as I can tell it doesn't access the accelerator framework in its closing sequence, just during its register path.
np i'll put some verbiage in around this call. |
related to open-mpi#11354 Signed-off-by: Howard Pritchard <howardp@lanl.gov>
6ad92c2
to
f7803dd
Compare
related to #11354