-
Notifications
You must be signed in to change notification settings - Fork 423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCT/CUDA: create cuda streams on first use #2308
Conversation
bureddy
commented
Feb 13, 2018
Fixes #2295 |
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a unit test which could expose this bug?
&iface->outstanding_h2d_cuda_event_q, comp); | ||
|
||
/* release stream object */ | ||
ucs_mpool_put(cuda_stream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we really release it immediately and not wait for a completion? so how it is different than previous situation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes yossie. Before, streams objects are created in iface_init(), so even through test (MTT) is pure CPU application, all 48 ranks sharing the GPU and creating the resources and which caused the GPU out of memory. This PR creates stream only on first use which never happen for these MTT runs.
In real GPU/MPI uses cases, usually each MPI ranks exclusively uses the GPU and offloads the kernel to it. GPU context sharing with multiple ranks is not usual case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so why need an mpool? seems only one is used. can keep it on iface, but just create it on demand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think creating/destroying each time is costly operation. otherwise, we need to add a if condition check first time and create it instead of mpoo_get().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, need to add if condition. but ucs_mpool_get() also has if condition to check if the pool is not empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
choose mpool also to play with more streams in future. But, I think for now will create on demand and will go to mpool in future when need to come. will update PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
@@ -20,8 +21,8 @@ ucs_status_t _status = UCS_OK; \ | |||
do { \ | |||
cudaError_t _result = (func); \ | |||
if (cudaSuccess != _result) { \ | |||
ucs_error("[%s:%d] cuda failed with %d \n", \ | |||
__FILE__, __LINE__,_result); \ | |||
ucs_error("%s failed with %d \n", \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to rename macro from CUDA_FUNC to UCT_CUDA_CALL, and use '_func' instead of 'func'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Test PASSed. |
9e78252
to
fa627a5
Compare
fa627a5
to
0aa7fd4
Compare
Test PASSed. |
Test PASSed. |
Test PASSed. |
Test PASSed. |
@bureddy pls port it to v1.3 |