
UCT/CUDA: create cuda streams on first use #2308

Merged
merged 1 commit into from
Feb 13, 2018

Conversation

bureddy
Contributor

@bureddy bureddy commented Feb 13, 2018

    - use mpool for cuda stream objects

@bureddy
Contributor Author

bureddy commented Feb 13, 2018

Fixes #2295

@swx-jenkins1

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/4002/ for details.

Contributor

@yosefe yosefe left a comment

Is there a unit test which could expose this bug?

&iface->outstanding_h2d_cuda_event_q, comp);

/* release stream object */
ucs_mpool_put(cuda_stream);
Contributor

Can we really release it immediately, without waiting for completion? If not, how is this different from the previous situation?

Contributor Author

Yes. Before, stream objects were created in iface_init(), so even though the test (MTT) is a pure CPU application, all 48 ranks were sharing the GPU and creating these resources, which caused the GPU to run out of memory. This PR creates the stream only on first use, which never happens for these MTT runs.

In real GPU/MPI use cases, each MPI rank usually uses the GPU exclusively and offloads kernels to it. Sharing a GPU context among multiple ranks is not the usual case.
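The lazy-creation idea above can be sketched as follows. This is a hypothetical illustration, not the actual UCX code: `fake_stream_create()` stands in for `cudaStreamCreate()` so the pattern can run without a GPU, and the type and struct names are invented for the example.

```c
#include <stddef.h>

/* Stand-in for cudaStream_t / cudaStreamCreate(); hypothetical,
 * just to illustrate the lazy-init pattern without a GPU. */
typedef int fake_stream_t;

static int g_create_calls = 0;   /* counts "GPU" allocations */

static int fake_stream_create(fake_stream_t *s)
{
    g_create_calls++;
    *s = 1;
    return 0;
}

typedef struct {
    fake_stream_t stream;   /* created on demand */
    int           created;  /* 0 until first use */
} iface_t;

/* Create the stream on first use; subsequent calls reuse it.  A
 * pure-CPU process that never calls this allocates no GPU resources,
 * which is the point of the change in this PR. */
static int iface_get_stream(iface_t *iface, fake_stream_t *out)
{
    if (!iface->created) {
        if (fake_stream_create(&iface->stream) != 0) {
            return -1;
        }
        iface->created = 1;
    }
    *out = iface->stream;
    return 0;
}
```

With this shape, initialization cost moves from iface_init() to the first data-path call, and processes that never touch the GPU pay nothing.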

Contributor

So why is an mpool needed? It seems only one stream is used. It could be kept on the iface and just created on demand.

Contributor Author

I think creating/destroying the stream each time is a costly operation. Otherwise, we need to add an if-condition to check for the first use and create it there, instead of calling ucs_mpool_get().

Contributor

Yes, we'd need to add an if-condition, but ucs_mpool_get() also has an if-condition to check whether the pool is not empty.

Contributor Author

I also chose an mpool to allow experimenting with more streams in the future. But for now I will create the stream on demand and move to an mpool when the need arises. Will update the PR.

Contributor Author

fixed.

@@ -20,8 +21,8 @@ ucs_status_t _status = UCS_OK; \
do { \
cudaError_t _result = (func); \
if (cudaSuccess != _result) { \
ucs_error("[%s:%d] cuda failed with %d \n", \
__FILE__, __LINE__,_result); \
ucs_error("%s failed with %d \n", \
Contributor

Need to rename the macro from CUDA_FUNC to UCT_CUDA_CALL, and use '_func' instead of 'func'.
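The requested macro might look roughly like the sketch below. This is a hypothetical reconstruction based on the diff above, not the actual UCX code: `cudaError_t`, `cudaSuccess`, and `ucs_error()` are stubbed here so the example compiles and runs without CUDA or the UCS library.

```c
#include <stdio.h>

/* Stubs standing in for the real CUDA and UCS definitions
 * (hypothetical, for illustration only). */
typedef int cudaError_t;
#define cudaSuccess 0

static int g_errors = 0;   /* counts logged errors */
#define ucs_error(_fmt, ...) \
    (g_errors++, fprintf(stderr, _fmt "\n", __VA_ARGS__))

/* CUDA-call wrapper with the requested name and '_func' parameter.
 * '#_func' stringifies the call expression so the log shows which
 * CUDA call failed, matching the "%s failed with %d" message in
 * the diff above. */
#define UCT_CUDA_CALL(_func)                                 \
    do {                                                     \
        cudaError_t _result = (_func);                       \
        if (cudaSuccess != _result) {                        \
            ucs_error("%s failed with %d", #_func, _result); \
        }                                                    \
    } while (0)

/* Example callees standing in for real CUDA API functions. */
static cudaError_t fake_ok(void)   { return cudaSuccess; }
static cudaError_t fake_fail(void) { return 7; }
```

Using the underscore-prefixed `_func`/`_result` names inside the macro avoids accidental capture of identifiers at the call site.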

Contributor Author

fixed

@mellanox-github
Contributor

Test PASSed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/6276/ for details (Mellanox internal link).

@swx-jenkins1

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/4007/ for details.

@swx-jenkins1

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/4008/ for details.

@mellanox-github
Contributor

Test PASSed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/6283/ for details (Mellanox internal link).

@mellanox-github
Contributor

Test PASSed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/6284/ for details (Mellanox internal link).

@yosefe
Contributor

yosefe commented Feb 13, 2018

@bureddy pls port it to v1.3

@yosefe yosefe merged commit 4efb7cd into openucx:master Feb 13, 2018
@bureddy bureddy deleted the delay-stream-create branch February 13, 2018 22:02