Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enables MG python tests using a single-GPU LocalCUDACluster in CI #3596

Conversation

rlratzel
Copy link
Contributor

@rlratzel rlratzel commented May 22, 2023

closes #3413

This PR enables MG python tests using a single-GPU LocalCUDACluster in CI by setting the DASK_WORKER_DEVICES to 0. This does not affect SG tests, which continue to run in CI as they previously did.

PR #3540 added support for DASK_WORKER_DEVICES to the pytest fixture used in python MG tests, allowing CI scripts (and developers) to restrict workers to specific devices, which should now allow single-GPU CI runs to cover the MG/dask code paths in python.

Note: despite the changes here to now run python MG tests, it should be noted that this isn't actual multi-GPU test coverage in libcugraph since multiple GPUs are not communicating and the code to set up comms, distribute graph data, etc. is not being exercised using >1 GPU.

@rlratzel rlratzel added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels May 22, 2023
@rlratzel rlratzel requested a review from a team as a code owner May 22, 2023 05:05
@rlratzel rlratzel self-assigned this May 22, 2023
@rlratzel rlratzel requested a review from VibhuJawa May 22, 2023 05:05
@BradReesWork BradReesWork added this to the 23.06 milestone May 24, 2023
Copy link
Member

@VibhuJawa VibhuJawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving but we should wait before merging to see amount of time added , CC: @rlratzel

@VibhuJawa VibhuJawa added DO NOT MERGE Hold off on merging; see PR for details and removed DO NOT MERGE Hold off on merging; see PR for details labels May 24, 2023
… set up, adds temporary skips to MG tests that do not/cannot run in SG environments.
@rlratzel rlratzel requested a review from a team as a code owner May 24, 2023 19:45
@rlratzel
Copy link
Contributor Author

I ran the MG tests using DASK_WORKER_DEVICES=0 locally and observed they took ~20 minutes, meaning our CI runs will be at least 20 minutes longer now assuming the CI system is equal to or slower than my system.

This isn't ideal but it's a decent tradeoff for getting the additional coverage.

A few notes:

  • This PR does not remove any of the conditional skip markers that existed that check for >1 GPU, so there's still some missing coverage. But even with those in place, we're still adding significant coverage. Those will be removed in a future PR.
  • We still have a lot of testing tech debt to clean up, so making the behavior of using the mg marker be "uses >1 GPU" and sg be "uses only 1 GPU" without setting DASK_WORKER_DEVICES will be done as part of closing Python tests should have a shared set of fixtures #1347

@BradReesWork
Copy link
Member

/merge

@rapids-bot rapids-bot bot merged commit aa00704 into rapidsai:branch-23.06 May 27, 2023
@rlratzel rlratzel deleted the branch-23.06-sg_localcudacluster_ci_testing branch September 28, 2023 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Run MG tests in CI using a single-GPU LocalCUDACluster
4 participants