Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce CI jobs. #236

Merged
merged 3 commits into from
Aug 23, 2024
Merged

Reduce CI jobs. #236

merged 3 commits into from
Aug 23, 2024

Conversation

bdice
Copy link
Contributor

@bdice bdice commented Aug 21, 2024

We want to reduce CI times, which means we need to reduce the job queue.

Currently we are limited by the number of available runners for linux-amd64-gpu-v100-latest during working/daytime hours (Pacific) and limited by linux-arm64-gpu-a100-latest during nightlies (which are daytime working hours for EU/APAC teams).

I also added DEPENDENCIES: ['oldest', 'latest'] to C++ test jobs. While we don't have a specific use case for that yet, it is good to keep our matrix defined the same way for all test jobs (conda C++ and conda Python and wheels).

Related to rapidsai/build-planning#95.

@bdice bdice requested a review from a team as a code owner August 21, 2024 22:57
@bdice bdice requested a review from raydouglass August 21, 2024 22:57
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes: Look at the conda-python-tests.yaml file for a cleaner diff. This one has the same matrix changes but also added DEPENDENCIES.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes:

  • I dropped the ARM nightly job for CUDA 11.4. This should be adequately covered by amd64 tests with CUDA 11.4.
  • Added DEPENDENCIES to match Python matrices
  • I dropped one amd64 nightly job for CUDA 11.4 since we had two. That felt unnecessary.
  • I added a test for ARM with the latest CUDA version. I think we need some test of CUDA-latest on both ARM and amd64. However this means there are no CUDA 12.2 jobs... would we rather change the CUDA 12.0 to 12.2?

Copy link
Member

@jakirkham jakirkham Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this all seems reasonable. Would expect ARM users to more often be on newer hardware and thus have newer dependencies (especially CTK). So this seems like the right direction to shift resources

Edit: Personally I would lean towards on ARM on CUDA 12.2 and x86_64 on CUDA 12.0. Some things (like libcufile) weren't available on ARM until CUDA 12.2 and would like for us to test those

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I would lean towards on ARM on CUDA 12.2 and x86_64 on CUDA 12.0. Some things (like libcufile) weren't available on ARM until CUDA 12.2 and would like for us to test those

Seconded

Copy link
Contributor Author

@bdice bdice Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this to use more CUDA 12.2 in ARM nightlies. For some PR tests I left ARM on CUDA 12.0 where we only have three PR jobs (two x86 and one ARM). I want to cover (11.8, 12.0, 12.5), which is our usual testing strategy of prioritizing CUDA (last major latest minor, current major earliest minor, current major latest minor).

Copy link
Contributor Author

@bdice bdice Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes:

  • Previously we had no tests of wheels on Rocky Linux 8. That is important to cover because our wheels are specifically built for glibc 2.28 as a lower bound, which corresponds to Rocky Linux 8.
  • I cut the nightly amd64 jobs quite a bit. It seems like 6 jobs is overkill.
  • I deleted a nightly ARM job for CUDA 11.8 because there were two.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding wheels that was likely an oversight when moving from CentOS 7 to Rocky Linux 8. Agree we should cover them on images where the GLIBC allows

Generally agree with the cuts. Though would shift things a bit amongst those jobs. Will comment below

.github/workflows/wheels-test.yaml Outdated Show resolved Hide resolved
.github/workflows/wheels-test.yaml Show resolved Hide resolved
.github/workflows/wheels-test.yaml Outdated Show resolved Hide resolved
.github/workflows/wheels-test.yaml Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding wheels that was likely an oversight when moving from CentOS 7 to Rocky Linux 8. Agree we should cover them on images where the GLIBC allows

Generally agree with the cuts. Though would shift things a bit amongst those jobs. Will comment below

Copy link
Member

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Bradley! 🙏

Would change some Python 3.10 jobs to 3.11. Soon we will be adding Python 3.12 and need to add those somewhere. So would rather have Python 3.11 in a few more spots to make it easy to swap some of those for Python 3.12

.github/workflows/conda-python-tests.yaml Outdated Show resolved Hide resolved
Co-authored-by: jakirkham <jakirkham@gmail.com>
Copy link
Contributor Author

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use more CUDA 12.2 jobs on ARM.

.github/workflows/conda-cpp-tests.yaml Outdated Show resolved Hide resolved
.github/workflows/conda-python-tests.yaml Outdated Show resolved Hide resolved
.github/workflows/wheels-test.yaml Outdated Show resolved Hide resolved
Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with the changes. Don't have super strong feelings about the exact variable combinations in each job, so if everyone else's thoughts have been addressed this looks good to me

@bdice
Copy link
Contributor Author

bdice commented Aug 23, 2024

I'm going to run a test PR on cudf with this branch, then merge if that looks good.

Test PR here: rapidsai/cudf#16646

@bdice
Copy link
Contributor Author

bdice commented Aug 23, 2024

Things look fine downstream! Merging.

@bdice bdice merged commit 157e982 into branch-24.10 Aug 23, 2024
@bdice bdice mentioned this pull request Aug 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants