Update AWS EKS documentation to use an Ubuntu AMI #195

Open
JamesMaki opened this issue Apr 7, 2023 · 2 comments
Labels
bug Something isn't working cloud/aws Amazon Web Service cloud platform/kubernetes Runs on Kubernetes

Comments

@JamesMaki

JamesMaki commented Apr 7, 2023

Thank you for the phenomenal AWS documentation your team maintains. eksctl currently uses the "Amazon Linux 2 x86 Accelerated AMI" by default, which ships GPU driver version 470.161.03.

As of this week, release 23.3.0 of the NVIDIA GPU Operator officially supports EKS with Ubuntu AMIs.

Using the GPU Operator with the current default AMI causes two problems: the driver container is not deployed because drivers are already pre-installed on the AMI, and the device-plugin-validator fails, likely because of the old GPU drivers on the nodes.

I recommend we hold off on updating the documentation until this eksctl issue is resolved, so that we can give users a clean way to create a managed Ubuntu nodegroup: eksctl-io/eksctl#6499

Once that is implemented, the only change needed in the RAPIDS documentation is to add one flag to the existing eksctl create cluster command:

--node-ami-family Ubuntu2004 \

This should provide users with the latest recommended GPU drivers and resolve the device plugin validator pod issue.
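For reference, a minimal sketch of what the updated command might look like. Only `--node-ami-family Ubuntu2004` comes from this issue; the cluster name, region, instance type, and node count are hypothetical placeholders, not the values in the RAPIDS docs. The script only prints the command unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholders, not the values from the RAPIDS documentation.
CLUSTER_NAME="rapids"
REGION="us-east-1"
NODE_TYPE="p3.8xlarge"   # any GPU instance type
NODE_COUNT=2

# Arguments for `eksctl create cluster`; the last flag is the one proposed in this issue.
args=(
  create cluster
  --name "$CLUSTER_NAME"
  --region "$REGION"
  --node-type "$NODE_TYPE"
  --nodes "$NODE_COUNT"
  --node-ami-family Ubuntu2004
)

if [[ "${DRY_RUN:-1}" == "1" ]]; then
  # Dry run: print the command instead of creating a cluster.
  printf '%s\n' "eksctl ${args[*]}"
else
  eksctl "${args[@]}"
fi
```

Keeping the arguments in a bash array makes it easy to add or drop flags without breaking the line continuations of a long backslash-separated command.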

@JamesMaki
Author

@jacobtomlinson @beckernick Sorry for the delay in opening this; I wanted to wait until GPU Operator support for EKS was official.

@jacobtomlinson
Member

Thanks @JamesMaki, this sounds good to me. We will hold until eksctl-io/eksctl#6499 is resolved.

@jacobtomlinson jacobtomlinson added bug Something isn't working cloud/aws Amazon Web Service cloud platform/kubernetes Runs on Kubernetes labels Apr 11, 2023