EKS example does not work by default #409

Closed
jameslamb opened this issue Aug 7, 2024 · 0 comments · Fixed by #458
Labels
bug (Something isn't working), cloud/aws (Amazon Web Service cloud), platform/kubernetes (Runs on Kubernetes)

Comments

@jameslamb
Member

Description

I believe the walk-through at https://docs.rapids.ai/deployment/stable/cloud/aws/eks/ requires some modifications.

I'll add more details tomorrow, but in short: following that example without modification, I saw the nvidia-driver-daemonset pods from the gpu-operator Helm chart get stuck in ImagePullBackOff, with an error like this:

Failed to pull image "nvcr.io/nvidia/driver:550.90.07-amzn2": rc error: code = NotFound desc = failed to pull and unpack image "nvcr.io/nvidia/driver:550.90.07-amzn2": failed to resolve reference "nvcr.io/nvidia/driver:550.90.07-amzn2": nvcr.io/nvidia/driver:550.90.07-amzn2: not found
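
To make it easier to see which image was requested, here is a minimal Python sketch that splits the image reference from the error above into its parts. The helper name `parse_image_ref` is hypothetical, and reading the tag as `<driver version>-<OS suffix>` is an assumption based only on how this tag looks, not on confirmed gpu-operator behavior.

```python
import re

# First clause of the error message quoted in this issue.
ERROR = 'Failed to pull image "nvcr.io/nvidia/driver:550.90.07-amzn2"'


def parse_image_ref(message: str) -> dict:
    """Split the first quoted image reference into registry, repository, and tag parts.

    Hypothetical helper for illustration. It assumes the tag has the shape
    "<version>-<os suffix>", which is a guess from the tag in this report.
    """
    match = re.search(r'"([^"\s]+)"', message)
    if match is None:
        raise ValueError("no quoted image reference found")
    ref = match.group(1)
    # Split "name:tag" on the last colon, then the name on the first slash.
    name, _, tag = ref.rpartition(":")
    registry, _, repository = name.partition("/")
    # Assumed convention: everything before the first "-" is the driver version.
    version, _, os_suffix = tag.partition("-")
    return {
        "registry": registry,
        "repository": repository,
        "tag": tag,
        "version": version,
        "os_suffix": os_suffix,
    }


if __name__ == "__main__":
    for key, value in parse_image_ref(ERROR).items():
        print(f"{key}: {value}")
```

If that reading of the tag is right, the pull failed for a driver image built for the `amzn2` suffix at version `550.90.07`, which is the combination reported as not found on nvcr.io.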

Notes

(placeholder: will add more soon)

References

Some relevant references I consulted while debugging this

@jameslamb added the bug, cloud/aws, and platform/kubernetes labels on Aug 7, 2024