[BUG] Wrong cuda version is derived, causing "No module named 'pynvjitlink'" #6018

Open
maxiuw opened this issue Aug 8, 2024 · 5 comments
Labels
? - Needs Triage (Need team to review and classify), bug (Something isn't working)

@maxiuw

maxiuw commented Aug 8, 2024

Describe the bug

Even though I installed everything with the cu11 packages, importing cuml raises an error:

line 137, in _setup_numba
    from pynvjitlink.patch import patch_numba_linker
ModuleNotFoundError: No module named 'pynvjitlink'

The error comes from cudf/utils/_numba.py, line 137, where the driver/CUDA version is checked. I am not sure why, but the check reports version 12. The obvious fix would be to install pynvjitlink, but it has no cu11 build. As a workaround I commented out lines 136-139, but that is not a good solution for non-local deployment.
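To make the failure mode concrete, here is a minimal sketch of the kind of version gate described above. It is not the actual cuDF code: `required_linker_module`, `is_importable`, and `check_environment` are hypothetical names, and the rule "major version 12 or higher needs pynvjitlink" is an assumption based on this report. It only shows how a detected version of (12, 0) would lead to the missing import in an otherwise cu11 install.

```python
import importlib.util


def required_linker_module(driver_version):
    """Hypothetical gate: name the extra module a CUDA 12 driver would call for.

    driver_version is a (major, minor) tuple such as (12, 0).
    Returns None for CUDA 11 (assumption: the cu11 wheels need nothing extra).
    """
    major, _minor = driver_version
    return "pynvjitlink" if major >= 12 else None


def is_importable(module_name):
    """True if module_name can be located without actually importing it."""
    return importlib.util.find_spec(module_name) is not None


def check_environment(driver_version):
    """Return a short diagnosis string instead of raising ModuleNotFoundError."""
    needed = required_linker_module(driver_version)
    if needed is None:
        return "CUDA 11 path: no extra linker module required"
    if is_importable(needed):
        return f"CUDA 12 path: {needed} is available"
    return f"CUDA 12 path: {needed} is missing, import would fail"


print(check_environment((11, 8)))
print(check_environment((12, 0)))
```

Running this with the tuple returned by `safe_get_versions()` in the affected environment should show which branch a gate like this would take.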

Steps/Code to reproduce bug

import cuml

error in cudf/utils/_numba.py, line 137, in _setup_numba
    from pynvjitlink.patch import patch_numba_linker
ModuleNotFoundError: No module named 'pynvjitlink'

Expected behavior


>>> pip list | grep 'cu'   

cubinlinker-cu11          0.3.0.post2
cuda-python               11.8.3
cudf-cu11                 24.6.1
cuml-cu11                 24.6.1
cupy-cuda11x              13.2.0
dask-cuda                 24.6.0
dask-cudf-cu11            24.6.1
distributed-ucxx-cu11     0.38.0
executing                 2.0.1
libucx-cu11               1.15.0.post1
ptxcompiler-cu11          0.8.1.post1
pylibraft-cu11            24.6.0
raft-dask-cu11            24.6.0
rmm-cu11                  24.6.0
torch                     2.1.0+cu118
torchaudio                2.1.0+cu118
torchvision               0.16.0+cu118
ucx-py-cu11               0.38.0
ucxx-cu11                 0.38.0
>>> nvcc -V   
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
>>> from cudf.utils._ptxcompiler import NO_DRIVER, safe_get_versions
>>> safe_get_versions()
((12, 0), (12, 1))
@maxiuw added the "? - Needs Triage" and "bug" labels on Aug 8, 2024
@dantegd
Member

dantegd commented Aug 9, 2024

This error is coming from the cuDF side of things. Maybe @galipremsagar, @vyasr, or @divyegala can give some insight here.

@galipremsagar
Contributor

cc: @brandon-b-miller curious if you know why this might be happening.

@brandon-b-miller
Contributor

pynvjitlink is indeed a CUDA 12-specific requirement. However, cuDF shouldn't attempt to find it unless it detects that it is in a CUDA 12 environment. The question is why you are getting ((12, 0), (12, 1)) from safe_get_versions in what is apparently a CUDA 11 environment. Ultimately the information is obtained from cuDriverGetVersion, so it's finding CUDA 12 somewhere.
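For anyone who wants to see what cuDriverGetVersion reports independently of cuDF, a rough sketch using `ctypes` follows. It assumes a Linux system where the driver library is loadable as `libcuda.so.1`. The helper names `decode_cuda_version` and `query_driver_version` are made up for this example. CUDA encodes versions as 1000 * major + 10 * minor, and the driver call reports the newest CUDA version the installed driver supports, which can be higher than the toolkit version printed by nvcc.

```python
import ctypes


def decode_cuda_version(raw):
    """Decode CUDA's integer version encoding, e.g. 11080 -> (11, 8)."""
    return raw // 1000, (raw % 1000) // 10


def query_driver_version(lib_name="libcuda.so.1"):
    """Return (major, minor) from cuDriverGetVersion, or None if unavailable.

    lib_name is an assumption that holds on typical Linux installs.
    """
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError:
        return None  # no NVIDIA driver library could be loaded
    raw = ctypes.c_int(0)
    status = libcuda.cuDriverGetVersion(ctypes.byref(raw))
    if status != 0:  # 0 is CUDA_SUCCESS
        return None
    return decode_cuda_version(raw.value)


if __name__ == "__main__":
    print("driver reports:", query_driver_version())
```

If this prints a 12.x version on a machine with the CUDA 11.8 toolkit, that would at least account for the first tuple returned by `safe_get_versions()`.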

@maxiuw can you provide some details on how you constructed the environment and installed things?

@maxiuw
Author

maxiuw commented Aug 12, 2024

Sure, what information do you need? I use a conda env and install everything inside it. Do you need a list of packages?

@brandon-b-miller
Contributor

Thanks @maxiuw. Starting from the base environment (no conda env yet), can you provide the output of nvidia-smi? Then, would you be able to share the steps you used to create the environment that contains cuML? For instance, did you install using the command-line instructions from https://docs.rapids.ai/install, or are you creating your conda environment by some other means?
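As a complement to the details requested above, the following sketch groups installed Python packages by the CUDA suffix in their names, which can reveal a stray cu12 package inside a cu11 environment. It is only an illustration: the helper names are hypothetical, and the suffix patterns (`-cu11`, `-cu12`, `-cuda11x`) are assumptions based on the package names in the list posted earlier in this issue.

```python
import re
from importlib import metadata


def cuda_suffix(name):
    """Return the CUDA major version encoded in a package name, or None."""
    match = re.search(r"-cu(\d+)$", name) or re.search(r"-cuda(\d+)x$", name)
    return match.group(1) if match else None


def group_by_cuda_suffix(names):
    """Map each CUDA major version string to the package names carrying it."""
    groups = {}
    for name in names:
        suffix = cuda_suffix(name.lower())
        if suffix is not None:
            groups.setdefault(suffix, []).append(name)
    return groups


def installed_package_names():
    """List the names of all distributions visible to this interpreter."""
    names = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            names.append(name)
    return names


if __name__ == "__main__":
    for version, packages in sorted(group_by_cuda_suffix(installed_package_names()).items()):
        print(f"CUDA {version}: {', '.join(sorted(packages))}")
```

More than one CUDA version in the output would suggest that packages built for different CUDA majors were mixed in the same environment.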
