Dockerfile/environment.yaml for CUDA Version 12.3? #239

Closed
kajc10 opened this issue Feb 12, 2024 · 5 comments



kajc10 commented Feb 12, 2024

I have CUDA version 12.3, so the provided PyTorch configuration does not work for me.
I tried adjusting the dependency versions but could not get a working setup. Could you help me out?

With the current environment.yaml, the run hangs at 'initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/1'.

EDIT: Solved by installing the latest torch, torchvision, and torchaudio together with pillow==8.4.0.
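Since the fix above came down to which builds of torch, torchvision, torchaudio, and pillow end up installed, it can help to print the installed versions when comparing setups. The following is a minimal sketch, not part of the project: it reads package metadata with the standard-library `importlib.metadata` and decodes the `+cuXYZ` local version tag carried by PyTorch's CUDA wheels (for example `1.8.1+cu111`, a build for CUDA 11.1). The package list and the helper names `cuda_tag` and `installed_versions` are illustrative assumptions.

```python
"""Print installed versions of the packages discussed in this thread.

Diagnostic sketch only: it reads installed package metadata and never
imports torch, so it also runs in a half-broken environment.
"""
import re
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names as published on PyPI (an assumption about how they
# were installed; lookups rely on importlib.metadata's name matching).
PACKAGES = ("torch", "torchvision", "torchaudio", "Pillow", "pytorch-lightning")


def cuda_tag(version: str) -> Optional[str]:
    """Decode a PyTorch '+cuXYZ' local version tag into 'X.Y'.

    For example '1.8.1+cu111' -> '11.1'. Returns None when there is no tag.
    """
    # All digits but the last are the major version, the last is the minor.
    match = re.search(r"\+cu(\d+)(\d)$", version)
    if match is None:
        return None
    major, minor = match.groups()
    return f"{major}.{minor}"


def installed_versions(names: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        if version is None:
            print(f"{name:<18} not installed")
            continue
        cuda = cuda_tag(version)
        note = f"  (built for CUDA {cuda})" if cuda else ""
        print(f"{name:<18} {version}{note}")
```

The tag records the CUDA version a wheel was built against, which is not necessarily the CUDA version reported for the driver (for example by `nvidia-smi`). `torch.version.cuda` exposes the same build information at runtime, but requires importing torch.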

kajc10 closed this as completed on Feb 13, 2024

froestiago commented Mar 15, 2024

Hi @kajc10,
Would you mind sharing your YAML or Dockerfile for this?
I need to train the model on a custom dataset, and I'm having a lot of trouble adjusting the dependency versions, especially for pytorch-lightning.
Thanks!


Han1018 commented Apr 19, 2024

Hi @kajc10, I'm facing the same issue. How did you solve it?


froestiago commented Apr 19, 2024

Hi @Han1018
I was able to make it work without a Dockerfile, using only conda environments:

1. Run conda env create -f environment.yaml; conda activate taming, then uninstall PyTorch (torch and torchvision).
2. Install the 1.8.1 + cu111 build: pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
3. Uninstall pillow and reinstall it with pip install pillow==9.5.0.
4. I don't remember very well, but you might run into an error about torch._six. If you do, replace from torch._six import string_classes with string_classes = str (reference).
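The edit in the last step can also be written as a guarded import, so the same file works whether or not the installed PyTorch still ships `torch._six`, which newer releases removed. This is a minimal sketch of that pattern, not a patch from the project; `is_string` is a hypothetical helper added only for illustration. One caveat: in older releases `string_classes` was the tuple `(str, bytes)`, so the `str` fallback no longer treats bytes as strings.

```python
# Guarded import for code that still does `from torch._six import string_classes`.
try:
    # Older PyTorch (e.g. 1.8.x): string_classes == (str, bytes)
    from torch._six import string_classes
except ImportError:
    # torch._six was removed in newer PyTorch, or torch is not installed:
    # fall back to the replacement suggested in this thread.
    string_classes = str


def is_string(value) -> bool:
    """Hypothetical helper: True if value counts as a string under string_classes."""
    return isinstance(value, string_classes)
```

Catching `ImportError` also covers `ModuleNotFoundError`, which is what a missing `torch._six` raises.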

Hope this helps 🤗


Han1018 commented Apr 19, 2024

Hi @froestiago, thank you so, so much! It works for me and saved me a lot of time!!


senp98 commented Apr 24, 2024


Following @froestiago's steps somehow resolved my problem with the hanging training process. Thank you!!


4 participants