[BUG]: Can not train llama-7b-hf due to “Tokenizer class LLaMATokenizer does not exist or is not currently imported.” on 3090(24GB) #3372
Comments
Install transformers by building it from source.
I have installed it that way. I thought that was the problem at first, and I reinstalled it repeatedly, but the problem remains the same.
Found the cause: huggingface/transformers#22222. "Hi @candowu, thanks for raising this issue. This is arising because the tokenizer in the config on the hub points to LLaMATokenizer. However, the tokenizer class in the library is LlamaTokenizer. This is likely due to the configuration files being created before the final PR was merged in." Change LLaMATokenizer in tokenizer_config.json to the lowercase LlamaTokenizer and it works like a charm.
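The rename above can be applied with a small script instead of editing the file by hand. This is a minimal sketch, not part of the original report: the `fix_tokenizer_class` helper name and the config path are assumptions, so point it at the `tokenizer_config.json` inside your own model directory.

```python
import json
from pathlib import Path

def fix_tokenizer_class(config_path: Path) -> bool:
    """Rewrite tokenizer_class from the old "LLaMATokenizer" spelling to
    "LlamaTokenizer" in a tokenizer_config.json file.

    Returns True if the file was changed, False if no fix was needed."""
    config = json.loads(config_path.read_text())
    if config.get("tokenizer_class") != "LLaMATokenizer":
        return False
    config["tokenizer_class"] = "LlamaTokenizer"
    config_path.write_text(json.dumps(config, indent=2))
    return True

# Hypothetical usage; adjust the path to your local checkout:
# fix_tokenizer_class(Path("/hy-tmp/ai/colossal-ai-chat/models/llama-7b-hf/tokenizer_config.json"))
```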
Hello, did you train on a single GPU, and how much GPU memory does it have?
I had the same problem, and this fix solved it.
🐛 Describe the bug
Content of train_sft.sh:
```shell
torchrun --standalone --nproc_per_node=1 train_sft.py \
    --pretrain "/hy-tmp/ai/colossal-ai-chat/models/llama-7b-hf/" \
    --model 'llama' \
    --strategy colossalai_zero2 \
    --log_interval 10 \
    --save_path /hy-tmp/ai/colossal-ai-chat/train/models/coati-llama-7b-hf \
    --dataset /hy-tmp/ai/colossal-ai-chat/dataset/instinwild_cn.json \
    --batch_size 1 \
    --accimulation_steps 8 \
    --lr 2e-5 \
    --max_datasets_size 512 \
    --max_epochs 1
```
Environment
torch==1.12.1+cu113
torchvision==0.13.1+cu113
torchaudio==0.12.1
Python==3.8.16
OS==Ubuntu 20.04.4 LTS