[BUG]: Can not train llama-7b-hf due to “Tokenizer class LLaMATokenizer does not exist or is not currently imported.” on 3090(24GB) #3372
Comments
Install transformers by building it from source.
I have installed it that way. I thought that was the problem at first, and I reinstalled it repeatedly, but the problem remains the same.
Found the cause: huggingface/transformers#22222. "Hi @candowu, thanks for raising this issue. This is arising because the tokenizer in the config on the hub points to LLaMATokenizer. However, the tokenizer class in the library is LlamaTokenizer. This is likely due to the configuration files being created before the final PR was merged in." Change LLaMATokenizer in tokenizer_config.json to the lowercase LlamaTokenizer and it works like a charm.
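The rename above can be applied with a small script instead of editing the file by hand. This is a minimal sketch, not part of the original report: the `fix_tokenizer_class` helper name and the config path are assumptions, so point it at the `tokenizer_config.json` inside your own model directory.

```python
import json
from pathlib import Path

def fix_tokenizer_class(config_path: Path) -> bool:
    """Rewrite tokenizer_class from the old "LLaMATokenizer" spelling to
    "LlamaTokenizer" in a tokenizer_config.json file.

    Returns True if the file was changed, False if no fix was needed."""
    config = json.loads(config_path.read_text())
    if config.get("tokenizer_class") != "LLaMATokenizer":
        return False
    config["tokenizer_class"] = "LlamaTokenizer"
    config_path.write_text(json.dumps(config, indent=2))
    return True

# Hypothetical usage; adjust the path to your local checkout:
# fix_tokenizer_class(Path("/hy-tmp/ai/colossal-ai-chat/models/llama-7b-hf/tokenizer_config.json"))
```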
Hello, did you train on a single GPU, and how much GPU memory does it have?
I had the same problem, and this fix solved it.
🐛 Describe the bug
Content of train_sft.sh:
```shell
torchrun --standalone --nproc_per_node=1 train_sft.py \
    --pretrain "/hy-tmp/ai/colossal-ai-chat/models/llama-7b-hf/" \
    --model 'llama' \
    --strategy colossalai_zero2 \
    --log_interval 10 \
    --save_path /hy-tmp/ai/colossal-ai-chat/train/models/coati-llama-7b-hf \
    --dataset /hy-tmp/ai/colossal-ai-chat/dataset/instinwild_cn.json \
    --batch_size 1 \
    --accimulation_steps 8 \
    --lr 2e-5 \
    --max_datasets_size 512 \
    --max_epochs 1
```
Environment
torch==1.12.1+cu113
torchvision==0.13.1+cu113
torchaudio==0.12.1
Python==3.8.16
OS==Ubuntu 20.04.4 LTS