
added conversion script and example #1

Open · wants to merge 2 commits into master
Conversation

@robertgshaw2-neuralmagic robertgshaw2-neuralmagic commented Jan 17, 2024

Added a simple example that loads a GPTQ model from the HF hub and converts it into Marlin format.
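For reference, the intended invocation (this mirrors the command run later in this thread; the model ID and save path are examples, and the behavior of `--do-generation` is assumed from its name):

```shell
# Convert a GPTQ checkpoint from the HF hub into Marlin format;
# --do-generation presumably runs a quick generation check afterwards.
python3 marlin/conversion/convert.py \
    --model-id "TheBloke/Llama-2-7B-Chat-GPTQ" \
    --save-path "./marlin-chat" \
    --do-generation
```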

@efrantar

@rosario-purple

@rib-2 Thanks for this! Unfortunately it doesn't work on my machine (8xA100), presumably because it's designed for only one GPU?

alyssavance@7e72bd4e-02:/scratch/brr$ python3 marlin/conversion/convert.py --model-id "TheBloke/Llama-2-7B-Chat-GPTQ" --save-path "./marlin-chat" --do-generation
Loading gptq model...
generation_config.json: 100%|█████████████████████████████████████████████████████| 137/137 [00:00<00:00, 987kB/s]
tokenizer_config.json: 100%|█████████████████████████████████████████████████████| 727/727 [00:00<00:00, 7.70MB/s]
tokenizer.model: 100%|█████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 41.1MB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████| 1.84M/1.84M [00:00<00:00, 64.4MB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████████████| 411/411 [00:00<00:00, 4.56MB/s]
Validating compatibility...
Converting model...
--- Converting Module: model.layers.0.self_attn.k_proj
Traceback (most recent call last):
  File "/scratch/brr/marlin/conversion/convert.py", line 143, in <module>
    model = convert_model(model).to("cpu")
  File "/scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/scratch/brr/marlin/conversion/convert.py", line 80, in convert_model
    new_module.pack(linear_module, scales=copy.deepcopy(module.scales.data.t()))
  File "/scratch/miniconda3/envs/brr/lib/python3.10/site-packages/marlin/__init__.py", line 117, in pack
    w = torch.round(w / s).int()
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:1!
/scratch/miniconda3/envs/brr/lib/python3.10/tempfile.py:860: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpxyeacbfe'>
  _warnings.warn(warn_message, ResourceWarning)

@robertgshaw2-neuralmagic (Author)

@rosario-purple Just set CUDA_VISIBLE_DEVICES=0; you don't need multiple GPUs for this.
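The same restriction can be applied from inside Python, as long as the variable is set before torch (or any CUDA library) is imported, so that only one device is visible to the process — a minimal sketch; setting it in-process rather than on the command line is an assumption about the script's import order:

```python
import os

# Make only physical GPU 0 visible to this process. This must happen
# before the first CUDA context is created (i.e. before `import torch`),
# so every tensor the conversion script allocates lands on cuda:0 and
# no cross-device mismatch can occur.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

print(os.environ["CUDA_VISIBLE_DEVICES"])  # → 0
```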
