RuntimeError: CUDA error: invalid argument when using xformers #1946

Closed
vmajor opened this issue Jan 7, 2023 · 19 comments
Labels: bug (Something isn't working), stale (Issues that haven't received updates)

vmajor commented Jan 7, 2023

Describe the bug

When trying to run train_dreambooth.py with --enable_xformers_memory_efficient_attention the process exits with this error:

RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Steps:   0%|                                                                                                                          | 0/400 [00:07<?, ?it/s]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/*****/anaconda3/envs/sd-gpu/bin/accelerate:8 in <module>                                  │
│                                                                                                  │
│   5 from accelerate.commands.accelerate_cli import main                                          │
│   6 if __name__ == '__main__':                                                                   │
│   7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])                         │
│ ❱ 8 │   sys.exit(main())                                                                         │
│   9                                                                                              │
│                                                                                                  │
│ /home/*****/anaconda3/envs/sd-gpu/lib/python3.10/site-packages/accelerate/commands/accelerate_c │
│ li.py:45 in main                                                                                 │
│                                                                                                  │
│   42 │   │   exit(1)                                                                             │
│   43 │                                                                                           │
│   44 │   # Run                                                                                   │
│ ❱ 45 │   args.func(args)                                                                         │
│   46                                                                                             │
│   47                                                                                             │
│   48 if __name__ == "__main__":                                                                  │
│                                                                                                  │
│ /home/*****/anaconda3/envs/sd-gpu/lib/python3.10/site-packages/accelerate/commands/launch.py:11 │
│ 04 in launch_command                                                                             │
│                                                                                                  │
│   1101 │   elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA  │
│   1102 │   │   sagemaker_launcher(defaults, args)                                                │
│   1103 │   else:                                                                                 │
│ ❱ 1104 │   │   simple_launcher(args)                                                             │
│   1105                                                                                           │
│   1106                                                                                           │
│   1107 def main():                                                                               │
│                                                                                                  │
│ /home/*****/anaconda3/envs/sd-gpu/lib/python3.10/site-packages/accelerate/commands/launch.py:56 │
│ 7 in simple_launcher                                                                             │
│                                                                                                  │
│    564 │   process = subprocess.Popen(cmd, env=current_env)                                      │
│    565 │   process.wait()                                                                        │
│    566 │   if process.returncode != 0:                                                           │
│ ❱  567 │   │   raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)       │
│    568                                                                                           │
│    569                                                                                           │
│    570 def multi_gpu_launcher(args):                                                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

Reproduction

accelerate launch train_dreambooth.py --pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4 --instance_data_dir=./inputs --output_dir=./outputs --instance_prompt="a photo of sks dog" --resolution=512 --train_batch_size=1 --gradient_accumulation_steps=1 --learning_rate=5e-6 --lr_scheduler="constant" --lr_warmup_steps=0 --max_train_steps=400 --enable_xformers_memory_efficient_attention
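
The error message above suggests setting CUDA_LAUNCH_BLOCKING=1 for a more accurate stack trace, e.g. by prefixing the same command:

CUDA_LAUNCH_BLOCKING=1 accelerate launch train_dreambooth.py <same arguments as above>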

Logs

No response

System Info

  • diffusers version: 0.12.0.dev0
  • Platform: Linux-5.15.79.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
  • Python version: 3.10.8
  • PyTorch version (GPU?): 1.13.0 (True)
  • Huggingface_hub version: 0.11.1
  • Transformers version: 0.15.0
  • Accelerate version: not installed
  • xFormers version: 0.0.15.dev395+git.7e05e2c
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: single GPU
vmajor added the bug label Jan 7, 2023
@davidpfahler

This might be an upstream bug in xformers: facebookresearch/xformers#563

@hafriedlander

Related issue #1829

@hafriedlander

@davidpfahler in the meantime, using this helper to enable xformers instead of the built-in enable_xformers_memory_efficient_attention method should work:

https://github.com/cloneofsimo/lora/blob/master/lora_diffusion/xformers_utils.py#L42
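
For context, the idea behind that helper is to probe whether xformers' memory-efficient attention actually runs on the current GPU before switching it on. A minimal sketch of that check (the function name and tensor sizes are illustrative, not the exact code from the linked file):

import torch


def xformers_attention_works() -> bool:
    # Run a tiny memory_efficient_attention call; on unsupported arches this is
    # where the "CUDA error: invalid argument" tends to surface.
    try:
        import xformers.ops

        q = torch.randn(1, 2, 40, device="cuda", dtype=torch.float16)
        xformers.ops.memory_efficient_attention(q, q, q)
        return True
    except Exception:
        return False


# Only flip the built-in switch if the probe succeeds, e.g.:
# if xformers_attention_works():
#     unet.enable_xformers_memory_efficient_attention()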

@patrickvonplaten

cc @patil-suraj

@patil-suraj

Could be an issue with the xformers version. I have been using the xformers pre-release and it seems to be working without any issues: https://pypi.org/project/xformers/#history

@TsykunovDmitriy

Thanks for the tip. I had the same issue. I solved it by installing this xformers pre-release package as @patil-suraj said and updating pytorch version to 1.13.1+cu117.
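
For reference, one way to get those versions (a sketch; exact pins and index URLs may differ for your setup):

pip install --pre xformers
pip install torch==1.13.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117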

@hafriedlander

@patil-suraj this is arch-specific. What arch are you testing on? It's possible they've fixed it, but the bugs are still open:

facebookresearch/xformers#517
facebookresearch/xformers#628

(I'll check latest xformers in a bit, but I already have a fix for myself.)

@patil-suraj

So far, I've only tried it on A100 and T4.

@hafriedlander

Those are the two where it definitely works :). The arch I know has issues is SM8x except SM80 (so 30xx and 40xx cards, mostly).

(Although it looks like there's a bit more action in the xformers repo, so this might actually get fixed upstream at some point now.)
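
If you're not sure which SM your card is, PyTorch can tell you (a small sketch):

import torch

# Prints e.g. "NVIDIA GeForce RTX 3060 SM86" or "NVIDIA A100-SXM4-40GB SM80".
major, minor = torch.cuda.get_device_capability(0)
print(torch.cuda.get_device_name(0), f"SM{major}{minor}")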

USBhost commented Jan 18, 2023

> Thanks for the tip. I had the same issue. I solved it by installing this xformers pre-release package as @patil-suraj said and updating pytorch version to 1.13.1+cu117.

This worked on my A6000. PyTorch 1.13.1 is a must, as I installed xformers 436 manually for 1.12.1 and still got that error.

Edit: it may not error out anymore; it's just a silent failure now.

gleb-akhmerov commented Jan 19, 2023

> Thanks for the tip. I had the same issue. I solved it by installing this xformers pre-release package as @patil-suraj said and updating pytorch version to 1.13.1+cu117.

While I'm no longer getting an error, it looks like the model doesn't learn anymore. The images generated after training are the same as those generated before it.

However, I've found an older version of xformers which works just fine: facebookresearch/xformers@0bad001. This seems to be the last commit that works for me, as far as I can tell from a few tests using later commits.

Here's my environment and installation process.

GPU: 3060
CUDA version: 11.8
Python version: 3.10
OS: Arch Linux

Installation:

cd examples/dreambooth
pip install \
    -r requirements.txt \
    git+https://github.com/huggingface/diffusers.git@7c82a16fc14840429566aec40eb9e65aa57005fd \
    torch==1.13.1 \
    bitsandbytes==0.35.1 \
    triton==2.0.0.dev20221202 \
    scikit-learn \
    datasets \
    ninja
pip install git+https://github.com/facebookresearch/xformers.git@0bad001ddd56c080524d37c84ff58d9cd030ebfd

If nvcc is not on $PATH (like on Arch Linux), you can change the last line and specify the path to cuda like this:

PATH="$PATH:/opt/cuda/bin" pip install git+https://github.com/facebookresearch/xformers.git@0bad001ddd56c080524d37c84ff58d9cd030ebfd

Some details about versions:

  • ninja is installed to build xformers faster
  • bitsandbytes must be 0.35 because of this. Also, training with 0.35.4 makes the model generate blue noise for me, while 0.35.1 works fine.

Full package version list:
absl-py                  1.4.0
accelerate               0.15.0
aiohttp                  3.8.3
aiosignal                1.3.1
async-timeout            4.0.2
attrs                    22.2.0
bitsandbytes             0.35.1
cachetools               5.2.1
certifi                  2022.12.7
charset-normalizer       2.1.1
cmake                    3.25.0
datasets                 2.8.0
diffusers                0.12.0.dev0
dill                     0.3.6
exceptiongroup           1.1.0
filelock                 3.9.0
frozenlist               1.3.3
fsspec                   2022.11.0
ftfy                     6.1.1
google-auth              2.16.0
google-auth-oauthlib     0.4.6
grpcio                   1.51.1
huggingface-hub          0.11.1
idna                     3.4
importlib-metadata       6.0.0
iniconfig                2.0.0
Jinja2                   3.1.2
joblib                   1.2.0
Markdown                 3.4.1
MarkupSafe               2.1.2
modelcards               0.1.6
multidict                6.0.4
multiprocess             0.70.14
mypy-extensions          0.4.3
ninja                    1.11.1
numpy                    1.24.1
nvidia-cublas-cu11       11.10.3.66
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11        8.5.0.96
oauthlib                 3.2.2
packaging                23.0
pandas                   1.5.3
Pillow                   9.4.0
pip                      22.3.1
pluggy                   1.0.0
protobuf                 3.20.3
psutil                   5.9.4
pyarrow                  10.0.1
pyasn1                   0.4.8
pyasn1-modules           0.2.8
pyre-extensions          0.0.23
python-dateutil          2.8.2
pytz                     2022.7.1
PyYAML                   6.0
regex                    2022.10.31
requests                 2.28.2
requests-oauthlib        1.3.1
responses                0.18.0
rsa                      4.9
scikit-learn             1.2.0
scipy                    1.10.0
setuptools               65.5.0
six                      1.16.0
tensorboard              2.11.2
tensorboard-data-server  0.6.1
tensorboard-plugin-wit   1.8.1
threadpoolctl            3.1.0
tokenizers               0.13.2
tomli                    2.0.1
torch                    1.13.1
torchvision              0.14.1
tqdm                     4.64.1
transformers             4.25.1
triton                   2.0.0.dev20221202
typing_extensions        4.4.0
typing-inspect           0.8.0
urllib3                  1.26.14
wcwidth                  0.2.6
Werkzeug                 2.2.2
wheel                    0.38.4
xformers                 0.0.15.dev0+0bad001.d20230119
xxhash                   3.2.0
yarl                     1.8.2
zipp                     3.11.0

Edit: it seems to work with both torch 1.12.1 and 1.13.1; I've updated the version information above.

EandrewJones commented Feb 1, 2023

I too have been running into issues with xFormers on an A10G (AWS g5 instance) for training textual inversion, not DreamBooth (though the same issues would likely apply). The environment is containerized (only the essential lines are shown below):

FROM nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04

# CUDA xformers build args
ENV TORCH_CUDA_ARCH_LIST="8.0;8.6"

#
# Deep learning training and inference dependencies
#
RUN pip install -qq -U git+https://github.com/EandrewJones/diffusers
RUN pip install -q -U --pre triton
RUN pip install -q \
    ninja \
    torch==1.13.1 \
    torchvision==0.14.1 \
    accelerate==0.12.0 \
    # mlflow==2.1.1 \
    transformers \
    datasets \
    ftfy \
    pathlib
RUN pip install --upgrade \
    scipy
# RUN conda install -y xformers=0.0.16.dev430+git.bac8718 xformers/label/dev
RUN pip install -v -U git+https://github.com/facebookresearch/xformers.git@0bad001ddd56c080524d37c84ff58d9cd030ebfd
# RUN pip install -v xformers==0.0.17.dev435

Rest of file...

Somewhere between 100 and 300 steps into training, the loss goes to NaN. I know the issue is xFormers because training runs fine without it. No C++ errors, just a silent failure.

Installations I've tried (PyTorch 1.13.1 and CUDA 11.6/11.7 for all):

  • Every pip release > 0.0.13 (including the one @patil-suraj mentioned above)
  • Conda install from dev and main (using a different base image, pytorch/pytorch:1.13.1-cuda11.6-cudnn8-runtime)
  • Compiling from scratch on both base images (ran into the issues mentioned by the OP)

The weird thing is that python -m xformers.info always indicates success:

memory_efficient_attention.cutlassF:               available
memory_efficient_attention.cutlassB:               available
memory_efficient_attention.flshattF:               available
memory_efficient_attention.flshattB:               available
memory_efficient_attention.smallkF:                available
memory_efficient_attention.smallkB:                available
memory_efficient_attention.tritonflashattF:        available
memory_efficient_attention.tritonflashattB:        available
swiglu.fused.p.cpp:                                available
is_triton_available:                               True
is_functorch_available:                            False
pytorch.version:                                   1.13.1
pytorch.cuda:                                      available
gpu.compute_capability:                            8.6
gpu.name:                                          NVIDIA A10G
build.info:                                        available
build.cuda_version:                                1106
build.python_version:                              3.10.9
build.torch_version:                               1.13.1
build.env.TORCH_CUDA_ARCH_LIST:                    5.0+PTX 6.0 6.1 7.0 7.5 8.0 8.6
build.env.XFORMERS_BUILD_TYPE:                     None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS:        None
build.env.NVCC_FLAGS:                              None
build.env.XFORMERS_PACKAGE_FROM:                   None
source.privacy:                                    open source
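
Since the failure mode is silent (no exception, the loss just turns into NaN), a minimal guard inside the training loop can catch it early; this is only a sketch, and loss and step are whatever names your loop already uses:

import torch

# After computing the loss each step:
if not torch.isfinite(loss).all():
    raise RuntimeError(f"Loss became non-finite at step {step}: {loss.item()}")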

@patil-suraj

It would be nice to report this in the xformers issue tracker.

@patil-suraj

I don't know if it fixes it, but there was a new xformers release yesterday: https://github.com/facebookresearch/xformers/releases/tag/v0.0.16
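
If anyone wants to try it, upgrading should just be (pin shown for that release):

pip install -U xformers==0.0.16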

EandrewJones commented Feb 1, 2023 via email

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot added the stale label Feb 25, 2023

USBhost commented Feb 25, 2023

Before the bot makes this issue disappear: is it resolved? I'm still using an older version of xformers.

EandrewJones commented Feb 25, 2023 via email

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
