
Dreambooth inference result is not correct when training with xformer #1954

Closed
mingqizhang opened this issue Jan 9, 2023 · 16 comments

@mingqizhang

I trained Dreambooth with xformers, but the result is not correct (it is correct without xformers). The model is initialized from runwayml/stable-diffusion-v1-5, but the inference result is the same as the base runwayml/stable-diffusion-v1-5. Maybe there is an error when saving the model with xformers? I also added pipe.enable_xformers_memory_efficient_attention() at inference, but it did not help.
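
For context, a minimal sketch of the inference setup described above (the output path here is a placeholder for the Dreambooth output directory):

import torch
from diffusers import StableDiffusionPipeline

# Load the fine-tuned Dreambooth output (placeholder path).
pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/dreambooth-output", torch_dtype=torch.float16
).to("cuda")

# Enable xformers attention at inference as well, as mentioned above.
pipe.enable_xformers_memory_efficient_attention()

image = pipe("a photo of sks dog", num_inference_steps=25).images[0]
image.save("sks_dog.png")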

@patrickvonplaten

Hey @mingqizhang,

It's quite difficult to act on this issue, as there is no code to reproduce what is incorrect when using xformers. Could you add a reproducible code snippet or create a Google Colab?

cc @patil-suraj as well

@patil-suraj

Same comment as Patrick's: it would be nice if you could post something so we could reproduce.

@TsykunovDmitriy

TsykunovDmitriy commented Jan 16, 2023

I have the same problem. I printed out the gradients, and they are indeed NaN when --enable_xformers_memory_efficient_attention is set (a minimal version of this check is sketched below). I think this is related to issue facebookresearch/xformers#631.

My env:
GPU: 3090
Pytorch: 1.13.1+cu117
xformers: 0.0.16rc425 (from pypi)
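
For reference, a minimal sketch of the gradient check mentioned above (names like unet are illustrative of the training script's variables):

import torch

def check_grads(model: torch.nn.Module) -> None:
    # After loss.backward(), flag any parameter whose gradient is NaN/Inf.
    for name, param in model.named_parameters():
        if param.grad is not None and not torch.isfinite(param.grad).all():
            print(f"non-finite gradient in {name}")

# Usage inside the training loop:
#   loss.backward()
#   check_grads(unet)  # unet is the model being fine-tuned
#   optimizer.step()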

@patil-suraj

From the linked issue, it seems like they are using bfloat16 for training; I'm not sure bfloat16 works well with Stable Diffusion in PyTorch. I'm using xformers==0.0.16rc396 and it's working well.
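
(Given how much the behavior depends on the exact build, a quick way to confirm which xformers version is actually active in an environment:)

python -c "import xformers; print(xformers.__version__)"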

@arpowers

@patrickvonplaten @patil-suraj this problem is occurring when using the latest xformers installed via the new method:
pip install --pre -U xformers

Aside from that, the issue will also occur when running the vanilla Dreambooth example from diffusers.

@patrickvonplaten

I think we would need a fully reproducible code snippet here. Also, as @patil-suraj said, bfloat16 is not the standard precision for Dreambooth.
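
As an aside, a quick check for whether a given GPU supports bfloat16 at all (a minimal sketch using the standard torch API):

import torch

# Ampere cards (e.g., 3090, A10) support bf16; many older GPUs do not.
print(torch.cuda.get_device_name(0))
print(torch.cuda.is_bf16_supported())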

@dai-ichiro

I have the same problem.

My env:
Ubuntu 20.04 on WSL2 (Windows 11)
RTX 3080
Python 3.8.10
Pytorch: 1.13.1+cu116

pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
pip install git+https://github.com/huggingface/diffusers.git
pip install git+https://github.com/huggingface/transformers.git
pip install accelerate==0.15.0 scipy==1.10.0 datasets==2.8.0 ftfy==6.1.1 tensorboard==2.11.2
pip install xformers==0.0.16rc425
pip install triton==2.0.0.dev20221202
accelerate config
------------------------------------------------------------------------------------------------------------------------
In which compute environment are you running?
This machine
------------------------------------------------------------------------------------------------------------------------
Which type of machine are you using?
No distributed training
Do you want to run your training on CPU only (even if a GPU is available)? [yes/NO]:NO
Do you wish to optimize your script with torch dynamo?[yes/NO]:NO
Do you want to use DeepSpeed? [yes/NO]: NO
What GPU(s) (by id) should be used for training on this machine as a comma-separated list? [all]:all
------------------------------------------------------------------------------------------------------------------------
Do you wish to use FP16 or BF16 (mixed precision)?
fp16

train

export MODEL_NAME="stable-diffusion-v1-4"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="dreambooth_dog"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=100 \
  --enable_xformers_memory_efficient_attention

inference

from diffusers import StableDiffusionPipeline
import torch

seed = 10000
prompt = "a photo of sks dog"

# original
model_id = "stable-diffusion-v1-4"
pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16).to("cuda")

generator = torch.Generator(device="cuda").manual_seed(seed)
image = pipe(
    prompt=prompt,
    num_inference_steps=25,
    generator=generator,
    num_images_per_prompt=1).images[0]
image.save(f"{model_id}.png")

# finetune
model_id = "dreambooth_dog"
pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16).to("cuda")

generator = torch.Generator(device="cuda").manual_seed(seed)
image = pipe(
    prompt=prompt,
    num_inference_steps=25,
    generator=generator,
    num_images_per_prompt=1).images[0]
image.save(f"{model_id}.png")

The two results are exactly the same.
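
A quick way to confirm that the fine-tuned checkpoint really is identical to the base model is to diff the UNet weights directly; a minimal sketch using the paths from the script above:

import torch
from diffusers import UNet2DConditionModel

# Load the UNet from the base model and from the Dreambooth output dir.
base = UNet2DConditionModel.from_pretrained("stable-diffusion-v1-4", subfolder="unet")
tuned = UNet2DConditionModel.from_pretrained("dreambooth_dog", subfolder="unet")

base_sd, tuned_sd = base.state_dict(), tuned.state_dict()
changed = [k for k in base_sd if not torch.equal(base_sd[k], tuned_sd[k])]

# If training updated anything at all, many tensors should differ.
print(f"{len(changed)} of {len(base_sd)} UNet tensors differ")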

@patrickvonplaten

cc @patil-suraj could you take a look here?

@patil-suraj

I couldn't reproduce the issue; it's working fine for me. I tried the same command as you, with the same dependency versions (accelerate, xformers, triton), torch==1.13.1, and diffusers main, and I got these results after running both the fine-tuned and the sd-1.4 model.

[image: sks-dog comparison grid]

The first row is sd-1.4 and the second is the fine-tuned model. The model has learned and is working well.

Also, in your example you are using max_train_steps=100, which I feel is too low. The issue could be due to too few steps or too small an LR.

@dai-ichiro

Thank you for your quick response.

I changed max_train_steps and the LR, but saw no change.
I think there is something wrong with my environment or settings.
The OS?
The CUDA driver?

I will try different settings.
Sorry for bothering you.

@patil-suraj

No worries at all. Maybe try it in Colab or set up a fresh env; that would help rule out an environment issue.

@mingqizhang

Maybe you can refer to #1829: SD 1.5 does not work well on 3090/A10, and some layers in the UNet cannot backpropagate with xformers.
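
A minimal standalone repro of that backward-pass problem (shapes are illustrative and roughly match an SD 1.x attention layer; this only checks whether memory_efficient_attention supports a backward pass on the given GPU and dtype):

import torch
import xformers.ops as xops

# Toy query/key/value tensors in fp16 on the GPU.
q = torch.randn(2, 4096, 40, device="cuda", dtype=torch.float16, requires_grad=True)
k = torch.randn(2, 4096, 40, device="cuda", dtype=torch.float16, requires_grad=True)
v = torch.randn(2, 4096, 40, device="cuda", dtype=torch.float16, requires_grad=True)

out = xops.memory_efficient_attention(q, k, v)
out.sum().backward()

# If the backward kernel is broken or unsupported on this GPU/version,
# the backward call raises or the gradients come out non-finite.
print(torch.isfinite(q.grad).all().item())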

@dai-ichiro

I installed xformers==0.0.17.dev441 via pip, and my problem was solved.

Thanks.

@rajbala

rajbala commented Feb 16, 2023

@patrickvonplaten @patil-suraj this problem is occurring when using the latest xformers installed via the new method: pip install --pre -U xformers

Aside from that, the issue will occur when running the vanilla Dreambooth example from diffusers.

Did you find a solution to the issue with even the vanilla Dreambooth example?

@SilentD1

SilentD1 commented Feb 22, 2023

And how do you change the xformers version for the stable-diffusion-webui specifically? When I install it using that command, it seems to install into Python, but the version on the stable-diffusion loading screen keeps saying 0.0.16rc425 is installed.
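
A hedged guess at the cause: the webui runs inside its own virtual environment, so a plain pip install can target the wrong interpreter. Installing with the webui's venv Python should work, e.g. (paths are illustrative):

# Linux, from the stable-diffusion-webui directory:
venv/bin/python -m pip install -U --pre xformers
# Windows:
venv\Scripts\python.exe -m pip install -U --pre xformers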

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
