
It's not an issue, it's to help people reproduce on a single 4090 GPU #70

@XuBao12

Description


I found the following configuration can successfully train OSEDiff on a single 4090 GPU. It uses about 23.67 GB of memory during training and takes roughly 100 hours for 100k iterations.

Hope it is helpful to those who don't have a V100 or A100.

  1. --train_batch_size needs to be set to 1.
     - (optional) --gradient_accumulation_steps=4 to match the original batch size of 4.
  2. --gradient_checkpointing should be used (important!!).
  3. --lora_rank can be set up to 32.
export NCCL_P2P_DISABLE="1"
export NCCL_IB_DISABLE="1"

CUDA_VISIBLE_DEVICES=0 accelerate launch train_osediff.py \
    --pretrained_model_name_or_path=preset/models/stable-diffusion-2-1-base \
    --ram_path=preset/models/ram_swin_large_14m.pth \
    --learning_rate=5e-5 \
    --train_batch_size=1 \
    --gradient_accumulation_steps=4 \
    --enable_xformers_memory_efficient_attention --checkpointing_steps 500 \
    --mixed_precision='fp16' \
    --report_to "tensorboard" \
    --seed 123 \
    --output_dir=exps/osediff \
    --dataset_txt_paths_list "path/to/LSDIR" \
    --dataset_prob_paths_list 1 \
    --neg_prompt="painting, oil painting, illustration, drawing, art, sketch, cartoon, CG Style, 3D render, unreal engine, blurring, dirty, messy, worst quality, low quality, frames, watermark, signature, jpeg artifacts, deformed, lowres, over-smooth" \
    --cfg_vsd=7.5 \
    --lora_rank=4 \
    --lambda_lpips=2 \
    --lambda_l2=1 \
    --lambda_vsd=1 \
    --lambda_vsd_lora=1 \
    --deg_file_path="params_realesrgan.yml" \
    --tracker_project_name "train_osediff" \
    --gradient_checkpointing
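
For readers who want to see roughly what the memory-related flags do, below is a minimal sketch of how they typically map onto diffusers/accelerate calls. This is not taken from train_osediff.py; the checkpoint path and the UNet subfolder layout are assumptions based on the command above.

# A minimal sketch (not the actual train_osediff.py code) of what the
# memory-related flags typically correspond to in a diffusers/accelerate script.
import torch
from accelerate import Accelerator
from diffusers import UNet2DConditionModel

# --mixed_precision='fp16' and --gradient_accumulation_steps=4:
# fp16 roughly halves activation memory; accumulating 4 micro-batches of size 1
# keeps the effective batch size at the original 4.
accelerator = Accelerator(mixed_precision="fp16", gradient_accumulation_steps=4)

# Assumption: the SD 2.1-base checkout referenced by --pretrained_model_name_or_path.
unet = UNet2DConditionModel.from_pretrained(
    "preset/models/stable-diffusion-2-1-base", subfolder="unet"
)

# --gradient_checkpointing: recompute activations during the backward pass
# instead of storing them (slower, but the main reason this fits in 24 GB).
unet.enable_gradient_checkpointing()

# --enable_xformers_memory_efficient_attention (requires the xformers package).
unet.enable_xformers_memory_efficient_attention()

# After a few training steps, the peak allocation can be compared with the ~23.67 GB figure:
print(f"peak memory: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")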
