I found a configuration that can successfully train OSEDiff on a single 4090 GPU; it uses 23.67GB of VRAM during training and takes about 100h for 100k iterations.
Hope it is helpful to those who don't have a V100 or A100.
- `--train_batch_size` needs to be set to 1.
- (optional) `--gradient_accumulation_steps=4` to match the original batch size, which is 4.
- `--gradient_checkpointing` should be used (important!!).
- `--lora_rank` could be set up to 32.
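As a quick sanity check on the first two bullets: with gradient accumulation, the effective batch size per optimizer step is `train_batch_size * gradient_accumulation_steps`. A minimal sketch of that arithmetic (variable names are just for illustration):

```python
# Effective batch size with gradient accumulation: gradients from several
# micro-batches are summed before each optimizer step.
train_batch_size = 1             # per-GPU micro-batch size (from the flags below)
gradient_accumulation_steps = 4  # micro-batches accumulated per optimizer step

effective_batch_size = train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 4, matching the original training batch size
```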
```shell
export NCCL_P2P_DISABLE="1"
export NCCL_IB_DISABLE="1"
CUDA_VISIBLE_DEVICES=0 accelerate launch train_osediff.py \
    --pretrained_model_name_or_path=preset/models/stable-diffusion-2-1-base \
    --ram_path=preset/models/ram_swin_large_14m.pth \
    --learning_rate=5e-5 \
    --train_batch_size=1 \
    --gradient_accumulation_steps=4 \
    --enable_xformers_memory_efficient_attention --checkpointing_steps 500 \
    --mixed_precision='fp16' \
    --report_to "tensorboard" \
    --seed 123 \
    --output_dir=exps/osediff \
    --dataset_txt_paths_list "path/to/LSDIR" \
    --dataset_prob_paths_list 1 \
    --neg_prompt="painting, oil painting, illustration, drawing, art, sketch, cartoon, CG Style, 3D render, unreal engine, blurring, dirty, messy, worst quality, low quality, frames, watermark, signature, jpeg artifacts, deformed, lowres, over-smooth" \
    --cfg_vsd=7.5 \
    --lora_rank=4 \
    --lambda_lpips=2 \
    --lambda_l2=1 \
    --lambda_vsd=1 \
    --lambda_vsd_lora=1 \
    --deg_file_path="params_realesrgan.yml" \
    --tracker_project_name "train_osediff" \
    --gradient_checkpointing
```
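For a rough sense of throughput, the quoted numbers (100h for 100k iterations) work out to about 3.6 seconds per optimizer step; a back-of-the-envelope check:

```python
# Back-of-the-envelope throughput from the numbers quoted above
# (100 hours of wall time for 100k training iterations on one 4090).
iterations = 100_000
total_hours = 100

seconds_per_iteration = total_hours * 3600 / iterations
print(seconds_per_iteration)  # ~3.6 seconds per iteration
```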