I tried to reproduce this work, changing only the launch command (shown below), but my results are still noticeably blurrier (left) than inference with the official weights (right). I'm not sure whether the dataset is the cause: I trained on DF2K, while the paper used LSDIR.
# Two-GPU distributed execution
CUDA_VISIBLE_DEVICES="0,1" accelerate launch train_osediff.py \
--pretrained_model_name_or_path 'stable-diffusion-2-1-base' \
--ram_path 'preset/models/ram_swin_large_14m.pth' \
--learning_rate 2.5e-5 \
--train_batch_size 2 \
--resolution 512 \
--gradient_accumulation_steps 1 \
--checkpointing_steps 500 \
--mixed_precision 'fp16' \
--report_to "tensorboard" \
--seed 123 \
--output_dir 'experiments/experiments_scratch_512/osediff' \
--dataset_txt_paths_list 'datasets/DF2K/DF2K_HR_train_p512' \
--dataset_prob_paths_list 1 \
--neg_prompt "painting, oil painting, illustration, drawing, art, sketch, cartoon, CG Style, 3D render, unreal engine, blurring, dirty, messy, worst quality, low quality, frames, watermark, signature, jpeg artifacts, deformed, lowres, over-smooth" \
--cfg_vsd 7.5 \
--lora_rank 4 \
--lambda_lpips 2 \
--lambda_l2 1 \
--lambda_vsd 1 \
--lambda_vsd_lora 1 \
--deg_file_path "params_realesrgan.yml" \
--tracker_project_name "train_osediff_scratch_512" \
--gradient_checkpointing \
--enable_xformers_memory_efficient_attention \
--dataloader_num_workers 16
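For reference, here is a rough sketch of the effective batch size this command implies, in case it matters for the comparison. It assumes `accelerate launch` starts one process per visible GPU (that depends on the local accelerate config, which I have not shown), and the variable names are just illustrative:

```shell
# Values copied from the launch command above.
num_gpus=2                      # CUDA_VISIBLE_DEVICES="0,1" (assumes one process per GPU)
train_batch_size=2              # --train_batch_size
gradient_accumulation_steps=1   # --gradient_accumulation_steps

# Samples contributing to each optimizer step across all processes.
effective_batch_size=$((num_gpus * train_batch_size * gradient_accumulation_steps))
echo "effective batch size: ${effective_batch_size}"
```

If the official training used a larger effective batch size, that could be another difference besides the dataset, but I have not verified this.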