Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards
This work presents a novel RL-based framework that addresses the sparse reward problem when training diffusion models. Our framework, named B2-DiffuRL, is described in detail in the paper: https://arxiv.org/abs/2503.11240.
To start fine-tuning, run:

```bash
bash run_process.sh > log/exp_B2DiffuRL_b5_p3
```

The results are stored under `model/`. The pipeline consists of three stages: sampling (`run_sample.py`), evaluation (`run_select.py`), and training (`run_train.py`), as sketched below.
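For orientation, a minimal sketch of how the three stages might be chained per epoch inside `run_process.sh` is shown below. The loop structure, the `--epoch` flag, and the `NUM_EPOCHS` variable are assumptions for illustration; consult the script itself for the actual flags and control flow.

```bash
#!/bin/bash
# Hypothetical sketch of the sampling -> evaluation -> training loop.
# The real run_process.sh may use different flags and epoch handling.
NUM_EPOCHS=${NUM_EPOCHS:-10}

for epoch in $(seq 1 "$NUM_EPOCHS"); do
  python run_sample.py --epoch "$epoch"   # sample images/trajectories from the current model
  python run_select.py --epoch "$epoch"   # evaluate samples and select training data
  python run_train.py  --epoch "$epoch"   # fine-tune the model on the selected data
done
```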
The full set of hyperparameters is defined in `config/stage_process.py`, and many of them can be overridden in `run_process.sh`. Note that the default parameters are not tuned for best performance.
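As an illustration, overrides would typically be set at launch time. The variable names below are hypothetical placeholders (guessing that `b5`/`p3` in the log name encode a batch size and a prompt count), not verified keys from `config/stage_process.py`; check `run_process.sh` for the variables it actually exposes.

```bash
# Hypothetical example: BATCH_SIZE and NUM_PROMPTS are placeholder names,
# not confirmed options of run_process.sh or config/stage_process.py.
BATCH_SIZE=5 NUM_PROMPTS=3 bash run_process.sh > log/exp_B2DiffuRL_b5_p3
```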
This repository draws heavily on the following repositories:
If our work assists your research, please consider citing:
```bibtex
@misc{hu2025betteralignmenttrainingdiffusion,
  title={Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards},
  author={Zijing Hu and Fengda Zhang and Long Chen and Kun Kuang and Jiahui Li and Kaifeng Gao and Jun Xiao and Xin Wang and Wenwu Zhu},
  year={2025},
  eprint={2503.11240},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.11240},
}
```
