Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards

Introduction

This work presents a novel RL-based framework that addresses the sparse reward problem in training diffusion models. Our framework, named $\text{B}^2\text{-DiffuRL}$, employs two strategies: Backward progressive training and Branch-based sampling. First, backward progressive training initially focuses on the final timesteps of the denoising process and gradually extends the training interval to earlier timesteps, which eases the difficulty of learning from sparse rewards. Second, we perform branch-based sampling for each training interval. By comparing samples within the same branch, we can identify how much the policies in the current training interval contribute to the final image, which helps the model learn effective policies rather than unnecessary ones. $\text{B}^2\text{-DiffuRL}$ is compatible with existing optimization algorithms. Extensive experiments demonstrate the effectiveness of $\text{B}^2\text{-DiffuRL}$ in improving prompt-image alignment while maintaining the diversity of generated images.
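The two strategies can be illustrated on a toy problem. This is a minimal sketch under our own assumptions, not the repository's implementation: `toy_step` is a hypothetical stand-in for one denoising step, the training interval is assumed to grow from the last steps and always end at the final step, and subtracting the branch mean is used as one simple way to compare samples within a branch. The names `backward_progressive_intervals`, `branch_sample`, and `within_branch_advantages` are illustrative only.

```python
import random
from statistics import mean
from typing import List, Tuple


def backward_progressive_intervals(num_steps: int, num_stages: int) -> List[Tuple[int, int]]:
    """Return one [start, end) interval of denoising steps per training stage.

    Steps are counted in denoising order: step 0 acts on pure noise and step
    num_steps - 1 produces the final image. Stage 1 covers only the last steps;
    each later stage extends the interval to earlier steps (assumption: the
    interval always ends at the final step).
    """
    intervals = []
    for stage in range(1, num_stages + 1):
        length = -(-num_steps * stage // num_stages)  # integer ceil(num_steps * stage / num_stages)
        intervals.append((num_steps - length, num_steps))
    return intervals


def toy_step(x: float, rng: random.Random) -> float:
    """Hypothetical stand-in for one denoising step of a diffusion model."""
    return 0.9 * x + rng.gauss(0.0, 0.1)


def branch_sample(num_steps: int, start: int, num_branches: int,
                  per_branch: int, rng: random.Random) -> List[List[float]]:
    """Sample shared prefixes, then branch out at the start of the training interval."""
    branches = []
    for _ in range(num_branches):
        x = rng.gauss(0.0, 1.0)  # initial noise
        for _ in range(start):  # shared prefix: steps before the training interval
            x = toy_step(x, rng)
        finals = []
        for _ in range(per_branch):  # independent continuations inside the interval
            y = x
            for _ in range(start, num_steps):
                y = toy_step(y, rng)
            finals.append(y)
        branches.append(finals)
    return branches


def within_branch_advantages(rewards: List[List[float]]) -> List[List[float]]:
    """Reward minus branch mean: samples in one branch share everything before
    the interval, so differences reflect decisions made inside the interval."""
    return [[r - mean(branch) for r in branch] for branch in rewards]


if __name__ == "__main__":
    rng = random.Random(0)
    for start, end in backward_progressive_intervals(num_steps=20, num_stages=4):
        branches = branch_sample(20, start, num_branches=2, per_branch=4, rng=rng)
        # Toy sparse reward: computed only on the final sample of each trajectory
        rewards = [[-abs(y) for y in b] for b in branches]
        advantages = within_branch_advantages(rewards)
        print((start, end), [round(a, 3) for a in advantages[0]])
```

Each stage reuses the same branching logic with an earlier branch point, so a reward observed only on the final sample is compared among samples that differ solely within the current interval.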

Run

bash run_process.sh > log/exp_B2DiffuRL_b5_p3

This starts fine-tuning and stores the results under model/. The pipeline consists of three steps: sampling (run_sample.py), evaluation (run_select.py), and training (run_train.py).
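For orientation, the sampling, evaluation, and training order can be expressed as a small Python driver. This is a hypothetical sketch, not a replacement for run_process.sh: it assumes one round of the three scripts per training stage, passes no arguments to them (the real arguments are set in run_process.sh and config/stage_process.py), and defaults to a dry run that only prints the commands. `build_commands` and `run_pipeline` are illustrative names.

```python
import subprocess
import sys
from typing import List

# Script order taken from the README; the arguments that run_process.sh
# passes to each script are NOT reproduced here (this sketch passes none).
PIPELINE_SCRIPTS = ["run_sample.py", "run_select.py", "run_train.py"]


def build_commands(num_stages: int) -> List[List[str]]:
    """One sample -> select -> train round per (assumed) training stage."""
    return [[sys.executable, script]
            for _ in range(num_stages)
            for script in PIPELINE_SCRIPTS]


def run_pipeline(num_stages: int, dry_run: bool = True) -> List[List[str]]:
    commands = build_commands(num_stages)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))  # only show what would run
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands


if __name__ == "__main__":
    run_pipeline(num_stages=2)
```

Keeping the dry run as the default makes it safe to inspect the ordering before launching any GPU jobs.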

The full set of hyperparameters is defined in config/stage_process.py, and many of them can be overridden in run_process.sh. Note that the default parameters are not intended to give the best performance.

Acknowledgement

This repository builds heavily on the following repositories:

Citation

If our work assists your research, feel free to cite us using:

@misc{hu2025betteralignmenttrainingdiffusion,
      title={Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards}, 
      author={Zijing Hu and Fengda Zhang and Long Chen and Kun Kuang and Jiahui Li and Kaifeng Gao and Jun Xiao and Xin Wang and Wenwu Zhu},
      year={2025},
      eprint={2503.11240},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.11240}, 
}
