This repository contains the official implementation for the paper
"OSPO: Object-centric Self-improving Preference Optimization for Text-to-Image Generation."
OSPO is a self-improving preference optimization framework for compositional text-to-image generation. It allows an MLLM to improve its fine-grained image generation capability without relying on any external data or external models.
- Release the model checkpoint.
- Release the training dataset.
- Release the inference code for text-to-image generation.
- Release the OSPO framework code.
- Release the evaluation code.
- Release the Unitok version code.
- Release example data for each step.
- Create Conda Environment

      conda create -n ospo python=3.10 -y
      conda activate ospo

- Clone this repository

      git clone https://github.com/KU-AGI/OSPO.git
      cd OSPO

- Install Dependencies

      conda install pytorch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 pytorch-cuda=12.1 -c pytorch -c nvidia
      # Install Dependencies
      pip install -r requirements.txt

- Download Janus-Pro-7B Model (baseline) from 🤗 HuggingFace
  - Create `./checkpoints` directory and place the Model in the `./checkpoints` directory.
  - So the Model folder path should be `./checkpoints/Janus-Pro-7B`
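
Before running anything that loads the baseline, it can help to confirm the checkpoint ended up where the steps above expect it. The following is a minimal sketch, not part of the repository: the helper name `resolve_checkpoint` is hypothetical, and it only assumes the `./checkpoints/Janus-Pro-7B` layout described above (it does not validate the model files themselves).

```python
from pathlib import Path


def resolve_checkpoint(root: str = "./checkpoints", name: str = "Janus-Pro-7B") -> Path:
    """Return the model folder path, raising if it is missing or empty.

    Hypothetical helper: assumes the layout <root>/<name>/ from the setup steps.
    """
    model_dir = Path(root) / name
    if not model_dir.is_dir():
        raise FileNotFoundError(f"Expected model folder at {model_dir}")
    # An empty folder usually means the download or copy did not complete.
    if not any(model_dir.iterdir()):
        raise FileNotFoundError(f"Model folder {model_dir} is empty")
    return model_dir


if __name__ == "__main__":
    print(resolve_checkpoint())
```

Run from the repository root, the script prints the resolved folder or fails with a message naming the path it looked for.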
We provide our different model checkpoints on Hugging Face.
| Model | Download |
|---|---|
| OSPO + Janus-Pro-7B | 🤗 HuggingFace |
| OSPO + Janus-1.3B | 🤗 HuggingFace |
| OSPO + Unitok-MLLM-7B | 🤗 HuggingFace |
- Download the model weights from the table.
- Place the checkpoint in the `./checkpoints` directory.
- Run the code.

      # OSPO + Janus-Pro-7B
      python inference_janus.py --input ${INPUT_TEXT}
      # OSPO + Unitok-MLLM-7B
      python inference_unitok.py --input ${INPUT_TEXT}

Results will be saved in `./results`.
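
To generate images for several prompts in one go, the inference commands can be wrapped in a small driver script. This is a sketch under stated assumptions: `build_command` and `run_prompts` are hypothetical helpers, and the only interface relied on is the `--input` flag shown in the commands here. Each prompt is passed as a single argument, so prompts containing spaces are not split by the shell.

```python
import subprocess
import sys


def build_command(script: str, prompt: str) -> list[str]:
    """Build the argument list for one inference call (uses only the --input flag)."""
    return [sys.executable, script, "--input", prompt]


def run_prompts(script: str, prompts: list[str]) -> None:
    """Run the inference script once per prompt, stopping at the first failure."""
    for prompt in prompts:
        subprocess.run(build_command(script, prompt), check=True)


if __name__ == "__main__":
    # Example prompts are illustrative only.
    run_prompts(
        "inference_janus.py",
        ["a red cube on top of a blue sphere", "two cats and one dog"],
    )
```

Swapping `inference_janus.py` for `inference_unitok.py` targets the UniTok-based checkpoint instead.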
To reproduce only the training step, you can use our provided training dataset (🤗 HuggingFace).

    python ospo/step4.py --cfg_path configs/step4.yaml

If you want to reproduce the full OSPO framework, please refer to this page.
Please refer to this page.
We thank the authors of Janus-Pro, UniTok, SimPO, and mDPO for making their code available.
@article{oh2025object,
title={Object-centric Self-improving Preference Optimization for Text-to-Image Generation},
author={Oh, Yoonjin and Kim, Yongjin and Kim, Hyomin and Chi, Donghwan and Kim, Sungwoong},
journal={arXiv preprint arXiv:2506.02015},
year={2025}
}
