OSPO

This repository contains the official implementation for the paper
"OSPO: Object-centric Self-improving Preference Optimization for Text-to-Image Generation."

0. About

OSPO is a self-improving preference optimization framework for compositional text-to-image generation. It allows an MLLM to improve its fine-grained image generation capability without requiring any data or external models.

TODO

Release the model checkpoint.
Release the training dataset.
Release the inference code for text-to-image generation.
Release the OSPO framework code.
Release the evaluation code.
Release the Unitok version code.
Release example data for each step.

1. Setup

Create Conda Environment

conda create -n ospo python=3.10 -y
conda activate ospo

Clone this repository

git clone https://github.com/KU-AGI/OSPO.git
cd OSPO

Install Dependencies

conda install pytorch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 pytorch-cuda=12.1 -c pytorch -c nvidia

# Install the remaining Python dependencies
pip install -r requirements.txt
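
After installation, it can help to confirm that the pinned PyTorch packages were actually picked up. The following is a minimal sketch, not part of the repository: the helper name `find_version_mismatches` is hypothetical, the pins are copied from the conda command above, and it assumes the packages expose standard Python package metadata under the names `torch`, `torchvision`, and `torchaudio`.

```python
from importlib import metadata

# Pins copied from the conda install command in this README.
EXPECTED = {"torch": "2.2.1", "torchvision": "0.17.1", "torchaudio": "2.2.1"}


def find_version_mismatches(expected):
    """Return {package: (wanted, found)} for each package whose installed
    version differs from the pin; found is None if it is not installed."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            # Drop local build tags such as "+cu121" before comparing.
            found = metadata.version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    problems = find_version_mismatches(EXPECTED)
    for name, (wanted, found) in problems.items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
    if not problems:
        print("Pinned PyTorch packages match.")
```

An empty result means all three pins match. Any reported package can be reinstalled with the conda command above.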

Download Janus-Pro-7B Model (baseline) from 🤗 HuggingFace

Create a ./checkpoints directory and place the model in it.
The model folder path should then be ./checkpoints/Janus-Pro-7B
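
The expected layout can be checked, and optionally populated, with a short script. This is a sketch under stated assumptions rather than a script shipped with the repository. The helper names are hypothetical. `download_baseline` assumes the `huggingface_hub` package is installed and that the baseline lives under the repository ID `deepseek-ai/Janus-Pro-7B`; substitute the ID from the Hugging Face link above if it differs.

```python
from pathlib import Path


def baseline_dir(root="./checkpoints", name="Janus-Pro-7B"):
    """Path where this README expects the baseline model folder."""
    return Path(root) / name


def is_populated(path):
    """True if the folder exists and contains at least one entry."""
    path = Path(path)
    return path.is_dir() and any(path.iterdir())


def download_baseline(repo_id="deepseek-ai/Janus-Pro-7B", root="./checkpoints"):
    """Fetch the baseline into <root>/Janus-Pro-7B.
    Assumes huggingface_hub is installed; repo_id is an assumption."""
    from huggingface_hub import snapshot_download  # imported lazily

    target = baseline_dir(root)
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target


if __name__ == "__main__":
    target = baseline_dir()
    state = "ready" if is_populated(target) else "missing or empty"
    print(f"{target}: {state}")
```

Running the script as-is only reports whether ./checkpoints/Janus-Pro-7B is present. Calling `download_baseline()` explicitly triggers the download.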

2. Inference

We provide several OSPO model checkpoints on Hugging Face.

Model	Download
OSPO + Janus-Pro-7B	🤗 HuggingFace
OSPO + Janus-1.3B	🤗 HuggingFace
OSPO + Unitok-MLLM-7B	🤗 HuggingFace

Download the model weights from the table.
Place the checkpoint in the ./checkpoints directory.

Run the code.

# OSPO + Janus-Pro-7B
python inference_janus.py --input ${INPUT_TEXT}

# OSPO + Unitok-MLLM-7B
python inference_unitok.py --input ${INPUT_TEXT}

Results will be saved in ./results
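
To generate images for several prompts in one go, the command above can be wrapped in a small driver. The sketch below is illustrative only: the function names and example prompts are hypothetical, and it relies solely on the `--input` flag shown above, invoking the script once per prompt. Passing the arguments as a list avoids shell-quoting problems with prompts that contain spaces or quotes.

```python
import subprocess
import sys


def build_inference_command(prompt, script="inference_janus.py"):
    """Argument list equivalent to: python <script> --input <prompt>."""
    return [sys.executable, script, "--input", prompt]


def run_prompts(prompts, script="inference_janus.py", dry_run=True):
    """Build one command per prompt; execute them unless dry_run is set."""
    commands = [build_inference_command(p, script) for p in prompts]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first failing run.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    example_prompts = [
        "a red cube on top of a blue sphere",
        "two cats sitting under a wooden table",
    ]
    # Dry run by default: print the commands without executing them.
    for cmd in run_prompts(example_prompts):
        print(" ".join(cmd))
```

Set `dry_run=False` to actually launch the runs, and pass `script="inference_unitok.py"` to target the Unitok checkpoint instead.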

3. Reproduce

To reproduce only the training step, you can use our provided training dataset. 🤗 HuggingFace

python ospo/step4.py --cfg_path configs/step4.yaml

If you want to reproduce the full OSPO framework, please refer to this page.

4. Evaluate

Please refer to this page.

📍 Acknowledgement

We thank the authors of Janus-Pro, UniTok, SimPO, and mDPO for making their code available.

📍 Citation

@article{oh2025object,
  title={Object-centric Self-improving Preference Optimization for Text-to-Image Generation},
  author={Oh, Yoonjin and Kim, Yongjin and Kim, Hyomin and Chi, Donghwan and Kim, Sungwoong},
  journal={arXiv preprint arXiv:2506.02015},
  year={2025}
}

