OSPO

This repository contains the official implementation for the paper
"OSPO: Object-centric Self-improving Preference Optimization for Text-to-Image Generation."

Framework

0. About

OSPO is a self-improving preference optimization framework for compositional text-to-image generation. It allows an MLLM to improve its fine-grained image generation capability without requiring any external data or external models.

TODO

  • Release the model checkpoint.
  • Release the training dataset.
  • Release the inference code for text-to-image generation.
  • Release the OSPO framework code.
  • Release the evaluation code.
  • Release the Unitok version code.
  • Release example data for each step.

1. Setup

  1. Create Conda Environment
conda create -n ospo python=3.10 -y
conda activate ospo
  2. Clone this repository
git clone https://github.com/KU-AGI/OSPO.git
cd OSPO
  3. Install Dependencies
conda install pytorch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 pytorch-cuda=12.1 -c pytorch -c nvidia

# Install Dependencies
pip install -r requirements.txt
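
After installation, a quick optional sanity check should report PyTorch 2.2.1 with CUDA available:

# Optional: verify the PyTorch / CUDA installation
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"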
  4. Download the Janus-Pro-7B model (baseline) from 🤗 HuggingFace
  • Create a ./checkpoints directory and place the model inside it.
  • The model folder path should be ./checkpoints/Janus-Pro-7B.
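
For example, the baseline weights can be fetched with the Hugging Face CLI. The repository id deepseek-ai/Janus-Pro-7B below is our assumption of the upstream model location; adjust it if you obtain the weights from a different source.

# Requires the huggingface_hub package (pip install -U huggingface_hub)
# Repo id is assumed; adjust if you download the baseline from elsewhere
mkdir -p ./checkpoints
huggingface-cli download deepseek-ai/Janus-Pro-7B --local-dir ./checkpoints/Janus-Pro-7B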

2. Inference

We provide the following model checkpoints on Hugging Face.

Model                 | Download
OSPO + Janus-Pro-7B   | 🤗 HuggingFace
OSPO + Janus-1.3B     | 🤗 HuggingFace
OSPO + Unitok-MLLM-7B | 🤗 HuggingFace
  1. Download the model weights from the table.
  2. Place the checkpoint in the ./checkpoints directory.
  3. Run the code.
# OSPO + Janus-Pro-7B
python inference_janus.py --input ${INPUT_TEXT}

# OSPO + Unitok-MLLM-7B
python inference_unitok.py --input ${INPUT_TEXT}

Results will be saved in the ./results directory.
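
For instance, a run with OSPO + Janus-Pro-7B on a compositional prompt could look like the following (the prompt text is only an illustration):

# Example: compositional prompt with OSPO + Janus-Pro-7B
python inference_janus.py --input "a red cube stacked on top of a blue sphere"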

3. Reproduce

You can reproduce only the training step using our provided training dataset: 🤗 HuggingFace
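
As a sketch, the dataset can be pulled with the Hugging Face CLI. The repository id below is a placeholder, not the actual dataset id; replace it with the dataset linked above and, if needed, point the training config at the downloaded path.

# Placeholder repo id; replace with the actual training dataset repo
huggingface-cli download <ORG>/<OSPO-TRAINING-DATASET> --repo-type dataset --local-dir ./data/ospo_train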

python ospo/step4.py --cfg_path configs/step4.yaml

If you want to reproduce the full OSPO framework, please refer to this page.

4. Evaluate

Please refer to this page.

📍 Acknowledgement

We thank the authors of Janus-Pro, UniTok, SimPO, and mDPO for making their code available.

📍 Citation

@article{oh2025object,
  title={Object-centric Self-improving Preference Optimization for Text-to-Image Generation},
  author={Oh, Yoonjin and Kim, Yongjin and Kim, Hyomin and Chi, Donghwan and Kim, Sungwoong},
  journal={arXiv preprint arXiv:2506.02015},
  year={2025}
}
