Official implementation of the paper "VisualPrompter: Semantic-Aware Prompt Optimization with Visual Feedback for Text-to-Image Synthesis", accepted at ICLR 2026.
VisualPrompter is a training‑free prompt engineering framework that automatically refines user prompts to better align with text‑to‑image models. It operates at the atomic semantic level: a self‑reflection module (SERE) analyses the generated image to identify concepts from the prompt that are missing, and a target‑specific optimisation module (TSPO) expands only those concepts while preserving the original intent. The result is a semantically faithful prompt, and a generated image with higher fidelity to the user's description.
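The overall procedure reduces to a simple refine loop: derive a set of atomic yes/no questions from the prompt, check each one against the generated image, and rewrite only the failed concepts. Below is a minimal, hypothetical sketch of that loop; the function names (`build_dsg`, `generate_image`, `answer_question`, `rewrite_prompt`) are illustrative placeholders, not the repository's actual API.

```python
# Hypothetical sketch of the VisualPrompter loop; function names are
# illustrative placeholders, not the repo's actual API.
from typing import Callable, List

def visual_prompter(
    prompt: str,
    build_dsg: Callable[[str], List[str]],          # LLM: prompt -> atomic yes/no questions
    generate_image: Callable[[str], object],        # T2I model
    answer_question: Callable[[object, str], bool], # VLM judge
    rewrite_prompt: Callable[[str, List[str]], str],# LLM: expand missing concepts
    max_rounds: int = 3,
) -> str:
    """Iteratively refine `prompt` until every atomic concept is depicted."""
    questions = build_dsg(prompt)  # one question per atomic concept
    for _ in range(max_rounds):
        image = generate_image(prompt)
        # Self-reflection (SERE): find concepts the image fails to depict.
        missing = [q for q in questions if not answer_question(image, q)]
        if not missing:
            break  # all concepts present; stop early
        # Target-specific optimisation (TSPO): expand only the missing concepts.
        prompt = rewrite_prompt(prompt, missing)
    return prompt
```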
Clone the repository and install the required dependencies:
```bash
pip install -r requirements.txt
```

VisualPrompter uses the following open‑source models internally:
- LLM: Qwen2.5 14B (or other size) for DSG generation and prompt rewriting.
- VLM: Qwen2‑VL 7B for visual question answering.
- Generative Models: Stable Diffusion v1.5 / v2.1, FLUX.1-dev, and Janus-Pro.
Please download them from Hugging Face, or let the scripts load them automatically.
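If you prefer to fetch the checkpoints yourself, a minimal loading sketch using the standard `transformers` and `diffusers` APIs is shown below; the Hugging Face checkpoint IDs are the public ones and may differ from those the scripts are configured to use.

```python
# Minimal sketch of loading the backbone models from Hugging Face.
# Checkpoint IDs are the public ones and may differ from the scripts' defaults.
import torch
from transformers import (
    AutoModelForCausalLM, AutoTokenizer,             # LLM
    Qwen2VLForConditionalGeneration, AutoProcessor,  # VLM
)
from diffusers import StableDiffusionPipeline        # T2I

llm = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-14B-Instruct", torch_dtype=torch.bfloat16, device_map="auto")
llm_tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B-Instruct")

vlm = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype=torch.bfloat16, device_map="auto")
vlm_proc = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

t2i = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
```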
We provide evaluation scripts for two benchmarks: DSG‑1k and TIFA v1.0.
Please review the scripts first, then run the full evaluation pipeline with:

```bash
bash scripts/eval_dsg.sh
bash scripts/eval_tifa.sh
```

The scripts generate images with both the original prompts and the prompts optimised by VisualPrompter, then compute semantic accuracy via the VLM judge.
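For reference, the judge-based metric amounts to the fraction of atomic questions answered "yes" per image, averaged over the benchmark. The sketch below illustrates this; `ask_vlm` is a hypothetical wrapper around the VLM, not a function exported by this repository.

```python
# Illustrative semantic-accuracy metric: fraction of atomic yes/no questions
# the VLM judge answers "yes" per image, averaged over the benchmark.
# `ask_vlm` is a hypothetical wrapper around the VLM.
from typing import Callable, Dict, List

def semantic_accuracy(
    samples: List[Dict],                    # each: {"image": ..., "questions": [...]}
    ask_vlm: Callable[[object, str], str],  # returns "yes" / "no"
) -> float:
    per_image = []
    for s in samples:
        answers = [ask_vlm(s["image"], q) for q in s["questions"]]
        per_image.append(
            sum(a.strip().lower() == "yes" for a in answers) / len(answers))
    return sum(per_image) / len(per_image)
```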
To test VisualPrompter on a single prompt, use the provided demo:
```bash
streamlit run scripts/demo/demo.py
```

This will generate an image with your chosen models both before and after optimisation and display the results side by side.
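For orientation, the demo page roughly corresponds to the following skeleton; `optimize_prompt` and `generate_image` are placeholders for the repository's real entry points, whose actual names may differ.

```python
# Hypothetical minimal version of the demo page; `optimize_prompt` and
# `generate_image` stand in for the repo's real entry points.
import streamlit as st
from PIL import Image

def optimize_prompt(p: str) -> str:
    return p + ", highly detailed"       # placeholder for VisualPrompter

def generate_image(p: str) -> Image.Image:
    return Image.new("RGB", (512, 512))  # placeholder for the T2I model

prompt = st.text_input("Prompt", "a red fox reading a book under a maple tree")
if st.button("Generate"):
    refined = optimize_prompt(prompt)
    left, right = st.columns(2)
    left.image(generate_image(prompt), caption="original prompt")
    right.image(generate_image(refined), caption=f"optimised: {refined}")
```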
If you find this work useful, please cite our paper:
```bibtex
@inproceedings{wu2026vp,
  title     = {VisualPrompter: Semantic-Aware Prompt Optimization with Visual Feedback for Text-to-Image Synthesis},
  author    = {Shiyu Wu and Mingzhen Sun and Weining Wang and Yequan Wang and Jing Liu},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2026}
}
```
