VisualPrompter

Official implementation of the paper "VisualPrompter: Semantic-Aware Prompt Optimization with Visual Feedback for Text-to-Image Synthesis", accepted at ICLR 2026.

VisualPrompter is a training‑free prompt engineering framework that automatically refines user prompts to better align with text‑to‑image models. It operates at the atomic semantic level: a self‑reflection module (SERE) identifies missing concepts by analysing generated images, and a target‑specific optimisation module (TSPO) expands only those concepts while preserving the original intent. The result is a semantically faithful prompt and generated images with higher fidelity to the user’s description.
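The generate–reflect–optimise loop described above can be sketched as follows. This is a minimal illustrative sketch, not the repository's actual API: every function name here is a hypothetical stand-in, and the reflection and optimisation stubs use trivial string checks in place of the real VLM judge and LLM rewriter.

```python
# Hypothetical sketch of the VisualPrompter loop. All names are
# illustrative stand-ins, not the repository's actual functions.

def generate_image(prompt):
    """Stand-in for the text-to-image model (e.g. Stable Diffusion)."""
    return f"<image for: {prompt}>"

def self_reflection(prompt, image):
    """SERE stand-in: return atomic concepts judged missing from the image.
    A real implementation asks the VLM one yes/no question per concept."""
    return ["golden retriever"] if "clearly visible" not in prompt else []

def target_specific_optimization(prompt, missing):
    """TSPO stand-in: expand only the missing concepts, keep the rest."""
    return prompt + ", " + ", ".join(f"a clearly visible {m}" for m in missing)

def visual_prompter(prompt, max_rounds=3):
    image = generate_image(prompt)
    for _ in range(max_rounds):
        missing = self_reflection(prompt, image)
        if not missing:            # every concept depicted: stop early
            break
        prompt = target_specific_optimization(prompt, missing)
        image = generate_image(prompt)
    return prompt, image

refined, image = visual_prompter("a golden retriever on a beach")
print(refined)
```

The key design point the sketch captures is that only concepts flagged as missing are expanded; concepts already depicted are left untouched, which is what preserves the user's original intent.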

Interface Preview

Installation

Clone the repository and install the required dependencies:

git clone https://github.com/teheperinko541/VisualPrompter.git
cd VisualPrompter
pip install -r requirements.txt

Model Checkpoints

VisualPrompter uses the following open‑source models internally:

  • LLM: Qwen2.5 14B (or another size) for DSG generation and prompt rewriting.
  • VLM: Qwen2‑VL 7B for visual question answering.
  • Generative models: Stable Diffusion v1.5 / v2.1, FLUX.1-dev, and Janus-Pro.

Download the checkpoints from Hugging Face in advance, or let the scripts download them automatically on first run.
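For pre-downloading, the Hugging Face repo IDs below are best guesses matching the model list above; verify them against the paths the scripts actually expect before use.

```python
# Best-guess Hugging Face repo IDs for the models listed above;
# verify against the checkpoint paths used by the scripts.
CHECKPOINTS = {
    "llm":   "Qwen/Qwen2.5-14B-Instruct",
    "vlm":   "Qwen/Qwen2-VL-7B-Instruct",
    "sd15":  "stable-diffusion-v1-5/stable-diffusion-v1-5",
    "sd21":  "stabilityai/stable-diffusion-2-1",
    "flux":  "black-forest-labs/FLUX.1-dev",
    "janus": "deepseek-ai/Janus-Pro-7B",
}

# To fetch one checkpoint ahead of time (needs `pip install huggingface_hub`):
# from huggingface_hub import snapshot_download
# snapshot_download(CHECKPOINTS["vlm"])

for name, repo_id in CHECKPOINTS.items():
    print(f"{name}: {repo_id}")
```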

Evaluation

We provide evaluation scripts for two benchmarks: DSG‑1k and TIFA v1.0.

Check the model and output paths configured in the scripts, then run the full evaluation pipeline:

bash scripts/eval_dsg.sh 
bash scripts/eval_tifa.sh

The scripts will generate images using the original prompts and the prompts optimised by VisualPrompter, then compute semantic accuracy via the VLM judge.
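The metric computed in the last step can be sketched as below. This is an illustrative outline of DSG/TIFA-style scoring, not the repository's evaluation code: semantic accuracy is the fraction of atomic yes/no questions about the prompt that the VLM judge answers "yes" for the generated image, and the answer data here is made up.

```python
# Illustrative DSG/TIFA-style scoring: accuracy over atomic yes/no
# questions. The answers below are made-up example data.

def semantic_accuracy(judge_answers):
    """judge_answers: one boolean per atomic question about the prompt."""
    return sum(judge_answers) / len(judge_answers)

# One image judged on four atomic questions derived from its prompt,
# e.g. "is there a dog?", "is the dog on a beach?", ...
original_prompt_answers  = [True, True, False, False]
optimised_prompt_answers = [True, True, True, False]

print(f"original:  {semantic_accuracy(original_prompt_answers):.2f}")   # 0.50
print(f"optimised: {semantic_accuracy(optimised_prompt_answers):.2f}")  # 0.75
```

Benchmark scores are then the mean of this per-image accuracy over all prompts in DSG‑1k or TIFA v1.0.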

Quick Demo

To test VisualPrompter on a single prompt, use the provided demo:

streamlit run scripts/demo/demo.py 

This will generate an image with your chosen models both before and after optimisation, and display the results together.

Citation

If you find this work useful, please cite our paper:

@inproceedings{wu2026vp, 
    title={VisualPrompter: Semantic-Aware Prompt Optimization with Visual Feedback for Text-to-Image Synthesis}, 
    author={Shiyu Wu and Mingzhen Sun and Weining Wang and Yequan Wang and Jing Liu}, 
    booktitle={International Conference on Learning Representations (ICLR)}, 
    year={2026} 
}
