[Paper (arXiv)] Jingren Liu*, Shuning Xu*, Qirui Yang*, Yun Wang, Xiangyu Chen✉, Zhong Ji✉
*Equal contribution ✉Corresponding author
This repo is an early exploration of unified image restoration (unified understanding & generation). We are actively investigating more lightweight designs and alternatives beyond the MLLM + Diffusion paradigm, and will continuously maintain and update this repo with new progress.
The three co-first authors are currently busy with their doctoral dissertations; please bear with us while we clean up the code and add new directions and suggestions.
Beyond the current release, we would like to continue exploring the following directions:
- **Smaller foundation models for practical deployment.**
  One important direction is to build more lightweight unified restoration models based on smaller foundation models, such as models around the 1.3B scale, or even models suitable for on-device deployment. We believe this is essential for bringing unified low-level restoration from research prototypes to scenarios with real practical value, where efficiency, memory, and deployment cost are critical.
- **Understanding and mitigating artifacts in flow-matching-based low-level restoration.**
  In our experiments on low-level image and video restoration, we observe an important phenomenon: although model families such as WAN and FLUX work well for cross-modal generation, they often suffer from severe artifacts under the image-to-image or video-to-video paradigms required by low-level restoration. This appears to be a common problem in large low-level diffusion / flow-matching-style models, and we believe it deserves much deeper investigation. Understanding why these artifacts emerge, and how to suppress them without sacrificing generation quality, will be one of our major future focuses.
  More specifically, taking recent video foundation models for super-resolution as an example, current practice often relies on a two-stage pipeline: flow-matching pretraining followed by an additional post-training stage such as Adversarial Post-Training (APT) or Distribution Matching Distillation (DMD) to obtain better restoration quality. In our view, this paradigm is still not aesthetically satisfying from a modeling perspective. A more fundamental question is: why can we not obtain the best restoration performance directly from flow-matching training itself?
  We believe this question is highly important for the future of low-level generative restoration. If high-quality low-level restoration always depends on extra post-training, refinement, or distillation stages, then the overall framework becomes less unified, less elegant, and harder to analyze. In contrast, achieving strong results directly from the original FM objective (written out after this list) would be much cleaner and more principled. This may require better low-level objectives, better noise / trajectory design, or restoration-oriented training strategies tailored to image-to-image and video-to-video settings.
- **Unified autoregressive and unified diffusion paradigms for low-level vision.**
  Another important direction is to explore whether low-level image and video restoration can be realized in a truly unified autoregressive manner, as well as in a more principled unified diffusion framework. We hope to study how these two paradigms can better support unified low-level understanding, planning, and generation, and whether they can offer cleaner, simpler, and more scalable solutions for restoration tasks.
- **Combining low-level restoration, unified models, and reinforcement learning.**
  We are also interested in exploring how low-level restoration, unified foundation models, and reinforcement learning can be connected. We believe this may open up new opportunities for better long-horizon optimization, adaptive restoration strategies, and more intelligent decision-making in both image and video restoration systems.
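For reference, by "the original FM objective" above we mean the standard linear-interpolant conditional flow-matching loss, written here in one common convention where $x_0$ is noise and $x_1$ is the clean target:

$$
x_t = (1 - t)\,x_0 + t\,x_1, \qquad
\mathcal{L}_{\mathrm{FM}} = \mathbb{E}_{t,\,x_0,\,x_1}\big\| v_\theta(x_t, t) - (x_1 - x_0) \big\|_2^2 .
$$

The open question is whether minimizing this loss alone, with restoration-oriented conditioning, can already deliver top restoration quality without APT- or DMD-style post-training.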
We will continue maintaining this repository and update it as these directions become more mature.
- ✅ Nov 25, 2025. Release the arXiv paper.
- ✅ Apr 10, 2026. Release training and testing code.
- ✅ Apr 12, 2026. Initial weights are now available on Hugging Face.
- 🚧 TBD. Release pretrained checkpoints & model zoo.
- 🚧 TBD. Release standalone inference / evaluation scripts and example results.
Please star this repo to get updates.
| Item | Link |
|---|---|
| Initial weights | Hugging Face - FAPEIR_Uniworld |
| Pretrained checkpoints / model zoo | TBD |
| Trainset (GT/LQ) | Hugging Face - FAPE-IR-Training |
| Testset (GT/LQ) | Hugging Face - FAPE-IR-Testing |
conda create -n fapeir python=3.11 -y
conda activate fapeir
# Please install dependencies according to your local CUDA / PyTorch setup.
pip install -r requirements.txt

Before training, please first create a `weights` directory in the project root and place all released initial weights into it.
mkdir -p weights

A recommended layout is:
FAPE-IR/
├── weights/
│ ├── flux/
│ ├── siglip/
│ ├── uniworld/
│ ├── denoise_projector_params.bin
│ ├── flux-redux-siglipv2-512.bin
│ ├── vae_projector_only.bin
│ └── vgg.pth
├── scripts/
├── fapeir/
├── train.py
└── validation.py
You can download the initial weights from:
https://huggingface.co/David0219/FAPEIR_Uniworld
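If you prefer a scripted download, here is a minimal sketch using the `huggingface_hub` Python package (an assumption on our part; downloading the files manually from the link above works equally well):

```python
# Sketch: fetch the initial weights into weights/ with huggingface_hub.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="David0219/FAPEIR_Uniworld", local_dir="weights")
```

If the repository layout on Hugging Face differs from the recommended tree above, rearrange the downloaded files accordingly.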
After placing the initial weights into weights/, please modify the relevant paths in:
scripts/denoiser/flux_qwen2p5vl_7b_vlm_512.yaml
Please update the path-related fields as follows:
training_config:
output_dir: trained_weights/fapeir
logging_dir: trained_weights/fapeir
lpips_weights_path: weights/vgg.pth
model_config:
pretrained_lvlm_name_or_path: weights/uniworld
pretrained_denoiser_name_or_path: weights/flux
pretrained_mlp2_path: weights/denoise_projector_params.bin
pretrained_mlp3_path: weights/vae_projector_only.bin
pretrained_siglip_name_or_path: weights/siglip
pretrained_siglip_mlp_path: weights/flux-redux-siglipv2-512.bin

If your local file layout is different, please modify the YAML paths accordingly. The important point is that every field in the config must match your actual local file locations.
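Before launching training, a quick sanity check that every path field resolves can save a failed run. A minimal sketch, assuming PyYAML and that the fields are nested under training_config / model_config exactly as in the excerpt above:

```python
# Sketch: verify that all path fields in the training YAML exist locally.
import os
import yaml

with open("scripts/denoiser/flux_qwen2p5vl_7b_vlm_512.yaml") as f:
    cfg = yaml.safe_load(f)

path_fields = [
    ("training_config", "lpips_weights_path"),
    ("model_config", "pretrained_lvlm_name_or_path"),
    ("model_config", "pretrained_denoiser_name_or_path"),
    ("model_config", "pretrained_mlp2_path"),
    ("model_config", "pretrained_mlp3_path"),
    ("model_config", "pretrained_siglip_name_or_path"),
    ("model_config", "pretrained_siglip_mlp_path"),
]
for section, key in path_fields:
    path = cfg[section][key]
    print(f"[{'OK' if os.path.exists(path) else 'MISSING'}] {section}.{key}: {path}")
```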
If you use the shell launchers under scripts/denoiser/, please also modify your local machine-dependent variables there, such as:
`PYENV_BIN`, `LOG_DIR`, `SESSION_NAME`, `MASTER_ADDR`, `MASTER_PORT`, `MACHINE_RANK`, `NUM_MACHINES`
After the initial weights are prepared and the YAML paths are updated, you can start training.
Single-node training
python -m accelerate.commands.launch \
--config_file scripts/accelerate_configs/single_node_zero2.yaml \
train.py \
scripts/denoiser/flux_qwen2p5vl_7b_vlm_512.yaml

Multi-node training
python -m accelerate.commands.launch \
--config_file scripts/accelerate_configs/multi_node_zero2.yaml \
--num_machines <NUM_MACHINES> \
--machine_rank <MACHINE_RANK> \
--main_process_ip <MASTER_ADDR> \
--main_process_port <MASTER_PORT> \
train.py \
scripts/denoiser/flux_qwen2p5vl_7b_vlm_512.yaml

Using shell launchers
If you want to launch training with the provided shell scripts, please first update your local paths and runtime variables in scripts/denoiser/*.sh, and then run your corresponding launcher.
Standalone inference and evaluation scripts are still being cleaned up and will be released in future updates.
The current public release uses:
- a manifest file (datasets/train.txt) to describe the training sources
- a JSON annotation file for each dataset subset
- a unified testing root under datasets/sr_testing_data/
A recommended directory layout is:
FAPE-IR/
├── datasets/
│ ├── train.txt
│ ├── train_data/
│ │ ├── images/
│ │ └── annotations/
│ │ ├── train_task1.json
│ │ ├── train_task2.json
│ │ └── ...
│ └── sr_testing_data/
│ ├── DrealSR/
│ ├── RealSR/
│ ├── weather1/
│ ├── weather2/
│ ├── Snow100K-L/
│ ├── Snow100K-S/
│ ├── dehazing_test/
│ ├── LOL2/
│ ├── RealBlur_J/
│ ├── RealBlur_R/
│ ├── GoPro/
│ ├── Urban100_15/
│ ├── Urban100_25/
│ └── Urban100_50/
├── weights/
├── scripts/
└── ...
You may use a different local layout, but then you must modify the corresponding fields in scripts/denoiser/flux_qwen2p5vl_7b_vlm_512.yaml.
The training manifest file is:
datasets/train.txt
Each line describes one dataset source. The supported formats are:
image_root,json_file,need_weight
or
image_root,json_file,need_weight,need_degradation
Example:
datasets/train_data/images,datasets/train_data/annotations/train_task1.json,true,true
datasets/train_data/images,datasets/train_data/annotations/train_task2.json,false,false
Field meanings:
- `image_root`: root directory of images referenced by the JSON file
- `json_file`: annotation file for this subset
- `need_weight`: whether to enable the corresponding weighting behavior
- `need_degradation`: whether to synthesize the low-quality input from the target image on the fly
If need_degradation is omitted, the current loader treats it as true.
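To make the manifest semantics concrete, here is a minimal parsing sketch that mirrors the documented format (this is illustrative, not the exact loader code):

```python
# Sketch: parse one line of datasets/train.txt following the format above.
def parse_manifest_line(line: str):
    parts = [p.strip() for p in line.strip().split(",")]
    image_root, json_file = parts[0], parts[1]
    need_weight = parts[2].lower() == "true"
    # need_degradation defaults to true when the fourth field is omitted
    need_degradation = parts[3].lower() == "true" if len(parts) > 3 else True
    return image_root, json_file, need_weight, need_degradation

with open("datasets/train.txt") as f:
    sources = [parse_manifest_line(line) for line in f if line.strip()]
```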
Each JSON file should be a list of samples.
In the first mode (single-image entries), the image is treated as the target image. The loader will randomly crop it to 512×512 and generate the low-quality input on the fly.
[
{
"image": "gt/sample_0001.png"
},
{
"image": "gt/sample_0002.png"
}
]

In this case:
- the image is treated as the GT image
- the low-quality input is synthesized during training
In the second mode (paired entries), the first image is treated as the input image and the second image is treated as the GT image.
[
{
"image": ["lq/sample_0001.png", "gt/sample_0001.png"]
},
{
"image": ["lq/sample_0002.png", "gt/sample_0002.png"]
}
]

In this case:
- the first image is treated as the input image
- the second image is treated as the GT image
For the current public training pipeline:
- degradation-based training uses random cropping and degradation synthesis
- paired training resizes both input and GT images to 512×512
- the training config currently uses `height: 512` and `width: 512`
If you change these settings, please keep your data preparation and GPU memory budget consistent.
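For illustration, here is a minimal sketch of how the two annotation modes above map to (input, GT) pairs. This is not the actual loader, and `synthesize_degradation` is a hypothetical placeholder for the on-the-fly degradation pipeline:

```python
# Sketch: resolve one annotation entry into an (LQ input, GT target) pair.
import os
import random
from PIL import Image

SIZE = 512  # matches height/width in the training config

def load_sample(entry, image_root, need_degradation=True):
    if isinstance(entry["image"], list):
        # Paired mode: [input (LQ), target (GT)]; both resized to 512x512.
        lq = Image.open(os.path.join(image_root, entry["image"][0])).resize((SIZE, SIZE))
        gt = Image.open(os.path.join(image_root, entry["image"][1])).resize((SIZE, SIZE))
    else:
        # GT-only mode: random 512x512 crop of the GT (assumes images >= 512px),
        # then synthesize the LQ input on the fly.
        gt = Image.open(os.path.join(image_root, entry["image"]))
        x = random.randint(0, gt.width - SIZE)
        y = random.randint(0, gt.height - SIZE)
        gt = gt.crop((x, y, x + SIZE, y + SIZE))
        lq = synthesize_degradation(gt) if need_degradation else gt  # hypothetical helper
    return lq, gt
```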
The current config expects validation / test data under datasets/sr_testing_data/.
The default path-related fields in scripts/denoiser/flux_qwen2p5vl_7b_vlm_512.yaml are:
dataset_config:
data_txt: datasets/train.txt
# DrealSR
test_data_root: datasets/sr_testing_data/DrealSR
test_scales: [2, 4]
# Weather
weather_test_data_root: datasets/sr_testing_data
weather_test_types:
- weather1
- weather2
- Snow100K-L
- Snow100K-S
# RealSR
realsr_test_data_root: datasets/sr_testing_data/RealSR
realsr_camera_types:
- Canon
- Nikon
realsr_scale_factors:
- 2
- 4
# Generic benchmarks
generic_test_data_root: datasets/sr_testing_data
generic_datasset_types:
- dehazing_test
- LOL2
- RealBlur_J
- RealBlur_R
- GoPro
- Urban100_15
- Urban100_25
- Urban100_50

Note: please keep the key name `generic_datasset_types` unchanged (including its spelling) in the current release, since it is intentionally kept for compatibility with the existing code.
In practice, each benchmark should be placed under the corresponding root path, and the YAML should be edited if your local folder names differ.
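To confirm your local layout matches the config, a quick existence check over the default folder names above can help (a sketch using the default roots):

```python
# Sketch: check that each benchmark folder expected by the default config exists.
import os

root = "datasets/sr_testing_data"
expected = [
    "DrealSR", "RealSR", "weather1", "weather2", "Snow100K-L", "Snow100K-S",
    "dehazing_test", "LOL2", "RealBlur_J", "RealBlur_R", "GoPro",
    "Urban100_15", "Urban100_25", "Urban100_50",
]
missing = [d for d in expected if not os.path.isdir(os.path.join(root, d))]
print("All benchmark folders found." if not missing else f"Missing: {missing}")
```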
A practical workflow is:
- Prepare the initial weights under weights/
- Modify the path fields in scripts/denoiser/flux_qwen2p5vl_7b_vlm_512.yaml
- Prepare datasets/train.txt
- Prepare the corresponding JSON annotation files
- Organize validation / test data under datasets/sr_testing_data/
- Start training with the provided Accelerate config
Figure 1. Quantitative comparison with state-of-the-art AIO-IR methods on six tasks (deraining, denoising, deblurring, desnowing, dehazing, low-light enhancement). Best in red, second-best in blue.
Figure 2. Unified comparison across SR task series. Best in red, second-best in blue.
Figure 6. Qualitative comparison among unified models, including BAGEL, Nexus-Gen, Uniworld-V1, and Emu3.5.
Figure 7. Qualitative comparison of restoration results produced by FAPE-IR and state-of-the-art AIO-IR models.
If you use this work, please cite:
@article{liu2025fape,
title={FAPE-IR: Frequency-Aware Planning and Execution Framework for All-in-One Image Restoration},
author={Liu, Jingren and Xu, Shuning and Yang, Qirui and Wang, Yun and Chen, Xiangyu and Ji, Zhong},
journal={arXiv preprint arXiv:2511.14099},
year={2025}
}

We thank all collaborators and colleagues for their helpful discussions and support. We especially thank Dr. Xiangyu Chen and Professor Zhong Ji for their guidance and revisions to this work.
This project is built upon several foundational open-source works. We sincerely thank the authors of the following repositories for their invaluable contributions to the community:
- Core Inspiration & Architecture: our framework builds on UniWorld-V1 and FLUX, together with SigLIP (see the initial-weights layout above).
- Baselines & Comparisons: We gratefully acknowledge the authors of BAGEL, Nexus-Gen, and Emu3.5 for open-sourcing their code and weights, which facilitated our qualitative and quantitative comparisons.
- Frameworks & Utilities: Our implementation relies on excellent open-source libraries, including Hugging Face's diffusers and transformers.
If you have any questions, feel free to reach out:
- Email: jrl0219@tju.edu.cn; yc07425@um.edu.mo
