[Paper (arXiv)] Jingren Liu*, Shuning Xu*, Qirui Yang*, Yun Wang, Xiangyu Chen✉, Zhong Ji✉
*Equal contribution ✉Corresponding author
This repo is an early exploration of unified image restoration (unified understanding & generation). We are actively investigating more lightweight designs and alternatives beyond the MLLM + Diffusion paradigm, and will continuously maintain and update this repo with new progress.
The three co-first authors are currently busy with their doctoral dissertations; please bear with us while we clean up the code and add new directions and suggestions.
Beyond the current release, we would like to continue exploring the following directions:
- **Smaller foundation models for practical deployment.**
  One important direction is to build more lightweight unified restoration models based on smaller foundation models, such as models around the 1.3B scale, or even models suitable for on-device deployment. We believe this is essential for bringing unified low-level restoration from research prototypes to scenarios with real practical value, where efficiency, memory, and deployment cost are critical.
- **Understanding and mitigating artifacts in flow-matching-based low-level restoration.**
  In our experiments on low-level image and video restoration, we observe an important phenomenon: although model families such as WAN and FLUX work well for cross-modal generation, they often suffer from severe artifacts under the image-to-image or video-to-video paradigms required by low-level restoration. This appears to be a common problem in large low-level diffusion / flow-matching-style models, and we believe it deserves much deeper investigation. Understanding why these artifacts emerge, and how to suppress them without sacrificing generation quality, will be one of our major future focuses.
  More specifically, taking recent video foundation models for super-resolution as an example, current practice often relies on a two-stage pipeline: flow-matching pretraining followed by an additional post-training stage such as Adversarial Post-Training (APT) or Distribution Matching Distillation (DMD) to obtain better restoration quality. In our view, this paradigm is still not aesthetically satisfying from a modeling perspective. A more fundamental question is: why can we not obtain the best restoration performance directly from flow-matching training itself?
  We believe this question is highly important for the future of low-level generative restoration. If high-quality low-level restoration always depends on extra post-training, refinement, or distillation stages, then the overall framework becomes less unified, less elegant, and harder to analyze. In contrast, achieving strong results directly from the original FM objective (written out after this list) would be much cleaner and more principled. This may require better low-level objectives, better noise / trajectory design, or restoration-oriented training strategies tailored to image-to-image and video-to-video settings.
- **Unified autoregressive and unified diffusion paradigms for low-level vision.**
  Another important direction is to explore whether low-level image and video restoration can be realized in a truly unified autoregressive manner, as well as in a more principled unified diffusion framework. We hope to study how these two paradigms can better support unified low-level understanding, planning, and generation, and whether they can offer cleaner, simpler, and more scalable solutions for restoration tasks.
- **Combining low-level restoration, unified models, and reinforcement learning.**
  We are also interested in exploring how low-level restoration, unified foundation models, and reinforcement learning can be connected. We believe this may open up new opportunities for better long-horizon optimization, adaptive restoration strategies, and more intelligent decision-making in both image and video restoration systems.
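For reference, by "the original FM objective" above we mean the standard linear-interpolant conditional flow-matching loss, written here in one common convention where $x_0$ is noise and $x_1$ is the clean target:

$$
x_t = (1 - t)\,x_0 + t\,x_1, \qquad
\mathcal{L}_{\mathrm{FM}} = \mathbb{E}_{t,\,x_0,\,x_1}\big\| v_\theta(x_t, t) - (x_1 - x_0) \big\|_2^2 .
$$

The open question is whether minimizing this loss alone, with restoration-oriented conditioning, can already deliver top restoration quality without APT- or DMD-style post-training.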
We will continue maintaining this repository and update it as these directions become more mature.
- ✅ Nov 25, 2025. Release the arXiv paper.
- ✅ Apr 10, 2026. Release training and testing code.
- ✅ Apr 12, 2026. Initial weights are now available on Hugging Face.
- 🚧 TBD. Release pretrained checkpoints & model zoo.
- 🚧 TBD. Release standalone inference / evaluation scripts and example results.
Please star this repo to get updates.
| Item | Link |
|---|---|
| Initial weights | Hugging Face - FAPEIR_Uniworld |
| Pretrained checkpoints / model zoo | TBD |
| Trainset (GT/LQ) | Hugging Face - FAPE-IR-Training |
| Testset (GT/LQ) | Hugging Face - FAPE-IR-Testing |
conda create -n fapeir python=3.11 -y
conda activate fapeir
# Please install dependencies according to your local CUDA / PyTorch setup.
pip install -r requirements.txt

Before training, please first create a `weights` directory in the project root and place all released initial weights into it.
mkdir -p weights

A recommended layout is:
FAPE-IR/
├── weights/
│ ├── flux/
│ ├── siglip/
│ ├── uniworld/
│ ├── denoise_projector_params.bin
│ ├── flux-redux-siglipv2-512.bin
│ ├── vae_projector_only.bin
│ └── vgg.pth
├── scripts/
├── fapeir/
├── train.py
└── validation.py
You can download the initial weights from:
https://huggingface.co/David0219/FAPEIR_Uniworld
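If you prefer a scripted download, here is a minimal sketch using the `huggingface_hub` Python package (an assumption on our part; downloading the files manually from the link above works equally well):

```python
# Sketch: fetch the initial weights into weights/ with huggingface_hub.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="David0219/FAPEIR_Uniworld", local_dir="weights")
```

If the repository layout on Hugging Face differs from the recommended tree above, rearrange the downloaded files accordingly.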
After placing the initial weights into weights/, please modify the relevant paths in:
scripts/denoiser/flux_qwen2p5vl_7b_vlm_512.yaml
Please update the path-related fields as follows:
training_config:
output_dir: trained_weights/fapeir
logging_dir: trained_weights/fapeir
lpips_weights_path: weights/vgg.pth
model_config:
pretrained_lvlm_name_or_path: weights/uniworld
pretrained_denoiser_name_or_path: weights/flux
pretrained_mlp2_path: weights/denoise_projector_params.bin
pretrained_mlp3_path: weights/vae_projector_only.bin
pretrained_siglip_name_or_path: weights/siglip
pretrained_siglip_mlp_path: weights/flux-redux-siglipv2-512.bin

If your local file layout is different, please modify the YAML paths accordingly. The important point is that every field in the config must match your actual local file locations.
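Before launching training, a quick sanity check that every path field resolves can save a failed run. A minimal sketch, assuming PyYAML and that the fields are nested under training_config / model_config exactly as in the excerpt above:

```python
# Sketch: verify that all path fields in the training YAML exist locally.
import os
import yaml

with open("scripts/denoiser/flux_qwen2p5vl_7b_vlm_512.yaml") as f:
    cfg = yaml.safe_load(f)

path_fields = [
    ("training_config", "lpips_weights_path"),
    ("model_config", "pretrained_lvlm_name_or_path"),
    ("model_config", "pretrained_denoiser_name_or_path"),
    ("model_config", "pretrained_mlp2_path"),
    ("model_config", "pretrained_mlp3_path"),
    ("model_config", "pretrained_siglip_name_or_path"),
    ("model_config", "pretrained_siglip_mlp_path"),
]
for section, key in path_fields:
    path = cfg[section][key]
    print(f"[{'OK' if os.path.exists(path) else 'MISSING'}] {section}.{key}: {path}")
```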
If you use the shell launchers under scripts/denoiser/, please also modify your local machine-dependent variables there, such as:
`PYENV_BIN`, `LOG_DIR`, `SESSION_NAME`, `MASTER_ADDR`, `MASTER_PORT`, `MACHINE_RANK`, `NUM_MACHINES`
After the initial weights are prepared and the YAML paths are updated, you can start training.
Single-node training
python -m accelerate.commands.launch \
--config_file scripts/accelerate_configs/single_node_zero2.yaml \
train.py \
scripts/denoiser/flux_qwen2p5vl_7b_vlm_512.yaml

Multi-node training
python -m accelerate.commands.launch \
--config_file scripts/accelerate_configs/multi_node_zero2.yaml \
--num_machines <NUM_MACHINES> \
--machine_rank <MACHINE_RANK> \
--main_process_ip <MASTER_ADDR> \
--main_process_port <MASTER_PORT> \
train.py \
scripts/denoiser/flux_qwen2p5vl_7b_vlm_512.yaml

Using shell launchers
If you want to launch training with the provided shell scripts, please first update your local paths and runtime variables in scripts/denoiser/*.sh, and then run your corresponding launcher.
Standalone inference and evaluation scripts are still being cleaned up and will be released in future updates.
The current public release uses:
- a manifest file (datasets/train.txt) to describe the training sources
- a JSON annotation file for each dataset subset
- a unified testing root under datasets/sr_testing_data/
A recommended directory layout is:
FAPE-IR/
├── datasets/
│ ├── train.txt
│ ├── train_data/
│ │ ├── images/
│ │ └── annotations/
│ │ ├── train_task1.json
│ │ ├── train_task2.json
│ │ └── ...
│ └── sr_testing_data/
│ ├── DrealSR/
│ ├── RealSR/
│ ├── weather1/
│ ├── weather2/
│ ├── Snow100K-L/
│ ├── Snow100K-S/
│ ├── dehazing_test/
│ ├── LOL2/
│ ├── RealBlur_J/
│ ├── RealBlur_R/
│ ├── GoPro/
│ ├── Urban100_15/
│ ├── Urban100_25/
│ └── Urban100_50/
├── weights/
├── scripts/
└── ...
You may use a different local layout, but then you must modify the corresponding fields in scripts/denoiser/flux_qwen2p5vl_7b_vlm_512.yaml.
The training manifest file is:
datasets/train.txt
Each line describes one dataset source. The supported formats are:
image_root,json_file,need_weight
or
image_root,json_file,need_weight,need_degradation
Example:
datasets/train_data/images,datasets/train_data/annotations/train_task1.json,true,true
datasets/train_data/images,datasets/train_data/annotations/train_task2.json,false,false
Field meanings:
- `image_root`: root directory of images referenced by the JSON file
- `json_file`: annotation file for this subset
- `need_weight`: whether to enable the corresponding weighting behavior
- `need_degradation`: whether to synthesize the low-quality input from the target image on the fly
If need_degradation is omitted, the current loader treats it as true.
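To make the manifest semantics concrete, here is a minimal parsing sketch that mirrors the documented format (this is illustrative, not the exact loader code):

```python
# Sketch: parse one line of datasets/train.txt following the format above.
def parse_manifest_line(line: str):
    parts = [p.strip() for p in line.strip().split(",")]
    image_root, json_file = parts[0], parts[1]
    need_weight = parts[2].lower() == "true"
    # need_degradation defaults to true when the fourth field is omitted
    need_degradation = parts[3].lower() == "true" if len(parts) > 3 else True
    return image_root, json_file, need_weight, need_degradation

with open("datasets/train.txt") as f:
    sources = [parse_manifest_line(line) for line in f if line.strip()]
```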
Each JSON file should be a list of samples.
In the first mode (single-image entries), the image is treated as the target image. The loader will randomly crop it to 512×512 and generate the low-quality input on the fly.
[
{
"image": "gt/sample_0001.png"
},
{
"image": "gt/sample_0002.png"
}
]

In this case:
- the image is treated as the GT image
- the low-quality input is synthesized during training
In the second mode (paired entries), the first image is treated as the input image and the second image is treated as the GT image.
[
{
"image": ["lq/sample_0001.png", "gt/sample_0001.png"]
},
{
"image": ["lq/sample_0002.png", "gt/sample_0002.png"]
}
]

In this case:
- the first image is treated as the input image
- the second image is treated as the GT image
For the current public training pipeline:
- degradation-based training uses random cropping and degradation synthesis
- paired training resizes both input and GT images to 512×512
- the training config currently uses `height: 512` and `width: 512`
If you change these settings, please keep your data preparation and GPU memory budget consistent.
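For illustration, here is a minimal sketch of how the two annotation modes above map to (input, GT) pairs. This is not the actual loader, and `synthesize_degradation` is a hypothetical placeholder for the on-the-fly degradation pipeline:

```python
# Sketch: resolve one annotation entry into an (LQ input, GT target) pair.
import os
import random
from PIL import Image

SIZE = 512  # matches height/width in the training config

def load_sample(entry, image_root, need_degradation=True):
    if isinstance(entry["image"], list):
        # Paired mode: [input (LQ), target (GT)]; both resized to 512x512.
        lq = Image.open(os.path.join(image_root, entry["image"][0])).resize((SIZE, SIZE))
        gt = Image.open(os.path.join(image_root, entry["image"][1])).resize((SIZE, SIZE))
    else:
        # GT-only mode: random 512x512 crop of the GT (assumes images >= 512px),
        # then synthesize the LQ input on the fly.
        gt = Image.open(os.path.join(image_root, entry["image"]))
        x = random.randint(0, gt.width - SIZE)
        y = random.randint(0, gt.height - SIZE)
        gt = gt.crop((x, y, x + SIZE, y + SIZE))
        lq = synthesize_degradation(gt) if need_degradation else gt  # hypothetical helper
    return lq, gt
```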
The current config expects validation / test data under datasets/sr_testing_data/.
The default path-related fields in scripts/denoiser/flux_qwen2p5vl_7b_vlm_512.yaml are:
dataset_config:
data_txt: datasets/train.txt
# DrealSR
test_data_root: datasets/sr_testing_data/DrealSR
test_scales: [2, 4]
# Weather
weather_test_data_root: datasets/sr_testing_data
weather_test_types:
- weather1
- weather2
- Snow100K-L
- Snow100K-S
# RealSR
realsr_test_data_root: datasets/sr_testing_data/RealSR
realsr_camera_types:
- Canon
- Nikon
realsr_scale_factors:
- 2
- 4
# Generic benchmarks
generic_test_data_root: datasets/sr_testing_data
generic_datasset_types:
- dehazing_test
- LOL2
- RealBlur_J
- RealBlur_R
- GoPro
- Urban100_15
- Urban100_25
- Urban100_50

Note: please keep the key name `generic_datasset_types` unchanged (including its spelling) in the current release, since it is intentionally kept for compatibility with the existing code.
In practice, each benchmark should be placed under the corresponding root path, and the YAML should be edited if your local folder names differ.
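To confirm your local layout matches the config, a quick existence check over the default folder names above can help (a sketch using the default roots):

```python
# Sketch: check that each benchmark folder expected by the default config exists.
import os

root = "datasets/sr_testing_data"
expected = [
    "DrealSR", "RealSR", "weather1", "weather2", "Snow100K-L", "Snow100K-S",
    "dehazing_test", "LOL2", "RealBlur_J", "RealBlur_R", "GoPro",
    "Urban100_15", "Urban100_25", "Urban100_50",
]
missing = [d for d in expected if not os.path.isdir(os.path.join(root, d))]
print("All benchmark folders found." if not missing else f"Missing: {missing}")
```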
A practical workflow is:
- Prepare the initial weights under weights/
- Modify the path fields in scripts/denoiser/flux_qwen2p5vl_7b_vlm_512.yaml
- Prepare datasets/train.txt
- Prepare the corresponding JSON annotation files
- Organize validation / test data under datasets/sr_testing_data/
- Start training with the provided Accelerate config
Figure 1. Quantitative comparison with state-of-the-art AIO-IR methods on six tasks (deraining, denoising, deblurring, desnowing, dehazing, low-light enhancement). Best in red, second-best in blue.
Figure 2. Unified comparison across SR task series. Best in red, second-best in blue.
Figure 6. Qualitative comparison among unified models, including BAGEL, Nexus-Gen, Uniworld-V1, and Emu3.5.
Figure 7. Qualitative comparison of restoration results produced by FAPE-IR and state-of-the-art AIO-IR models.
If you use this work, please cite:
@article{liu2025fape,
title={FAPE-IR: Frequency-Aware Planning and Execution Framework for All-in-One Image Restoration},
author={Liu, Jingren and Xu, Shuning and Yang, Qirui and Wang, Yun and Chen, Xiangyu and Ji, Zhong},
journal={arXiv preprint arXiv:2511.14099},
year={2025}
}

We thank all collaborators and colleagues for their helpful discussions and support. We especially thank Dr. Xiangyu Chen and Professor Zhong Ji for their guidance and revisions to this work.
This project is built upon several foundational open-source works. We sincerely thank the authors of the following repositories for their invaluable contributions to the community:
- Core Inspiration & Architecture: our framework builds on UniWorld-V1 and FLUX, together with SigLIP (see the initial-weights layout above).
- Baselines & Comparisons: We gratefully acknowledge the authors of BAGEL, Nexus-Gen, and Emu3.5 for open-sourcing their code and weights, which facilitated our qualitative and quantitative comparisons.
- Frameworks & Utilities: Our implementation relies on excellent open-source libraries, including Hugging Face's diffusers and transformers.
If you have any questions, feel free to reach out:
- Email: jrl0219@tju.edu.cn; yc07425@um.edu.mo
