
FAPE-IR: Frequency-Aware Planning and Execution Framework for All-in-One Image Restoration


[Paper (arXiv:2511.14099)](https://arxiv.org/abs/2511.14099)   Jingren Liu*, Shuning Xu*, Qirui Yang*, Yun Wang, Xiangyu Chen, Zhong Ji
*Equal contribution   Corresponding author


This repo is an early exploration of unified image restoration (unified understanding & generation). We are actively investigating more lightweight designs and alternatives beyond the MLLM + Diffusion paradigm, and will continuously maintain and update this repo with new progress.

The three co-first authors are currently busy with their doctoral dissertations, so please bear with us while we clean up the code and add new directions and suggestions.

🚀 Future Exploration

Beyond the current release, we would like to continue exploring the following directions:

  1. Smaller foundation models for practical deployment.
    One important direction is to build more lightweight unified restoration models based on smaller foundation models, such as models around the 1.3B scale, or even models suitable for on-device deployment. We believe this is essential for bringing unified low-level restoration from research prototypes to scenarios with real practical value, where efficiency, memory, and deployment cost are critical.

  2. Understanding and mitigating artifacts in flow-matching-based low-level restoration.
    In our experiments on low-level image and video restoration, we observe a very important phenomenon: although model families such as WAN and FLUX can work well for cross-modal generation, they often suffer from severe artifacts under the image-to-image paradigm or video-to-video paradigm required by low-level restoration. This issue appears to be a common problem in large low-level diffusion / flow-matching-style models, and we believe it deserves much deeper investigation. Understanding why these artifacts emerge, and how to suppress them without sacrificing generation quality, will be one of our major future focuses.
    More specifically, taking recent video foundation models for super-resolution as an example, current practice often relies on a two-stage pipeline: first performing flow-matching pretraining, and then applying an additional post-training stage such as Adversarial Post-Training (APT) or Distribution Matching Distillation (DMD) to obtain better restoration quality. In our view, this paradigm is still not aesthetically satisfying from a modeling perspective. A more fundamental question is: why can we not obtain the best restoration performance directly from flow-matching training itself?
    We believe this question is highly important for the future of low-level generative restoration. If high-quality low-level restoration always depends on extra post-training, refinement, or distillation stages, then the overall framework becomes less unified, less elegant, and harder to analyze. In contrast, achieving strong results directly from the original FM objective would be much cleaner and more principled. This may require better low-level objectives, better noise / trajectory design, or restoration-oriented training strategies specifically tailored to image-to-image and video-to-video settings.

  3. Unified autoregressive and unified diffusion paradigms for low-level vision.
    Another important direction is to explore whether low-level image and video restoration can be realized in a more truly unified autoregressive manner, as well as in a more principled unified diffusion framework. We hope to study how these two paradigms can better support unified low-level understanding, planning, and generation, and whether they can offer cleaner, simpler, and more scalable solutions for restoration tasks.

  4. Combining low-level restoration, unified models, and reinforcement learning.
    We are also interested in exploring how low-level restoration, unified foundation models, and reinforcement learning can be connected together. We believe this may open up new opportunities for better long-horizon optimization, adaptive restoration strategies, and more intelligent decision-making in both image and video restoration systems.

We will continue maintaining this repository and update it as these directions become more mature.


🚩 New Features/Updates

  • ✅ Nov 25, 2025. Release the arXiv paper.
  • ✅ Apr 10, 2026. Release training and testing code.
  • ✅ Apr 12, 2026. Initial weights are now available on Hugging Face.
  • 🚧 TBD. Release pretrained checkpoints & model zoo.
  • 🚧 TBD. Release standalone inference / evaluation scripts and example results.

Please star this repo to get updates.


📖 Resources

Checkpoints / Datasets / Results

| Item | Link |
| --- | --- |
| Initial weights | Hugging Face - FAPEIR_Uniworld |
| Pretrained checkpoints / model zoo | TBD |
| Trainset (GT/LQ) | Hugging Face - FAPE-IR-Training |
| Testset (GT/LQ) | Hugging Face - FAPE-IR-Testing |

💻 Usage

➡️ Environment

```bash
conda create -n fapeir python=3.11 -y
conda activate fapeir

# Please install dependencies according to your local CUDA / PyTorch setup.
pip install -r requirements.txt
```

➡️ Prepare Initial Weights

Before training, please first create a weights directory in the project root and place all released initial weights into it.

```bash
mkdir -p weights
```

A recommended layout is:

```
FAPE-IR/
├── weights/
│   ├── flux/
│   ├── siglip/
│   ├── uniworld/
│   ├── denoise_projector_params.bin
│   ├── flux-redux-siglipv2-512.bin
│   ├── vae_projector_only.bin
│   └── vgg.pth
├── scripts/
├── fapeir/
├── train.py
└── validation.py
```

You can download the initial weights from:

https://huggingface.co/David0219/FAPEIR_Uniworld
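Once the files are downloaded, a quick sanity check can confirm the recommended layout above is complete. This is a hypothetical snippet, not part of the repo; the paths are taken directly from the layout shown:

```python
from pathlib import Path

# Hypothetical sanity check: verify that all released initial weights
# sit in the recommended locations before launching training.
expected = [
    "weights/flux",
    "weights/siglip",
    "weights/uniworld",
    "weights/denoise_projector_params.bin",
    "weights/flux-redux-siglipv2-512.bin",
    "weights/vae_projector_only.bin",
    "weights/vgg.pth",
]
missing = [p for p in expected if not Path(p).exists()]
if missing:
    print("Missing weights:", ", ".join(missing))
else:
    print("All initial weights found.")
```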

➡️ Modify Weight Paths in scripts/

After placing the initial weights into weights/, please modify the relevant paths in:

scripts/denoiser/flux_qwen2p5vl_7b_vlm_512.yaml

Please update the path-related fields as follows:

```yaml
training_config:
  output_dir: trained_weights/fapeir
  logging_dir: trained_weights/fapeir
  lpips_weights_path: weights/vgg.pth

model_config:
  pretrained_lvlm_name_or_path: weights/uniworld
  pretrained_denoiser_name_or_path: weights/flux
  pretrained_mlp2_path: weights/denoise_projector_params.bin
  pretrained_mlp3_path: weights/vae_projector_only.bin
  pretrained_siglip_name_or_path: weights/siglip
  pretrained_siglip_mlp_path: weights/flux-redux-siglipv2-512.bin
```

If your local file layout is different, please modify the YAML paths accordingly. The important point is that all fields in the config must match your actual local file locations.

If you use the shell launchers under scripts/denoiser/, please also modify your local machine-dependent variables there, such as:

  • PY
  • ENV_BIN
  • LOG_DIR
  • SESSION_NAME
  • MASTER_ADDR
  • MASTER_PORT
  • MACHINE_RANK
  • NUM_MACHINES
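As a rough sketch of how these variables fit together, the machine-dependent header of such a launcher might look like the following. Every value here is a placeholder, not a project default:

```shell
#!/usr/bin/env bash
# All values below are placeholders -- adjust them to your own machines.
PY=python                                   # interpreter used by the launcher
ENV_BIN="$HOME/miniconda3/envs/fapeir/bin"  # hypothetical conda env bin dir
LOG_DIR=logs/fapeir                         # where launcher logs are written
SESSION_NAME=fapeir_train                   # e.g. a tmux/screen session name
MASTER_ADDR=10.0.0.1                        # IP of the rank-0 (master) node
MASTER_PORT=29500                           # free TCP port on the master node
MACHINE_RANK=0                              # this node's rank (0 on master)
NUM_MACHINES=2                              # total number of nodes

echo "rank ${MACHINE_RANK}/${NUM_MACHINES} -> ${MASTER_ADDR}:${MASTER_PORT}"
```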

➡️ Training

After the initial weights are prepared and the YAML paths are updated, you can start training.

Single-node training

```bash
python -m accelerate.commands.launch \
  --config_file scripts/accelerate_configs/single_node_zero2.yaml \
  train.py \
  scripts/denoiser/flux_qwen2p5vl_7b_vlm_512.yaml
```

Multi-node training

```bash
python -m accelerate.commands.launch \
  --config_file scripts/accelerate_configs/multi_node_zero2.yaml \
  --num_machines <NUM_MACHINES> \
  --machine_rank <MACHINE_RANK> \
  --main_process_ip <MASTER_ADDR> \
  --main_process_port <MASTER_PORT> \
  train.py \
  scripts/denoiser/flux_qwen2p5vl_7b_vlm_512.yaml
```

Using shell launchers

If you want to launch training with the provided shell scripts, please first update your local paths and runtime variables in scripts/denoiser/*.sh, and then run your corresponding launcher.

➡️ Inference / Evaluation

Standalone inference and evaluation scripts are still being cleaned up and will be released in future updates.


💾 Data

➡️ Overview

The current public release uses:

  • a manifest file (datasets/train.txt) to describe the training sources
  • a JSON annotation file for each dataset subset
  • a unified testing root under datasets/sr_testing_data/

A recommended directory layout is:

```
FAPE-IR/
├── datasets/
│   ├── train.txt
│   ├── train_data/
│   │   ├── images/
│   │   └── annotations/
│   │       ├── train_task1.json
│   │       ├── train_task2.json
│   │       └── ...
│   └── sr_testing_data/
│       ├── DrealSR/
│       ├── RealSR/
│       ├── weather1/
│       ├── weather2/
│       ├── Snow100K-L/
│       ├── Snow100K-S/
│       ├── dehazing_test/
│       ├── LOL2/
│       ├── RealBlur_J/
│       ├── RealBlur_R/
│       ├── GoPro/
│       ├── Urban100_15/
│       ├── Urban100_25/
│       └── Urban100_50/
├── weights/
├── scripts/
└── ...
```

You may use a different local layout, but then you must modify the corresponding fields in scripts/denoiser/flux_qwen2p5vl_7b_vlm_512.yaml.


➡️ Training Manifest Format

The training manifest file is:

datasets/train.txt

Each line describes one dataset source. The supported formats are:

```
image_root,json_file,need_weight
```

or

```
image_root,json_file,need_weight,need_degradation
```

Example:

```
datasets/train_data/images,datasets/train_data/annotations/train_task1.json,true,true
datasets/train_data/images,datasets/train_data/annotations/train_task2.json,false,false
```

Field meanings:

  • image_root: root directory of images referenced by the JSON file
  • json_file: annotation file for this subset
  • need_weight: whether to enable the corresponding weighting behavior
  • need_degradation: whether to synthesize low-quality input from the target image on the fly

If need_degradation is omitted, the current loader treats it as true.
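The format above can be sketched as a small parser. This is a hypothetical helper for illustration, not the repo's actual loader; it reproduces the rule that an omitted `need_degradation` defaults to `true`:

```python
def parse_manifest_line(line: str) -> dict:
    """Parse one train.txt manifest line into its fields.

    Hypothetical helper illustrating the format above; the repo's
    actual loader may differ. `need_degradation` defaults to True
    when the fourth field is omitted.
    """
    parts = [p.strip() for p in line.strip().split(",")]
    if len(parts) not in (3, 4):
        raise ValueError(f"bad manifest line: {line!r}")
    image_root, json_file, need_weight = parts[:3]
    need_degradation = parts[3] if len(parts) == 4 else "true"
    return {
        "image_root": image_root,
        "json_file": json_file,
        "need_weight": need_weight.lower() == "true",
        "need_degradation": need_degradation.lower() == "true",
    }

# Example: the first line shown above
row = parse_manifest_line(
    "datasets/train_data/images,datasets/train_data/annotations/train_task1.json,true,true"
)
```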


➡️ JSON Annotation Format

Each JSON file should be a list of samples.

Case 1: Degradation-based training

In this mode, the image is treated as the target image. The loader will randomly crop it to 512×512 and generate the low-quality input on the fly.

```json
[
  {
    "image": "gt/sample_0001.png"
  },
  {
    "image": "gt/sample_0002.png"
  }
]
```

In this case:

  • the image is treated as the GT image
  • the low-quality input is synthesized during training

Case 2: Paired training

In this mode, the first image is treated as the input image and the second image is treated as the GT image.

```json
[
  {
    "image": ["lq/sample_0001.png", "gt/sample_0001.png"]
  },
  {
    "image": ["lq/sample_0002.png", "gt/sample_0002.png"]
  }
]
```

In this case:

  • the first image is treated as the input image
  • the second image is treated as the GT image
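The two cases can be told apart by the type of the `image` field. The helper below is a hypothetical sketch mirroring that distinction, not the repo's actual dataset code:

```python
import json

def split_sample(sample: dict):
    """Return (input_path, gt_path) for one annotation entry.

    Hypothetical helper mirroring the two cases above: a plain string
    means degradation-based training (the input is synthesized from the
    GT), while a two-element list is a paired [input, GT] sample.
    """
    image = sample["image"]
    if isinstance(image, str):
        return None, image                       # Case 1: GT only
    if isinstance(image, list) and len(image) == 2:
        return image[0], image[1]                # Case 2: [input, GT]
    raise ValueError(f"unsupported annotation entry: {sample!r}")

samples = json.loads(
    '[{"image": "gt/sample_0001.png"},'
    ' {"image": ["lq/sample_0002.png", "gt/sample_0002.png"]}]'
)
pairs = [split_sample(s) for s in samples]
```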

➡️ Image Resolution Handling

For the current public training pipeline:

  • degradation-based training uses random cropping and degradation synthesis
  • paired training resizes both input and GT images to 512×512
  • the training config currently uses:
    • height: 512
    • width: 512

If you change these settings, please keep your data preparation and GPU memory budget consistent.
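For the random-crop path, the crop-box selection can be sketched as follows. This is an illustrative snippet only; the repo's pipeline may implement cropping differently, and it assumes the source image is at least 512×512 on both sides:

```python
import random

def random_crop_box(width: int, height: int, size: int = 512):
    """Pick a random size x size crop box as (left, top, right, bottom).

    Illustrative sketch of the random-crop step described above.
    Assumes the image is at least `size` pixels on both dimensions.
    """
    left = random.randint(0, width - size)
    top = random.randint(0, height - size)
    return (left, top, left + size, top + size)

box = random_crop_box(1024, 768)
```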


➡️ Validation / Test Data

The current config expects validation / test data under datasets/sr_testing_data/.

The default path-related fields in scripts/denoiser/flux_qwen2p5vl_7b_vlm_512.yaml are:

```yaml
dataset_config:
  data_txt: datasets/train.txt

  # DrealSR
  test_data_root: datasets/sr_testing_data/DrealSR
  test_scales: [2, 4]

  # Weather
  weather_test_data_root: datasets/sr_testing_data
  weather_test_types:
    - weather1
    - weather2
    - Snow100K-L
    - Snow100K-S

  # RealSR
  realsr_test_data_root: datasets/sr_testing_data/RealSR
  realsr_camera_types:
    - Canon
    - Nikon
  realsr_scale_factors:
    - 2
    - 4

  # Generic benchmarks
  generic_test_data_root: datasets/sr_testing_data
  generic_datasset_types:
    - dehazing_test
    - LOL2
    - RealBlur_J
    - RealBlur_R
    - GoPro
    - Urban100_15
    - Urban100_25
    - Urban100_50
```

Note: please keep the key name generic_datasset_types unchanged in the current release, since it is intentionally kept for compatibility with the existing code.

In practice, each benchmark should be placed under the corresponding root path, and the YAML should be edited if your local folder names differ.


➡️ Recommended Workflow

A practical workflow is:

  1. Prepare the initial weights under weights/
  2. Modify the path fields in scripts/denoiser/flux_qwen2p5vl_7b_vlm_512.yaml
  3. Prepare datasets/train.txt
  4. Prepare the corresponding JSON annotation files
  5. Organize validation / test data under datasets/sr_testing_data/
  6. Start training with the provided Accelerate config

📊 Results

Quantitative Results


Figure 1. Quantitative comparison with state-of-the-art AIO-IR methods on six tasks (deraining, denoising, deblurring, desnowing, dehazing, low-light enhancement). Best in red, second-best in blue.


Figure 2. Unified comparison across SR task series. Best in red, second-best in blue.

Qualitative Results


Figure 6. Qualitative comparison among unified models, including BAGEL, Nexus-Gen, Uniworld-V1, and Emu3.5.


Figure 7. Qualitative comparison of restoration results produced by FAPE-IR and state-of-the-art AIO-IR models.


Citation

If you use this work, please cite:

```bibtex
@article{liu2025fape,
  title={FAPE-IR: Frequency-Aware Planning and Execution Framework for All-in-One Image Restoration},
  author={Liu, Jingren and Xu, Shuning and Yang, Qirui and Wang, Yun and Chen, Xiangyu and Ji, Zhong},
  journal={arXiv preprint arXiv:2511.14099},
  year={2025}
}
```

Acknowledgement

We thank all collaborators and colleagues for their helpful discussions and support. We especially thank Dr. Xiangyu Chen and Professor Zhong Ji for their guidance and revisions to this work.

This project is built upon several foundational open-source works. We sincerely thank the authors of the following repositories for their invaluable contributions to the community:

  • Core Inspiration & Architecture:
    • UniWorld: Greatly inspired our unified understanding and generation paradigm.
    • Janus: Provided valuable insights into decoupled visual encoding.
  • Baselines & Comparisons: We gratefully acknowledge the authors of BAGEL, Nexus-Gen, and Emu3.5 for open-sourcing their code and weights, which facilitated our qualitative and quantitative comparisons.
  • Frameworks & Utilities: Our implementation relies on excellent open-source libraries, including Hugging Face's diffusers and transformers.

Contact

If you have any questions, feel free to reach out:

About

Repo for FAPE-IR: Frequency-Aware Planning and Execution Framework for All-in-One Image Restoration (CVPR 2026).
