Yushun Fang1,2,* Yuxiang Chen2,* Shibo Yin2,† Qiang Hu1,† Jiangchao Yao1
Ya Zhang1 Xiaoyun Zhang1,† Yanfeng Wang1
1Shanghai Jiao Tong University 2Xiaohongshu Inc
*Equal contribution †Corresponding author
- 2025.11.25: The paper (including supplementary materials) is released on arXiv.
- 2025.11.21: Code is released.
⭐ If ODTSR is helpful to you, please help star this repo. Thanks!
- Based on Qwen-Image, we train a single-step SR model using LoRA; the full model reaches 20B parameters.
- With our proposed Noise-hybrid Visual Stream and Fidelity-aware Adversarial Training, the SR process can be jointly controlled by prompts as well as a Fidelity Weight $f$.
- Both English and Chinese prompts are supported, and the model demonstrates strong performance on text images, fine-grained textures, and face images.
Under the high-fidelity setting with a fixed prompt, our model produces restorations that adhere more closely to the LQ input while remaining natural, significantly reducing the sense of AI processing.
In text scenarios, when the prompt specifies the text to be restored, the model automatically matches the LQ text and performs the restoration.

Qualitative results of controllable SR with prompt and adjustable Fidelity Weight (denoted as $f$).
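The Fidelity Weight can be understood as trading off adherence to the LQ input against the generative prior. The snippet below is only an illustrative sketch of that idea as a linear latent/noise blend; the function name `hybrid_latent` and the blending scheme are our assumptions, not the actual Noise-hybrid Visual Stream from the paper.

```python
import numpy as np

def hybrid_latent(lq_latent: np.ndarray, fidelity_weight: float, rng=None) -> np.ndarray:
    """Blend an LQ-derived latent with Gaussian noise.

    A higher fidelity_weight keeps more of the LQ structure, so restorations
    adhere closely to the input; a lower one leaves more room for the
    generative prior. Illustrative only -- ODTSR's actual mechanism is
    defined in the paper, not here.
    """
    if not 0.0 <= fidelity_weight <= 1.0:
        raise ValueError("fidelity_weight must be in [0, 1]")
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(lq_latent.shape).astype(lq_latent.dtype)
    return fidelity_weight * lq_latent + (1.0 - fidelity_weight) * noise

# At f = 1.0 the latent passes through unchanged; at f = 0.0 it is pure noise.
```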
- Prepare conda env:
conda create -n yourenv python=3.11
- Install pytorch (we recommend `torch==2.6.0`):
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 -f https://mirrors.aliyun.com/pytorch-wheels/cu124/
- Install this repo (based on DiffSynth-Studio). The required packages will be automatically installed:
cd xxxx/ODTSR
pip3 install -e . -v -i https://mirrors.cloud.tencent.com/pypi/simple
- (For training) Install basicsr:
pip install basicsr
Note: You can apply the following command to fix a bug in basicsr. Make sure to replace /opt/conda with the path to your own conda environment:
sed -i '8s/from torchvision.transforms.functional_tensor import rgb_to_grayscale/from torchvision.transforms._functional_tensor import rgb_to_grayscale/' /opt/conda/lib/python3.11/site-packages/basicsr/data/degradations.py
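If you prefer not to hard-code the environment path, the same fix can be applied from Python. This is just a convenience sketch (the helper name `patch_basicsr` is ours, not part of the repo):

```python
from pathlib import Path

OLD = "from torchvision.transforms.functional_tensor import rgb_to_grayscale"
NEW = "from torchvision.transforms._functional_tensor import rgb_to_grayscale"

def patch_basicsr(degradations_py: Path) -> bool:
    """Rewrite the outdated torchvision import in basicsr's degradations.py.

    Returns True if the file was modified, False if already patched.
    """
    text = degradations_py.read_text()
    if OLD not in text:
        return False
    degradations_py.write_text(text.replace(OLD, NEW))
    return True

# Locate the installed file automatically instead of guessing the env path:
# import basicsr.data.degradations as d; patch_basicsr(Path(d.__file__))
```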
- Download base model to your disk: Qwen-Image
- (For training) Download base model to your disk: Wan2.1-T2V-1.3B
- (For inference) Download the trained ODTSR model weight: huggingface
Note: you need at least 40GB of GPU memory for inference. We will support CPU offload to reduce GPU memory usage soon.
We now support tile-based processing (tile size: 512×512), enabling inputs of arbitrary resolution and SR at any scale factor.
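Tile-based processing amounts to cutting the input into 512×512 crops, running SR on each, and stitching the results back together. The sketch below shows the split/merge skeleton for the simple non-overlapping case on a NumPy array; the repo's actual tiling (overlap handling, blending weights) lives in the inference code, and the function names here are ours.

```python
import numpy as np

def split_tiles(img: np.ndarray, tile: int = 512):
    """Pad img (H, W, C) to a tile multiple and yield (y, x, crop) tuples."""
    h, w = img.shape[:2]
    ph, pw = -h % tile, -w % tile          # padding needed on each axis
    padded = np.pad(img, ((0, ph), (0, pw), (0, 0)), mode="reflect")
    for y in range(0, padded.shape[0], tile):
        for x in range(0, padded.shape[1], tile):
            yield y, x, padded[y:y + tile, x:x + tile]

def merge_tiles(tiles, out_h: int, out_w: int) -> np.ndarray:
    """Place processed tiles back on a canvas and crop to the target size."""
    tile_list = list(tiles)
    t = tile_list[0][2].shape[0]
    c = tile_list[0][2].shape[2]
    canvas_h = max(y for y, _, _ in tile_list) + t
    canvas_w = max(x for _, x, _ in tile_list) + t
    canvas = np.zeros((canvas_h, canvas_w, c), dtype=tile_list[0][2].dtype)
    for y, x, crop in tile_list:
        canvas[y:y + t, x:x + t] = crop
    return canvas[:out_h, :out_w]
```

With an identity "SR" step, splitting and merging round-trips the image exactly; in practice each crop would be upscaled before merging onto a proportionally larger canvas.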
Please replace experiments/qwen_one_step_gan/${EXP_DATE}/checkpoints/net_gen_iter_10001.pth with the trained ODTSR model weight.
sh examples/qwen_image/test_gan.sh
sh examples/qwen_image/test_gradio.sh
To be updated.
This project is released under the Apache 2.0 license.
This project is based on DiffSynth-Studio. We also leveraged some of PiSA-SR's code in the dataloader part. Thanks for the awesome work!





