Self-Supervised Flow Matching for Scalable Multi-Modal Synthesis (Self-Flow) [ICML'26]

Hila Chefer* · Patrick Esser*
Dominik Lorenz · Dustin Podell · Vikash Raja · Vinh Tong · Antonio Torralba · Robin Rombach
Black Forest Labs

This folder contains inference code for generating ImageNet 256×256 images with our Self-Flow-trained diffusion model.

Overview

Self-Flow (Self-Supervised Flow Matching for Scalable Multi-Modal Synthesis) is a training framework that combines the flow matching objective with a self-supervised feature reconstruction objective.

This inference code allows you to:

Load a Self-Flow checkpoint (pretrained on ImageNet 256×256)
Generate 50,000 images for FID evaluation

The generated samples can be evaluated using the ADM evaluation suite.

Requirements

pip install -r requirements.txt

Quick Start

Download Checkpoint

python -c "
from huggingface_hub import hf_hub_download
hf_hub_download(
    repo_id='Hila/Self-Flow',
    filename='selfflow_imagenet256.pt',
    local_dir='./checkpoints'
)
print('Downloaded!')
"

Generate 50k samples (multi-GPU recommended)

torchrun --nnodes=1 --nproc_per_node=8 sample.py \
    --ckpt checkpoints/selfflow_imagenet256.pt \
    --output-dir ./samples \
    --num-fid-samples 50000

Single GPU

python sample.py \
    --ckpt checkpoints/selfflow_imagenet256.pt \
    --output-dir ./samples \
    --num-fid-samples 50000 \
    --batch-size 64
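
How the requested sample count maps onto GPUs is not spelled out in this README. The following is a minimal sketch, assuming `sample.py` follows the convention of the DiT/SiT distributed sampling scripts: round the total up to a multiple of the global batch size so every rank runs the same number of full batches, then drop the surplus. The helper `plan_sampling` is hypothetical and not part of this repository.

```python
import math


def plan_sampling(num_fid_samples: int, batch_size: int, world_size: int) -> dict:
    """Split a sample budget evenly across GPUs (assumed convention, see text above)."""
    if min(num_fid_samples, batch_size, world_size) <= 0:
        raise ValueError("all arguments must be positive")
    # Samples produced per step, summed over all GPUs.
    global_batch = batch_size * world_size
    # Round up so that each rank runs the same number of full batches.
    total = math.ceil(num_fid_samples / global_batch) * global_batch
    per_gpu = total // world_size
    return {
        "global_batch": global_batch,
        "total_generated": total,
        "samples_per_gpu": per_gpu,
        "iterations_per_gpu": per_gpu // batch_size,
        "discarded": total - num_fid_samples,
    }


if __name__ == "__main__":
    print(plan_sampling(50000, 64, 8))  # the multi-GPU command above
    print(plan_sampling(50000, 64, 1))  # the single-GPU command above
```

Under this assumption, 8 GPUs at batch size 64 run 98 iterations each and generate 50,176 images, of which 176 are discarded; a single GPU runs 782 iterations.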

Command Line Arguments

Argument	Default	Description
`--ckpt`	required	Path to model checkpoint
`--output-dir`	`./samples`	Output directory for generated samples
`--num-fid-samples`	`50000`	Number of samples to generate
`--batch-size`	`64`	Batch size per GPU
`--num-steps`	`250`	Number of diffusion sampling steps
`--mode`	`SDE`	Sampling mode: `SDE` or `ODE`
`--seed`	`31`	Random seed for reproducibility
`--cfg-scale`	`1.0`	Classifier-free guidance scale (1.0 = no guidance, as used in paper)
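
For reference, the table above can be expressed as an `argparse` parser. This is an illustrative sketch only: the flag names and defaults are taken from the table, but the types, the `choices` restriction, and the function name `build_parser` are assumptions, and the actual `sample.py` may define its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative, not the repo's code)."""
    parser = argparse.ArgumentParser(description="Self-Flow sampling (illustrative)")
    parser.add_argument("--ckpt", type=str, required=True,
                        help="Path to model checkpoint")
    parser.add_argument("--output-dir", type=str, default="./samples",
                        help="Output directory for generated samples")
    parser.add_argument("--num-fid-samples", type=int, default=50000,
                        help="Number of samples to generate")
    parser.add_argument("--batch-size", type=int, default=64,
                        help="Batch size per GPU")
    parser.add_argument("--num-steps", type=int, default=250,
                        help="Number of diffusion sampling steps")
    parser.add_argument("--mode", type=str, choices=["SDE", "ODE"], default="SDE",
                        help="Sampling mode")
    parser.add_argument("--seed", type=int, default=31,
                        help="Random seed for reproducibility")
    parser.add_argument("--cfg-scale", type=float, default=1.0,
                        help="Classifier-free guidance scale (1.0 = no guidance)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--ckpt", "checkpoints/selfflow_imagenet256.pt", "--mode", "ODE"]
    )
    print(vars(args))
```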

Evaluation

The generated .npz file can be used with the ADM evaluation suite to compute FID, IS, Precision, and Recall.

Download Reference Statistics

wget https://openaipublic.blob.core.windows.net/diffusion/jul-2021/ref_batches/imagenet/256/VIRTUAL_imagenet256_labeled.npz

Run Evaluation

python evaluator.py \
    VIRTUAL_imagenet256_labeled.npz \
    ./samples/samples_50000.npz

Model Architecture

The Self-Flow model is based on SiT-XL/2.

A key architectural modification is per-token timestep conditioning, which allows each token to have a different noise level during training.
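
To make the idea of per-token noise levels concrete, here is a minimal sketch in plain Python. It assumes a linear interpolant in which `t = 0` leaves a token clean and `t = 1` replaces it with pure noise; the repository's time convention may be reversed, and the function name `interpolate_per_token` is hypothetical. The only point is that the timestep is a vector with one entry per token rather than a single scalar per image.

```python
from typing import List, Sequence


def interpolate_per_token(
    tokens: Sequence[Sequence[float]],
    noise: Sequence[Sequence[float]],
    timesteps: Sequence[float],
) -> List[List[float]]:
    """Mix each token with noise using that token's own timestep in [0, 1]."""
    if not (len(tokens) == len(noise) == len(timesteps)):
        raise ValueError("tokens, noise and timesteps must have the same length")
    noisy = []
    for x, eps, t in zip(tokens, noise, timesteps):
        if not 0.0 <= t <= 1.0:
            raise ValueError("timesteps must lie in [0, 1]")
        # Assumed convention: x_t = (1 - t) * x + t * eps, applied per token.
        noisy.append([(1.0 - t) * xi + t * ei for xi, ei in zip(x, eps)])
    return noisy


if __name__ == "__main__":
    tokens = [[1.0, 2.0], [3.0, 4.0]]
    noise = [[0.0, 0.0], [10.0, 10.0]]
    # First token stays clean (t = 0), second is half noise (t = 0.5).
    print(interpolate_per_token(tokens, noise, [0.0, 0.5]))
```

In a model with per-token conditioning, the same per-token `timesteps` vector would also be embedded and fed to each token's conditioning path, instead of broadcasting one scalar to all tokens.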

Project Structure

Self-Flow/
├── sample.py           # Main sampling script
├── checkpoints/        # Place model checkpoints here
├── requirements.txt    # Python dependencies
├── README.md           # This file
└── src/                # Model and sampling implementations
    ├── model.py        # SelfFlowPerTokenDiT model
    ├── sampling.py     # Diffusion sampling utilities
    └── utils.py        # Position encoding utilities

Training Details

The model was trained using the following configuration:

Model: SiT-XL/2 with per-token timestep conditioning
Training: Self-Flow with per-token masking (25% mask ratio)
Optimizer: AdamW with gradient clipping (max_norm=1)
Mixed precision: BFloat16
Self-distillation: Teacher at layer 20 (EMA), student at layer 8
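
Two items in this list can be illustrated with a short, framework-free sketch: choosing a 25% token mask and maintaining an EMA teacher. Everything below is an assumption made for illustration. This README does not say how masked tokens are treated or which EMA decay was used, so the sketch only selects mask positions, and the decay of `0.999` is a placeholder. The function names `sample_token_mask` and `ema_update` are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def sample_token_mask(
    num_tokens: int, mask_ratio: float = 0.25, rng: Optional[random.Random] = None
) -> List[bool]:
    """Return a boolean mask with round(num_tokens * mask_ratio) positions set to True."""
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must lie in [0, 1]")
    rng = rng or random.Random()
    num_masked = int(round(num_tokens * mask_ratio))
    # Sample distinct positions uniformly at random.
    masked = set(rng.sample(range(num_tokens), num_masked))
    return [i in masked for i in range(num_tokens)]


def ema_update(
    teacher: Sequence[float], student: Sequence[float], decay: float = 0.999
) -> List[float]:
    """Standard EMA step: teacher <- decay * teacher + (1 - decay) * student."""
    if len(teacher) != len(student):
        raise ValueError("parameter lists must have the same length")
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


if __name__ == "__main__":
    # 256 tokens is an assumed sequence length for illustration.
    mask = sample_token_mask(256, 0.25, random.Random(0))
    print(sum(mask), "of", len(mask), "tokens masked")
    print(ema_update([1.0, 0.0], [0.0, 1.0], decay=0.9))
```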

Acknowledgments

This code builds upon:

REPA - Representation Alignment for Generation
SiT - Scalable Interpolant Transformers

BibTeX

If you use this work, please cite:

@article{CheferEsser2026selfflow,
      title={Self-Supervised Flow Matching for Scalable Multi-Modal Synthesis},
      author={Hila Chefer and Patrick Esser and Dominik Lorenz and Dustin Podell and Vikash Raja and Vinh Tong and Antonio Torralba and Robin Rombach},
      journal={arXiv preprint arXiv:2603.06507},
      year={2026},
}
