Diffusion Models Experiments

This repository contains experiments and educational implementations of diffusion-based generative models using PyTorch.

The goal is to understand how diffusion models work by implementing them from scratch and visualizing their behavior.

Implementations

DDPM (Denoising Diffusion Probabilistic Model)

Notebook: DDPM/DDPM.ipynb

This notebook implements a diffusion model trained on the MNIST dataset to generate handwritten digits.

Main features:

U-Net architecture for noise prediction
Sinusoidal timestep embeddings
Exponential Moving Average (EMA) for stable sampling
Visualization of forward and reverse diffusion

The model learns to generate digits by starting from random noise and gradually denoising it.

LDM (Latent Diffusion Model)

Notebook: LDM/LDM.ipynb

This notebook implements a Latent Diffusion Model, where diffusion is performed in a compressed latent space instead of directly on image pixels.

The pipeline consists of two stages:

Autoencoder training – learns a latent representation of the images.
Diffusion training in latent space – a U-Net learns to denoise latent representations.

Operating in latent space significantly reduces computational cost and memory usage, while still allowing high-quality image generation.

Main features:

Convolutional Autoencoder for latent compression
Latent-space diffusion training
U-Net architecture for noise prediction
EMA sampling
Generation of new images by decoding denoised latent vectors

Generated Samples

ViT (Vision Transformer)

Notebook: ViT/ViT.ipynb

This notebook implements a Vision Transformer (ViT) for image classification.

Instead of using convolutional layers, the model processes images as a sequence of patch embeddings and applies the Transformer encoder architecture.

Main features:

Image patch embedding
Positional embeddings
Transformer encoder blocks
Multi-head self-attention
Classification head

The implementation demonstrates how transformer architectures can be applied to computer vision tasks.

ControlNet

Notebook: ControlNet/ControlNet.ipynb

This notebook demonstrates ControlNet — a method for adding spatial conditioning to a pre-trained Stable Diffusion model. Instead of generating images purely from text, ControlNet allows guiding generation using structural signals such as Canny edge maps.

The notebook uses a reference car image, extracts its Canny edges, and feeds the edge map alongside text prompts into a ControlNet-conditioned Stable Diffusion pipeline.

Main features:

Canny edge extraction as a structural conditioning signal
Stable Diffusion v1.5 + lllyasviel/control_v11p_sd15_canny ControlNet
UniPCMultistepScheduler for fast, high-quality sampling
Exploring different text prompts while preserving the source structure
Sweeping ControlNet conditioning scale (0.1 → 1.0) to analyze structure adherence

Generated Samples

LoRA (Low-Rank Adaptation)

Notebook: LoRA/LoRA.ipynb

This notebook implements LoRA — a parameter-efficient fine-tuning technique that injects trainable low-rank matrices into a pre-trained network's weight layers, leaving the original weights frozen.

The experiment trains an intentionally large MLP (ExpensiveNet, ~2.8 M parameters) to classify MNIST digits, then fine-tunes it on the digit 9 — on which the baseline performs worst — using only the LoRA parameters.

The effective weight update follows:

W_eff = W + B · A · (α / r)

where A and B are the small trainable matrices, r is the rank, and α is a scaling constant.

Main features:

Custom LoRAParametrization module using PyTorch's torch.nn.utils.parametrize API
Random Gaussian init for A, zero init for B — ensures Δ W = 0 at training start
α / r scaling to decouple rank from effective learning rate
Per-layer parameter counting (original vs. LoRA-added)
Selective freezing: only LoRA matrices are updated during fine-tuning
Integrity checks asserting original weights are unchanged post fine-tuning
enable_disable_lora() toggle to switch between adapted and original model at inference

img2dataset (Image Collection)

Scripts: img2dataset/dataset.py, img2dataset/list.txt

A minimal image collection pipeline for bootstrapping local datasets using the img2dataset library.

The script reads a list of image URLs from list.txt, downloads them in parallel, resizes each image to 256×256 using border padding, and saves the results to a local data/ directory. The output directory is wiped on every run to ensure a clean dataset state.

Main features:

URL-driven collection via a plain list.txt file
Automatic cleanup of existing output before each run
Resize to 256×256 with border mode to preserve aspect ratio
High-throughput configuration: 32 processes / 256 threads

Usage:

cd img2dataset
python dataset.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Diffusion Models Experiments

Implementations

DDPM (Denoising Diffusion Probabilistic Model)

LDM (Latent Diffusion Model)

Generated Samples

ViT (Vision Transformer)

ControlNet

Generated Samples

LoRA (Low-Rank Adaptation)

img2dataset (Image Collection)

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
ControlNet		ControlNet
DDPM		DDPM
LDM		LDM
LoRA		LoRA
ViT		ViT
assets		assets
img2dataset		img2dataset
.gitignore		.gitignore
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

Diffusion Models Experiments

Implementations

DDPM (Denoising Diffusion Probabilistic Model)

LDM (Latent Diffusion Model)

Generated Samples

ViT (Vision Transformer)

ControlNet

Generated Samples

LoRA (Low-Rank Adaptation)

img2dataset (Image Collection)

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages