A comparative study of six personalization methods for text-to-image diffusion models, conducted as an independent study at Indiana University under the guidance of Professor Mohammad Al Hasan.
View the Interactive Report — Detailed results, architecture diagrams, generated samples, and method comparisons.
All experiments use Stable Diffusion v1.4/v1.5 on NVIDIA H100 80 GB GPUs (IU Quartz HPC) with HuggingFace Diffusers and TRL.
This project systematically evaluates how different personalization strategies affect image quality, subject fidelity, and inference speed. Each method occupies a different point on the fidelity-efficiency spectrum:
- DreamBooth — Full U-Net fine-tuning for maximum subject fidelity (~3.4 GB)
- LoRA — Low-rank adapters for efficient style transfer (~3 MB)
- Textual Inversion — Embedding-only optimization, lightest approach (~3-24 KB)
- Custom Diffusion — Selective K,V cross-attention fine-tuning with multi-concept support (~75 MB)
- LCM Distillation — Consistency distillation for 27x faster inference (108 ms vs 2,913 ms)
- DDPO — Reinforcement learning with aesthetic reward optimization (score: 6.28)
| Method | What's Trained | Trainable Params | Storage |
|---|---|---|---|
| DreamBooth | Entire U-Net | ~860M | ~3.4 GB |
| LoRA | Low-rank adapters | ~1.6-6.4M | ~3 MB |
| Textual Inversion | Embedding only | ~768-6,144 | ~3-24 KB |
| Custom Diffusion | Cross-attn K,V + token | ~57M | ~75 MB |
| LCM Distillation | Student model | Full model | ~3.4 GB |
| DDPO | U-Net (RL) | Full model | ~3.4 GB |
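The storage figures for DreamBooth and Textual Inversion follow from simple arithmetic on the parameter counts in the table above. The sketch below assumes weights are saved as 32-bit floats (4 bytes each). That precision is an assumption, not something this README states, and `storage_bytes` is a hypothetical helper rather than part of the repo. Other rows may use a different precision or file format, so the sketch does not attempt to reproduce them.

```python
# Rough checkpoint-size estimate: parameter count x bytes per parameter.
# Assumption: weights are stored as 32-bit floats (4 bytes each).
BYTES_PER_FP32 = 4


def storage_bytes(num_params: int, bytes_per_param: int = BYTES_PER_FP32) -> int:
    """Return the raw size in bytes of num_params stored weights."""
    return num_params * bytes_per_param


# Parameter counts taken from the table above.
dreambooth = storage_bytes(860_000_000)  # entire U-Net
ti_small = storage_bytes(768)            # a single 768-dimensional embedding
ti_large = storage_bytes(6_144)          # 6,144 = 8 x 768 values

print(f"DreamBooth: {dreambooth / 1e9:.2f} GB")                             # 3.44 GB
print(f"Textual Inversion: {ti_small / 1024:.0f}-{ti_large / 1024:.0f} KB")  # 3-24 KB
```

Under that assumption the estimates land on the ~3.4 GB and ~3-24 KB values listed, which is why embedding-only methods are so much cheaper to store and share.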
Diffusion-Personalization/
├── jobs/ # SLURM job submission scripts
│ ├── submit_train.sh
│ └── submit_train2.sh
├── models/
│ └── experiment_tracker/ # Experiment logs (per-method CSVs)
│ ├── DB.csv # DreamBooth experiments
│ ├── LORA.csv # LoRA experiments
│ ├── TI.csv # Textual Inversion experiments
│ ├── CD.csv # Custom Diffusion experiments
│ ├── LCD.csv # LCM Distillation experiments
│ └── RL_DDPO.csv # DDPO experiments
├── inputs/
│ └── prompts/ # Evaluation prompt files
│ ├── prompts_dog.txt # DreamBooth prompts
│ ├── prompts_dog_CD.txt # Custom Diffusion prompts
│ ├── prompts_cat.txt # Textual Inversion prompts
│ ├── prompts_naruto.txt # LoRA style transfer prompts
│ ├── prompts_lcm.txt # LCM general prompts
│ ├── prompts_ddpo.txt # DDPO aesthetic prompts
│ └── prompts_offsubject.txt # Off-subject prompts
├── scripts/
│ ├── train/ # Training scripts (per method)
│ ├── infer/ # Inference scripts (per method)
│ ├── experiment.py # Experiment tracker
│ ├── experiment_after_evaluation.py
│ ├── experiment_after_training.sh
│ ├── eval_infer.py # General evaluation inference
│ ├── eval_infer_ti.py # TI-specific evaluation
│ ├── evaluation.py # Metric computation (CLIP-T, CLIP-I, DINO-I)
│ ├── evaluation_ddpo.py # DDPO evaluation (aesthetic + CLIP-T)
│ └── evaluation_lcm.py # LCM evaluation (CLIP-T + latency)
├── .gitignore
└── README.md
- Python 3.10+
- CUDA 12.6+
- PyTorch 2.11+
git clone https://github.com/meghanaNanuvala/Diffusion-Personalization.git
cd Diffusion-Personalization

The Diffusers library is required for all training and inference scripts. It is not included in this repo due to size constraints.
git clone https://github.com/huggingface/diffusers.git
cd diffusers
pip install -e .
cd ..

The TRL library is required for DDPO reinforcement learning training.
git clone https://github.com/huggingface/trl.git
cd trl
pip install -e .
cd ..

pip install transformers accelerate safetensors
pip install open-clip-torch # For CLIP-T and CLIP-I metrics
pip install torchvision # For DINO-I metrics

Training scripts are in scripts/train/. Each method has its own training script. Example for DreamBooth:
# On SLURM (IU Quartz HPC)
sbatch jobs/submit_train.sh
# Or run directly
python scripts/train/train_dreambooth.py

# Generate images from a trained model
python scripts/eval_infer.py
# Compute metrics (CLIP-T, CLIP-I, DINO-I)
python scripts/evaluation.py
# LCM-specific evaluation (CLIP-T + latency)
python scripts/evaluation_lcm.py
# DDPO-specific evaluation (aesthetic score + CLIP-T)
python scripts/evaluation_ddpo.py

All experiments are automatically logged to CSV files in models/experiment_tracker/. Each row records hyperparameters, hardware context, and evaluation metrics.
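As an illustration of the per-method CSV logging described above, here is a minimal sketch of appending one row per experiment. The function `log_experiment` and the column names are hypothetical; the actual schema used by scripts/experiment.py is not shown in this README. The metric values are the ones reported in the results table, hyperparameter columns are left out because their values are not given here, and the example writes to a temporary directory so it cannot touch real logs.

```python
import csv
import tempfile
from pathlib import Path


def log_experiment(csv_path: Path, row: dict) -> None:
    """Append one experiment record, writing the header if the file is new."""
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not csv_path.exists() or csv_path.stat().st_size == 0
    with csv_path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row.keys()))
        if is_new:
            writer.writeheader()
        writer.writerow(row)


# Hypothetical columns: method, hardware context, and evaluation metrics.
with tempfile.TemporaryDirectory() as tmp:
    tracker = Path(tmp) / "experiment_tracker" / "example.csv"
    log_experiment(tracker, {
        "method": "DreamBooth", "gpu": "NVIDIA H100 80 GB",
        "clip_t": 0.274, "clip_i": 0.845, "dino_i": 0.588, "latency_ms": 2913,
    })
    log_experiment(tracker, {
        "method": "Textual Inversion", "gpu": "NVIDIA H100 80 GB",
        "clip_t": 0.277, "clip_i": 0.857, "dino_i": 0.690, "latency_ms": 613,
    })
    print(tracker.read_text())
```

Appending with a header check keeps each file readable by spreadsheet tools and lets repeated runs accumulate in one place. The sketch assumes every row for a given file uses the same keys in the same order.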
| Method | Best CLIP-T | Best CLIP-I | Best DINO-I | Latency |
|---|---|---|---|---|
| DreamBooth | 0.274 | 0.845 | 0.588 | 2,913 ms |
| LoRA | 0.249 | - | - | 835 ms |
| Textual Inversion | 0.277 | 0.857 | 0.690 | 613 ms |
| Custom Diffusion | 0.258 | 0.741 | 0.081 | 644 ms |
| LCM | 0.252 | - | - | 108 ms |
| DDPO | 0.232 | - | - | 1,187 ms |
DDPO achieves an aesthetic score of 6.28 (LoRA variant). LCM achieves a 27x inference speedup (108 ms vs 2,913 ms for DreamBooth).
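The 27x figure can be checked directly from the latency column. The short sketch below copies the latencies from the results table and computes how much faster LCM is than each other method; `speedup` is a hypothetical helper written for this illustration.

```python
# Latencies in milliseconds, copied from the results table above.
latency_ms = {
    "DreamBooth": 2913,
    "LoRA": 835,
    "Textual Inversion": 613,
    "Custom Diffusion": 644,
    "DDPO": 1187,
}
LCM_MS = 108


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


for method, ms in latency_ms.items():
    print(f"LCM vs {method}: {speedup(ms, LCM_MS):.1f}x")
```

The DreamBooth comparison gives 2,913 / 108 ≈ 27.0. Against the faster baselines the ratio is smaller, for example about 5.7x relative to Textual Inversion.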
For detailed per-prompt breakdowns, generated samples, and architecture diagrams, see the interactive report.
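For readers unfamiliar with the metric names in the table: CLIP-T, CLIP-I, and DINO-I are commonly defined as cosine similarities between embeddings (generated image vs. prompt text, and generated image vs. reference image, respectively). The sketch below shows only that similarity step on plain vectors. It assumes the embeddings have already been produced by a CLIP or DINO encoder, and it is not taken from scripts/evaluation.py, which may differ in detail. Both function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(
    generated: Sequence[Sequence[float]],
    references: Sequence[Sequence[float]],
) -> float:
    """Average cosine similarity over every (generated, reference) pair."""
    scores = [cosine_similarity(g, r) for g in generated for r in references]
    return sum(scores) / len(scores)


# Toy 2-D vectors standing in for real encoder outputs.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

A score near 1 means the two embeddings point in nearly the same direction. Averaging over all generated and reference pairs is one common way to reduce many comparisons to a single number per method.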
- GPU: NVIDIA H100 80 GB HBM3
- Cluster: IU Quartz HPC
- CUDA: 12.6
- PyTorch: 2.11
Meghana Nanuvala - Indiana University
Independent study under Professor Mohammad Al Hasan, Luddy School of Informatics, Computing, and Engineering, Indiana University Indianapolis.