
LeakyCLIP: Extracting Training Data from CLIP


A Model Inversion Attack Framework for High-Fidelity Training Data Extraction from CLIP Models

This repository contains the official implementation of LeakyCLIP, a novel model inversion attack framework designed to reconstruct training images from CLIP text embeddings. Our method achieves a 258% improvement in SSIM over baseline approaches on a LAION-2B subset.


πŸ“„ Paper

LeakyCLIP: Extracting Training Data from CLIP. Yunhao Chen, Shujie Wang, Xin Wang, Xingjun Ma. Fudan University. arXiv:2508.00756v3 [cs.CR].


🎯 Overview

LeakyCLIP addresses three fundamental challenges in CLIP inversion:

  1. Non-Robust Features: CLIP learns features that are highly predictive but may not correspond to meaningful visual concepts, leading to unstable optimization landscapes.

    • Solution: Adversarial Fine-Tuning (AFT) using FARE [39] to smooth gradients
  2. Limited Visual Semantics: Text embeddings capture abstract concepts but lack high-level visual information (object layout, scale).

    • Solution: Linear Transformation-Based Embedding Alignment (EA) to project text embeddings into pseudo-image embeddings
  3. Lack of Low-Level Features: Pseudo-image embeddings lack fine-grained details for realistic reconstruction.

    • Solution: Controlled Stable Diffusion-Based Refinement (DR) to add textures and sharp edges

LeakyCLIP Pipeline

Figure 1: The LeakyCLIP three-stage pipeline: (1) Adversarial Fine-Tuning (AFT) smooths the optimization landscape, (2) Embedding Alignment (EA) projects text embeddings into pseudo-image embeddings via learned linear transformation M, (3) Diffusion Refinement (DR) adds low-level details using Stable Diffusion.

πŸ†• VAE Latent Space Inversion (Improved): This implementation includes an enhanced inversion mode that optimizes directly in the VAE latent space (instead of pixel space), resulting in better fidelity, faster convergence, and more realistic reconstructions. Enable with --inversion-space vae.
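For intuition on why optimizing a low-dimensional latent instead of raw pixels helps, here is a minimal numpy sketch of inversion by gradient descent. The decoder and image encoder are stand-in linear maps with toy dimensions, not the repo's VAE or CLIP; the `lr` and `l2` values merely mirror the spirit of `lr` and `latent_l2_weight` in `InversionConfig`:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: instead of optimizing pixels x directly, optimize a small latent z
# and decode it, x = decode(z). Both maps below are stand-ins kept linear so
# the whole loop stays runnable.
latent_dim, pixel_dim, emb_dim = 16, 256, 32
D = rng.standard_normal((pixel_dim, latent_dim)) * 0.1  # stand-in VAE decoder
E = rng.standard_normal((emb_dim, pixel_dim)) * 0.1     # stand-in image encoder
target = rng.standard_normal(emb_dim)                   # pseudo-image embedding

z = np.zeros(latent_dim)
lr, l2 = 0.05, 1e-3  # learning rate and latent L2 weight (illustrative values)
losses = []
for _ in range(200):
    emb = E @ (D @ z)
    # Gradient of 0.5*||emb - target||^2 + 0.5*l2*||z||^2 w.r.t. z
    grad = (D.T @ E.T) @ (emb - target) + l2 * z
    z -= lr * grad
    losses.append(0.5 * np.sum((emb - target) ** 2))

print(losses[-1] < losses[0])  # True: loss decreases as z fits the target
```

The real pipeline replaces `D` with the SD-VAE decoder and `E` with the CLIP image encoder, and backpropagates through both with autograd.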


πŸ”‘ Key Features

  • Three-Stage Pipeline: Adversarial Fine-Tuning β†’ Embedding Alignment β†’ Diffusion Refinement
  • Multi-Architecture Support: ViT-B/16, ViT-B/32, ViT-L/14, ConvNeXt-Base
  • VAE Latent Space Inversion: Optimize in SD-VAE latent space for improved efficiency and fidelity
  • Comprehensive Metrics: SSIM, LPIPS, CLIP Score, SSCD
  • Membership Inference: Detect training data membership from reconstruction metrics
  • Privacy Risk Assessment: Extract sensitive PII including facial images

πŸš€ Quick Start

Installation

git clone https://github.com/dongdongunique/LeakyCLIP.git
cd LeakyCLIP
pip install -r requirements.txt

Model and Dataset Preparation

1. Download CLIP Models

LeakyCLIP requires pre-trained CLIP models. Set up the model directory:

mkdir -p ./models

⚠️ Important: Before running experiments, modify paths.py to match your system paths:

# In leakyclip_release/paths.py
DEFAULT_MODEL_ROOT = os.environ.get(
    "LEAKYCLIP_MODEL_ROOT",
    "/your/path/to/models",  # <-- Update this
)
DEFAULT_DATA_ROOT = os.environ.get(
    "LEAKYCLIP_DATA_ROOT",
    "/your/path/to/data",  # <-- Update this
)

Required Models:

| Model | Architecture | HuggingFace ID | Purpose |
|---|---|---|---|
| ViT-B/16 | ViT-B-16 | laion/CLIP-ViT-B-16-laion2B-s34B-b88K | Main inversion model |
| ViT-B/32 | ViT-B-32 | laion/CLIP-ViT-B-32-laion2B-s34B-b79K | Alternative architecture |
| ViT-L/14 | ViT-L-14 | laion/CLIP-ViT-L-14-laion2B-s32B-b82K | Large model variant |
| ConvNeXt-Base | ConvNeXt | laion/CLIP-convnext_base_w_320-laion_aesthetic-s13B-b82K | CLIP Score metric |
| Robust ViT-B/16 | ViT-B-16 | chs20/FARE4-ViT-B-16-laion2B-s34B-b88K | FARE adversarial fine-tuned |
| Robust ViT-B/32 | ViT-B-32 | chs20/FARE4-ViT-B-32-laion2B-s34B-b79K | FARE adversarial fine-tuned |
| Robust ViT-L/14 | ViT-L-14 | Erdos2568/Robust_CLIP (eps8/20000.pt) | Robust CLIP (eps=8) |

Download via HuggingFace:

# ViT-B/16 (standard)
python -c "import open_clip; open_clip.create_model_and_transforms('ViT-B-16', pretrained='laion2b_s34b_b88k')"

# Robust ViT-B/16 (FARE fine-tuned)
python -c "import open_clip; open_clip.create_model_and_transforms('ViT-B-16', pretrained='chs20/FARE4-ViT-B-16-laion2B-s34B-b88K')"

Or download manually:

# Using huggingface-cli
huggingface-cli download laion/CLIP-ViT-B-16-laion2B-s34B-b88K --local-dir ./models/vit-b-16
huggingface-cli download chs20/FARE4-ViT-B-16-laion2B-s34B-b88K --local-dir ./models/vit-b-16-robust
huggingface-cli download laion/CLIP-convnext_base_w_320-laion_aesthetic-s13B-b82K --local-dir ./models/convnext-base

Additional Robust Models:

| Model | HuggingFace ID | Description |
|---|---|---|
| Robust ViT-B/32 | chs20/FARE4-ViT-B-32-laion2B-s34B-b79K | FARE adversarial fine-tuned ViT-B/32 |
| Robust ViT-L/14 | Erdos2568/Robust_CLIP | Robust CLIP ViT-L/14 (eps=8) |

Download Robust Models:

# Robust ViT-B/32 (FARE fine-tuned)
huggingface-cli download chs20/FARE4-ViT-B-32-laion2B-s34B-b79K --local-dir ./models/vit-b-32-robust

# Robust ViT-L/14 (eps=8)
wget https://huggingface.co/Erdos2568/Robust_CLIP/resolve/main/eps8/20000.pt \
  -O ./models/vit-l-14-robust.pt

2. Download Stable Diffusion Models (for Refinement)

huggingface-cli download stabilityai/stable-diffusion-xl-base-1.0 --local-dir ./models/sdxl-base-1.0

3. Download VAE Model (for VAE Latent Inversion)

huggingface-cli download madebyollin/sdxl-vae-fp16-fix --local-dir ./models/sdxl-vae

4. Download Evaluation Metrics

SSCD (Self-Supervised Copy Detection):

# Download SSCD weights
wget https://dl.fbaipublicfiles.com/sscd-copy-detection/sscd_disc_mixup.torchvision.pt \
  -O ./models/sscd_disc_mixup.torchvision.pt

5. Prepare Datasets

LAION-HD Subset (Recommended):

# Download curated high-quality subset (~1M samples)
python -c "
from datasets import load_dataset
dataset = load_dataset('yuvalkirstain/laion-hd-subset', split='train')
# Save to disk...
dataset.save_to_disk('./HF_dataset/laion')
"

Furniture Object Dataset:

# Download furniture object dataset (~10K samples)
python -c "
from datasets import load_dataset
dataset = load_dataset('abrarlohia/sample_furniture_object', split='train')
# Save to disk...
dataset.save_to_disk('./HF_dataset/furniture_object')
"

Supported Datasets:

  • laion - LAION-2B subset (main evaluation)
  • flickr - Flickr30k (caption diversity)
  • furniture_object - Furniture objects (structured content)
  • lfw - Labeled Faces in the Wild (privacy evaluation)

HuggingFace Dataset Sources:

| Dataset | HuggingFace ID | Description |
|---|---|---|
| LAION-HD Subset | yuvalkirstain/laion-hd-subset | Curated high-quality subset (recommended) |
| Furniture Objects | abrarlohia/sample_furniture_object | Structured furniture images |

Note: We recommend using yuvalkirstain/laion-hd-subset instead of the full LAION-2B-en dataset (~2.3 billion images) for faster experimentation.

6. Train Embedding Alignment (Optional)

If you want to train your own embedding alignment matrix:

python -m leakyclip_release.ea_train \
  --model-name ViT-B-16_robust_fair_4 \
  --dataset laion \
  --num-samples 2000 \
  --batch-size 256 \
  --output-dir ./text2image_embedding

This learns the linear transformation matrix M that maps text embeddings to pseudo-image embeddings.
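For intuition, a linear map like M can be estimated in closed form by least squares over paired text/image embeddings. The numpy sketch below uses synthetic data and toy dimensions (real CLIP ViT-B/16 embeddings are 512-dimensional, and `ea_train` works on actual encoder outputs):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic paired embeddings: 1000 samples, 64-dim (toy stand-in for CLIP's 512).
d = 64
text_emb = rng.standard_normal((1000, d))
true_M = rng.standard_normal((d, d)) / np.sqrt(d)
image_emb = text_emb @ true_M  # pretend image embeddings are a linear map of text

# Closed-form least squares: argmin_M ||text_emb @ M - image_emb||^2
M, *_ = np.linalg.lstsq(text_emb, image_emb, rcond=None)

# The learned M projects a new text embedding into a pseudo-image embedding.
pseudo_image = text_emb[:1] @ M
print(np.allclose(pseudo_image, image_emb[:1], atol=1e-5))  # True
```

With real (noisy, nonlinearly related) embeddings the fit is approximate rather than exact, which is why the pipeline still needs the diffusion refinement stage.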

Single Image Inversion

python -m leakyclip_release.main \
  --config ./leakyclip_release/configs/method/single.json \
  --text "a mid-century modern beige armchair with wooden tapered legs" \
  --output ./out/inversion.png \
  --compute-metrics

Example Output:

Example Inversion Output

Figure: Example reconstruction from text prompt using LeakyCLIP with VAE latent space inversion and Stable Diffusion refinement.

Dataset Evaluation

python -m leakyclip_release.main \
  --config ./leakyclip_release/configs/method/laion.json \
  --dataset-name laion \
  --output-dir ./out/laion_results \
  --max-samples 1000 \
  --compute-metrics

Python SDK Usage

LeakyCLIP provides a Python API for programmatic access:

from leakyclip_release.models import create_model
from leakyclip_release.inversion import CLIPInverter, InversionConfig
from leakyclip_release.refinement import SDRefiner, SDRefineConfig
import torch

# Setup device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load CLIP model
model, tokenizer, _, _ = create_model("ViT-B-16", device=torch.device(device))

# Create inverter with custom config
inverter = CLIPInverter(
    model,
    tokenizer,
    device=device,
    config=InversionConfig(
        num_steps=500,
        lr=0.03,
        num_views=32,
        image_size=1024,
        inversion_space="vae",  # VAE latent space for better fidelity
        vae_model_id="./models/sdxl-vae",
    ),
)

# Optional: Add Stable Diffusion refinement
refiner = SDRefiner(
    SDRefineConfig(
        model_id="./models/sdxl-base-1.0",
        strength=0.3,
        num_inference_steps=50,
    )
)

# Define the text prompt once and reuse it for inversion and refinement
prompt = (
    "A sleek contemporary living room with a gray sectional sofa, "
    "glass coffee table, floor lamp with warm lighting, hardwood floor, "
    "large window with curtains, modern interior design, "
    "architectural digest style, high resolution"
)

# Perform inversion from text
result = inverter.invert_from_text(prompt, return_metrics=False)

# Apply refinement (optional)
refined_result = refiner.refine(result, prompt)

# Save result
refined_result.save("./out/inversion_sdk.png")

Advanced: Inversion with Reference Image and Metrics

from PIL import Image
from leakyclip_release.eval import LPIPSMetric, SSIMMetric

# Load reference image
reference = Image.open("reference.png").convert("RGB")

# Build metrics
metrics = [
    LPIPSMetric(device=device),
    SSIMMetric(device=device),
]

# Invert with metrics evaluation
result, metric_values = inverter.invert_from_text(
    "a mid-century modern beige armchair",
    reference_image=reference,
    metrics=metrics,
    return_metrics=True,
)

print(f"LPIPS: {metric_values.get('lpips')}")
print(f"SSIM: {metric_values.get('ssim')}")

Advanced: VAE Latent Space Inversion

from leakyclip_release.inversion import InversionConfig

# Configure for VAE latent space
vae_config = InversionConfig(
    image_size=1024,
    inversion_space="vae",           # Use VAE latent space
    vae_model_id="./models/sdxl-vae",
    vae_dtype="fp16",
    vae_scaling_factor=0.13025,
    latent_l2_weight=0.001,          # L2 regularization in latent space
)

inverter = CLIPInverter(model, tokenizer, device=device, config=vae_config)
prompt = (
    "A sleek contemporary living room with a gray sectional sofa, "
    "glass coffee table, floor lamp with warm lighting, hardwood floor, "
    "large window with curtains, modern interior design, "
    "architectural digest style, high resolution"
)
result = inverter.invert_from_text(prompt)

Advanced: Robust Model Usage (FARE)

# Load adversarially fine-tuned (robust) model
robust_model, _, _, _ = create_model("ViT-B-16_robust_fair_4", device=torch.device(device))

# Use with embedding alignment
inverter = CLIPInverter(
    robust_model,
    tokenizer,
    device=device,
    config=InversionConfig(
        align_model_name="ViT-B-16_robust_fair_4",
        transpose_model_name="pinverse_model",
    ),
)

πŸ—οΈ Architecture

leakyclip_release/
β”œβ”€β”€ main.py                    # Main entry point for inversion
β”œβ”€β”€ ea_train.py               # Embedding alignment training
β”œβ”€β”€ config.py                 # Configuration management
β”œβ”€β”€ data.py                   # Dataset loaders
β”œβ”€β”€ models/
β”‚   └── model_factory.py      # CLIP model factory (ViT, ConvNeXt)
β”œβ”€β”€ inversion/
β”‚   β”œβ”€β”€ inverter.py           # CLIP inversion logic
β”‚   └── augmentations.py      # Data augmentation pipeline
β”œβ”€β”€ eval/
β”‚   β”œβ”€β”€ base.py               # Metric base classes
β”‚   └── metrics.py            # SSIM, LPIPS, CS, SSCD metrics
β”œβ”€β”€ refinement/
β”‚   └── sd_refiner.py         # Stable Diffusion refinement
└── configs/                  # Configuration files
    β”œβ”€β”€ inversion/default.json
    β”œβ”€β”€ method/single.json
    β”œβ”€β”€ method/laion.json
    └── method/flickr.json

βš™οΈ Configuration

Key Hyperparameters

| Parameter | Default | Description |
|---|---|---|
| num_steps | 500 | Inversion optimization steps |
| lr | 0.03 | Learning rate for inversion |
| num_views | 32 | Number of augmented views |
| image_size | 1024 | Output image size |
| tv_weight | 0.05 | Total variation regularization |
| patch_loss_weight | 5.0 | Patch-based consistency loss |
| thresholding | 0.95 | Dynamic threshold quantile |
| refine_strength | 0.3-0.55 | SD img2img strength |
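As a concrete example of one of these terms, a total-variation penalty of the kind `tv_weight` scales penalizes differences between neighboring pixels, discouraging high-frequency noise in the reconstruction. A minimal numpy sketch of the anisotropic form (an assumed loss definition, not necessarily the repo's exact one):

```python
import numpy as np

def tv_loss(img, tv_weight=0.05):
    """Anisotropic total variation over a (H, W) image.
    tv_weight mirrors the hyperparameter in the table above."""
    dh = np.abs(np.diff(img, axis=0)).sum()  # vertical neighbor differences
    dw = np.abs(np.diff(img, axis=1)).sum()  # horizontal neighbor differences
    return tv_weight * (dh + dw)

smooth = np.zeros((8, 8))                 # constant image: zero variation
checker = np.indices((8, 8)).sum(0) % 2.0 # checkerboard: maximal variation

print(tv_loss(smooth))                    # 0.0
print(tv_loss(checker) > tv_loss(smooth)) # True
```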

Environment Variables

| Variable | Description | Default |
|---|---|---|
| LEAKYCLIP_MODEL_ROOT | Model checkpoint directory | ./models |
| LEAKYCLIP_DATA_ROOT | Dataset cache directory | ./HF_dataset |
| LEAKYCLIP_ALIGN_ROOT | Embedding alignment weights | ./text2image_embedding |
| LEAKYCLIP_SSCD_WEIGHTS | SSCD model weights | ./models/sscd_*.pt |
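Environment variables take precedence over the built-in defaults. A minimal Python sketch mirroring the lookup pattern shown in paths.py (the paths here are illustrative):

```python
import os

# Simulate `export LEAKYCLIP_MODEL_ROOT=/data/models`; leave DATA_ROOT unset.
os.environ["LEAKYCLIP_MODEL_ROOT"] = "/data/models"
os.environ.pop("LEAKYCLIP_DATA_ROOT", None)

# Same lookup pattern as paths.py: env var if set, otherwise repo default.
model_root = os.environ.get("LEAKYCLIP_MODEL_ROOT", "./models")
data_root = os.environ.get("LEAKYCLIP_DATA_ROOT", "./HF_dataset")

print(model_root)  # /data/models   (env var wins)
print(data_root)   # ./HF_dataset   (falls back to the default)
```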

πŸ§ͺ Training Embedding Alignment

Learn the linear transformation matrix M for embedding alignment:

python -m leakyclip_release.ea_train \
  --model-name ViT-B-16_robust_fair_4 \
  --dataset laion \
  --batch-size 256 \
  --output-dir ./text2image_embedding

Supported datasets: laion, flickr, furniture_object


πŸ“ˆ Evaluation Metrics

We adopt four complementary metrics for comprehensive evaluation:

  • SSIM [48]: Structural similarity [-1, 1], higher is better
  • LPIPS [57]: Perceptual similarity [0, ∞), lower is better
  • CLIP Score (CS): Cosine similarity using ConvNeXt-Base [-1, 1]
  • SSCD [36]: Self-supervised copy detection [-1, 1]
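CLIP Score in particular is just a cosine similarity between L2-normalized embeddings. A self-contained numpy sketch of the computation with toy vectors (not the repo's metric class, which encodes real images with ConvNeXt-Base):

```python
import numpy as np

def clip_score(emb_a, emb_b):
    """Cosine similarity between two embeddings, bounded in [-1, 1]."""
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return float(a @ b)

img_emb = np.array([0.1, 0.7, 0.2])  # toy stand-in for an image embedding
print(round(clip_score(img_emb, img_emb), 6))   # 1.0  (identical embeddings)
print(round(clip_score(img_emb, -img_emb), 6))  # -1.0 (opposite embeddings)
```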

πŸ”’ Threat Model

Attacker Capabilities:

  • White-box access to CLIP parameters
  • Access to exact training captions paired with target images
  • Standard assumptions for rigorous privacy vulnerability assessment

Attack Goal: Reconstruct training images from text prompts via model inversion.


πŸ“ Citation

If you use this code in your research, please cite:

@article{chen2025leakyclip,
  title={LeakyCLIP: Extracting Training Data from CLIP},
  author={Chen, Yunhao and Wang, Shujie and Wang, Xin and Ma, Xingjun},
  journal={arXiv preprint arXiv:2508.00756},
  year={2025}
}

πŸ“š References

Key papers and methods used in this work:

  • [36] Pizzi et al. "A Self-Supervised Descriptor for Image Copy Detection" (SSCD)
  • [37] Radford et al. "Learning Transferable Visual Models From Natural Language Supervision" (CLIP)
  • [38] Rombach et al. "High-Resolution Image Synthesis with Latent Diffusion Models" (Stable Diffusion)
  • [39] Wang et al. "Fine-tuning for Adversarially Robust Embeddings" (FARE)
  • [48] Wang et al. "Image Quality Assessment: From Error Visibility to Structural Similarity" (SSIM)
  • [57] Zhang et al. "The Unreasonable Effectiveness of Deep Features as a Perceptual Metric" (LPIPS)

⚠️ Disclaimer

This code is provided for research purposes only to understand and mitigate privacy risks in multimodal models. The authors are not responsible for any misuse of this code for unauthorized data extraction or privacy violations.


πŸ“§ Contact

For questions or issues, please contact:


License

MIT License
