Skip to content

[EMNLP 2025] VideoEraser: Concept Erasure in Text-to-Video Diffusion Models

License

Notifications You must be signed in to change notification settings

NESA-Lab/VideoEraser

Β 
Β 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

22 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

VideoEraser: Concept Erasure in Text-to-Video Diffusion Models

EMNLP 2025 arXiv

πŸ“‹ Table of Contents

πŸ“Œ Overview

VideoEraser is a training-free framework that prevents Text-to-Video (T2V) diffusion models from generating videos with undesirable concepts, even when explicitly prompted. It achieves state-of-the-art performance in suppressing undesirable content during T2V generation, reducing it by 46% on average across four tasks compared to baselines.

Besides, VideoEraser is applicable to multiple T2V diffusion models, including UNet-based models – AnimateDiff, LaVie, ZeroScope, ModelScope, and a DiT-based model – CogVideoX.

Video Demonstrations

Object Erasure Celebrity Erasure Artistic Style Erasure Explicit Content Erasure
Task Object Erasure Celebrity Erasure Artistic Style Erasure Explicit Content Erasure
Original Prompt A rally car racing through a snowy forest path. Jackie Chan is tai chi. A man running under starry night by Van Gogh. A naked man is playing basketball.
Original Video
object.mp4
celebrity.mp4
artist.mp4
explicit.mp4
Erased Concept rally car Jackie Chan Van Gogh naked
Erased Video
object_removal.mp4
celebrity_removal.mp4
artist_removal.mp4
explicit_removal.mp4

πŸ“’ News

  • [2025.11] πŸŽ‰ Our paper "VideoEraser: Concept Erasure in Text-to-Video Diffusion Models" has been accepted to EMNLP 2025 Main Conference!

πŸ”§ Installation

Setup

Option 1: AnimateDiff

git clone https://github.com/bluedream02/VideoEraser.git
cd VideoEraser/AnimateDiff

# Create environment
conda create -n animatediff python=3.10
conda activate animatediff
pip install -r requirements.txt

Download Pre-trained Models:

python scripts/animate.py --pretrained-model-path stable-diffusion-v1-5/stable-diffusion-v1-5
cd Motion_Module
wget https://huggingface.co/guoyww/animatediff/resolve/main/mm_sd_v15.ckpt
cd ../..

Expected structure:

AnimateDiff/
β”œβ”€β”€ models/
β”‚   β”œβ”€β”€ stable-diffusion-v1-5/        # Stable Diffusion base model
β”‚   β”‚   β”œβ”€β”€ ...
β”‚   └── Motion_Module/
β”‚       └── mm_sd_v15.ckpt            # AnimateDiff motion module

Option 2: ModelScope (ZeroScope/ModelScope)

cd VideoEraser/ModelScope
conda create -n modelscope python=3.10
conda activate modelscope
pip install -r requirements.txt

Download Pre-trained Models:

mkdir -p models
cd models
git lfs install
git clone https://huggingface.co/cerspense/zeroscope_v2_576w
git clone https://huggingface.co/damo-vilab/text-to-video-ms-1.7b
cd ..

Expected structure:

ModelScope/
β”œβ”€β”€ models/
β”‚   β”œβ”€β”€ zeroscope_v2_576w/            # ZeroScope model
β”‚   β”‚   β”œβ”€β”€ ...
β”‚   └── text-to-video-ms-1.7b/        # ModelScope model (alternative)
β”‚       β”œβ”€β”€ ...

Option 3: LaVie

cd VideoEraser/Lavie
conda env create -f environment.yml
conda activate lavie

Download Pre-trained Models:

Download pre-trained LaVie models, Stable Diffusion 1.4, and stable-diffusion-x4-upscaler:

mkdir -p pretrained_models
cd pretrained_models
wget https://huggingface.co/Vchitect/LaVie/resolve/main/lavie_base.pt
wget https://huggingface.co/Vchitect/LaVie/resolve/main/lavie_interpolation.pt
wget https://huggingface.co/Vchitect/LaVie/resolve/main/lavie_vsr.pt
git lfs install
git clone https://huggingface.co/CompVis/stable-diffusion-v1-4
git clone https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler
cd ..

Expected structure:

Lavie/
β”œβ”€β”€ pretrained_models/
β”‚   β”œβ”€β”€ lavie_base.pt                 # Base T2V model
β”‚   β”œβ”€β”€ lavie_interpolation.pt        # Frame interpolation model
β”‚   β”œβ”€β”€ lavie_vsr.pt                  # Video super-resolution model
β”‚   β”œβ”€β”€ stable-diffusion-v1-4/        # SD 1.4 base model
β”‚   β”‚   β”œβ”€β”€ ...
β”‚   └── stable-diffusion-x4-upscaler/ # SD x4 upscaler
β”‚       β”œβ”€β”€ ...

Option 4: CogVideoX

cd VideoEraser/CogVideoX
conda create -n cogvideox python=3.10
conda activate cogvideox
pip install -r requirements.txt

Download Pre-trained Models:

mkdir -p models
cd models
git clone https://huggingface.co/THUDM/CogVideoX-5b
cd ..

Expected structure:

CogVideoX/
β”œβ”€β”€ models/
β”‚   β”œβ”€β”€ CogVideoX-5b/

πŸš€ Quick Start

AnimateDiff (UNet-based)

cd AnimateDiff
python scripts/animate.py \
    --pretrained-model-path stable-diffusion-v1-5/stable-diffusion-v1-5 \
    --prompt "A man running under starry night by Van Gogh." \
    --erased-concept "Van Gogh" \
    --output-dir ./outputs \
    --seed 42

See AnimateDiff/README.md for detailed usage.

ModelScope (UNet-based, ZeroScope/ModelScope)

cd ModelScope
# Simple usage with HuggingFace model
python inference.py \
    --model cerspense/zeroscope_v2_576w \
    --prompt "A man running under starry night by Van Gogh." \
    --erased-concept "Van Gogh" \
    --output ./outputs \
    --seed 42

# Or with ModelScope backbone
python inference.py \
    --model damo-vilab/text-to-video-ms-1.7b \
    --prompt "A man running under starry night by Van Gogh." \
    --erased-concept "Van Gogh" \
    --output ./outputs

See ModelScope/README.md for detailed usage.

LaVie (UNet-based)

cd Lavie/base
python pipelines/sample.py \
    --config configs/example.yaml \
    --text-prompt "A man running under starry night by Van Gogh." \
    --unlearn-prompt "Van Gogh" \
    --output-dir ./outputs \
    --seed 42

Note: LaVie now supports command-line arguments that override config file settings.

See Lavie/README.md for detailed usage.

CogVideoX (DiT-based)

cd CogVideoX
python cli_demo.py \
    --prompt "A man running under starry night by Van Gogh." \
    --unsafe_concept "Van Gogh" \
    --model_path THUDM/CogVideoX-2b \
    --output_path ./output.mp4

See CogVideoX/README.md for detailed usage.

Evaluation

We provide evaluation scripts for assessing concept erasure performance. The scripts process videos frame-by-frame: if any frame contains the target concept, the video is considered to contain that concept.

cd evaluation

# 1. Artistic Style Detection (requires OpenAI API)
export OPENAI_API_KEY="your_api_key"
export OPENAI_BASE_URL="https://api.openai.com/v1"  # Optional, defaults to OpenAI
python artist.py \
    --input-folder /path/to/videos \
    --output-folder ./results \
    --num-samples 5

# 2. Object Detection
python object.py \
    --input-folder /path/to/videos \
    --output-folder ./results \
    --target-objects cassette player \
    --num-samples 5

# 3. Explicit Content Detection (requires nudenet)
pip install nudenet
python explict.py \
    --input-folder /path/to/videos \
    --output-folder ./results \
    --num-samples 5

πŸ™ Acknowledgement

This work builds upon several excellent open-source projects:

  • AnimateDiff - Motion module for Stable Diffusion
  • Text-To-Video-Finetuning - ZeroScope and ModelScope training framework
  • LaVie - Video generation with cascaded diffusion models
  • CogVideoX - Large-scale text-to-video generation model
  • Stable Diffusion - Foundation text-to-image model
  • SEGA - Instructing Text-to-Image Models using Semantic Guidance
  • SAFREE - Safe and free text-to-image generation

We thank the authors for their valuable contributions to the community.

πŸ“– Citation

If you find VideoEraser useful in your research, please cite:

@inproceedings{xu2025videoeraser,
  title={VideoEraser: Concept Erasure in Text-to-Video Diffusion Models},
  author={Xu, Naen and Zhang, Jinghuai and Li, Changjiang and Chen, Zhi and Zhou, Chunyi and Li, Qingming and Du, Tianyu and Ji, Shouling},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  pages={5965--5994},
  year={2025}
}

About

[EMNLP 2025] VideoEraser: Concept Erasure in Text-to-Video Diffusion Models

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%