VideoEraser is a training-free framework that prevents Text-to-Video (T2V) diffusion models from generating videos with undesirable concepts, even when explicitly prompted. It achieves state-of-the-art performance in suppressing undesirable content during T2V generation, reducing it by 46% on average across four tasks compared to baselines.
Moreover, VideoEraser is applicable to multiple T2V diffusion models, including UNet-based models (AnimateDiff, LaVie, ZeroScope, ModelScope) and a DiT-based model (CogVideoX).
| Task | Object Erasure | Celebrity Erasure | Artistic Style Erasure | Explicit Content Erasure |
|---|---|---|---|---|
| Original Prompt | A rally car racing through a snowy forest path. | Jackie Chan is tai chi. | A man running under starry night by Van Gogh. | A naked man is playing basketball. |
| Original Video | object.mp4 | celebrity.mp4 | artist.mp4 | explicit.mp4 |
| Erased Concept | rally car | Jackie Chan | Van Gogh | naked |
| Erased Video | object_removal.mp4 | celebrity_removal.mp4 | artist_removal.mp4 | explicit_removal.mp4 |
- [2025.11] 🎉 Our paper "VideoEraser: Concept Erasure in Text-to-Video Diffusion Models" has been accepted to the EMNLP 2025 Main Conference!
git clone https://github.com/bluedream02/VideoEraser.git
cd VideoEraser/AnimateDiff
# Create environment
conda create -n animatediff python=3.10
conda activate animatediff
pip install -r requirements.txt

Download Pre-trained Models:

# Download the Stable Diffusion v1.5 base model and the AnimateDiff motion module
mkdir -p models/Motion_Module
cd models
git lfs install
git clone https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5
cd Motion_Module
wget https://huggingface.co/guoyww/animatediff/resolve/main/mm_sd_v15.ckpt
cd ../..

Expected structure:
AnimateDiff/
└── models/
    ├── stable-diffusion-v1-5/          # Stable Diffusion base model
    │   └── ...
    └── Motion_Module/
        └── mm_sd_v15.ckpt              # AnimateDiff motion module
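If git-lfs or wget is unavailable, the same checkpoints can be fetched with the huggingface_hub Python API. A minimal sketch, run from the AnimateDiff/ directory (repo ids match the commands above):

```python
from huggingface_hub import hf_hub_download, snapshot_download

# Stable Diffusion v1.5 base model -> models/stable-diffusion-v1-5/
snapshot_download(
    repo_id="stable-diffusion-v1-5/stable-diffusion-v1-5",
    local_dir="models/stable-diffusion-v1-5",
)

# AnimateDiff motion module -> models/Motion_Module/mm_sd_v15.ckpt
hf_hub_download(
    repo_id="guoyww/animatediff",
    filename="mm_sd_v15.ckpt",
    local_dir="models/Motion_Module",
)
```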
cd VideoEraser/ModelScope
conda create -n modelscope python=3.10
conda activate modelscope
pip install -r requirements.txt

Download Pre-trained Models:
mkdir -p models
cd models
git lfs install
git clone https://huggingface.co/cerspense/zeroscope_v2_576w
git clone https://huggingface.co/damo-vilab/text-to-video-ms-1.7b
cd ..

Expected structure:
ModelScope/
└── models/
    ├── zeroscope_v2_576w/              # ZeroScope model
    │   └── ...
    └── text-to-video-ms-1.7b/          # ModelScope model (alternative)
        └── ...
cd VideoEraser/Lavie
conda env create -f environment.yml
conda activate lavie

Download Pre-trained Models:
Download pre-trained LaVie models, Stable Diffusion 1.4, and stable-diffusion-x4-upscaler:
mkdir -p pretrained_models
cd pretrained_models
wget https://huggingface.co/Vchitect/LaVie/resolve/main/lavie_base.pt
wget https://huggingface.co/Vchitect/LaVie/resolve/main/lavie_interpolation.pt
wget https://huggingface.co/Vchitect/LaVie/resolve/main/lavie_vsr.pt
git lfs install
git clone https://huggingface.co/CompVis/stable-diffusion-v1-4
git clone https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler
cd ..

Expected structure:
Lavie/
└── pretrained_models/
    ├── lavie_base.pt                   # Base T2V model
    ├── lavie_interpolation.pt          # Frame interpolation model
    ├── lavie_vsr.pt                    # Video super-resolution model
    ├── stable-diffusion-v1-4/          # SD 1.4 base model
    │   └── ...
    └── stable-diffusion-x4-upscaler/   # SD x4 upscaler
        └── ...
cd VideoEraser/CogVideoX
conda create -n cogvideox python=3.10
conda activate cogvideox
pip install -r requirements.txt

Download Pre-trained Models:
mkdir -p models
cd models
git clone https://huggingface.co/THUDM/CogVideoX-5b
cd ..

Expected structure:
CogVideoX/
└── models/
    └── CogVideoX-5b/                   # CogVideoX model
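As a quick sanity check that the downloaded weights are intact, they can be loaded with the stock diffusers pipeline (this is plain generation without erasure, not the VideoEraser pipeline):

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load the locally downloaded weights; bf16 is the recommended dtype for the 5b model.
pipe = CogVideoXPipeline.from_pretrained("models/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # trades some speed for much lower peak VRAM

# Plain generation to confirm the checkpoint loads and runs end to end.
frames = pipe(prompt="A panda playing guitar in a bamboo forest.", num_frames=49).frames[0]
export_to_video(frames, "sanity_check.mp4", fps=8)
```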
cd AnimateDiff
python scripts/animate.py \
--pretrained-model-path stable-diffusion-v1-5/stable-diffusion-v1-5 \
--prompt "A man running under starry night by Van Gogh." \
--erased-concept "Van Gogh" \
--output-dir ./outputs \
--seed 42

See AnimateDiff/README.md for detailed usage.
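To sweep several erasure targets with the same CLI, a small driver script works; a sketch using the flags documented above (the concept list and prompt template here are illustrative):

```python
import subprocess

# Run the documented AnimateDiff CLI once per erased concept.
for concept in ["Van Gogh", "Picasso", "Claude Monet"]:
    subprocess.run(
        [
            "python", "scripts/animate.py",
            "--pretrained-model-path", "stable-diffusion-v1-5/stable-diffusion-v1-5",
            "--prompt", f"A man running under starry night by {concept}.",
            "--erased-concept", concept,
            "--output-dir", "./outputs",
            "--seed", "42",
        ],
        check=True,
    )
```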
cd ModelScope
# Simple usage with HuggingFace model
python inference.py \
--model cerspense/zeroscope_v2_576w \
--prompt "A man running under starry night by Van Gogh." \
--erased-concept "Van Gogh" \
--output ./outputs \
--seed 42
# Or with ModelScope backbone
python inference.py \
--model damo-vilab/text-to-video-ms-1.7b \
--prompt "A man running under starry night by Van Gogh." \
--erased-concept "Van Gogh" \
--output ./outputs

See ModelScope/README.md for detailed usage.
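For comparison, the simplest alternative to concept erasure is steering the sampler away with a negative prompt in stock diffusers. This naive baseline is NOT the VideoEraser method, but it is a useful point of reference:

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "cerspense/zeroscope_v2_576w", torch_dtype=torch.float16
)
pipe.to("cuda")

# Push generation away from the concept with a negative prompt only.
frames = pipe(
    "A man running under starry night by Van Gogh.",
    negative_prompt="Van Gogh",
    num_frames=24,
    height=320,
    width=576,
).frames[0]
export_to_video(frames, "negative_prompt_baseline.mp4")
```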
cd Lavie/base
python pipelines/sample.py \
--config configs/example.yaml \
--text-prompt "A man running under starry night by Van Gogh." \
--unlearn-prompt "Van Gogh" \
--output-dir ./outputs \
--seed 42

Note: LaVie now supports command-line arguments that override config file settings.
See Lavie/README.md for detailed usage.
cd CogVideoX
python cli_demo.py \
--prompt "A man running under starry night by Van Gogh." \
--unsafe_concept "Van Gogh" \
--model_path models/CogVideoX-5b \
--output_path ./output.mp4

See CogVideoX/README.md for detailed usage.
We provide evaluation scripts for assessing concept erasure performance. The scripts process videos frame-by-frame: if any frame contains the target concept, the video is considered to contain that concept.
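A minimal sketch of this frame-wise decision rule, assuming OpenCV; `frame_contains_concept` is a hypothetical stand-in for whichever per-frame judge a script uses (a GPT-style classifier, an object detector, NudeNet, ...):

```python
import cv2

def video_contains_concept(video_path, frame_contains_concept, num_samples=5):
    """Return True if ANY sampled frame triggers the per-frame detector."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Score a uniform sample of frames rather than every frame.
    indices = [int(i * total / num_samples) for i in range(num_samples)]
    flagged = False
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok and frame_contains_concept(frame):
            flagged = True
            break
    cap.release()
    return flagged
```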
cd evaluation
# 1. Artistic Style Detection (requires OpenAI API)
export OPENAI_API_KEY="your_api_key"
export OPENAI_BASE_URL="https://api.openai.com/v1" # Optional, defaults to OpenAI
python artist.py \
--input-folder /path/to/videos \
--output-folder ./results \
--num-samples 5
# 2. Object Detection
python object.py \
--input-folder /path/to/videos \
--output-folder ./results \
--target-objects "cassette player" \
--num-samples 5
# 3. Explicit Content Detection (requires nudenet)
pip install nudenet
python explict.py \
--input-folder /path/to/videos \
--output-folder ./results \
--num-samples 5

This work builds upon several excellent open-source projects:
- AnimateDiff - Motion module for Stable Diffusion
- Text-To-Video-Finetuning - ZeroScope and ModelScope training framework
- LaVie - Video generation with cascaded diffusion models
- CogVideoX - Large-scale text-to-video generation model
- Stable Diffusion - Foundation text-to-image model
- SEGA - Instructing Text-to-Image Models using Semantic Guidance
- SAFREE - Safe and free text-to-image generation
We thank the authors for their valuable contributions to the community.
If you find VideoEraser useful in your research, please cite:
@inproceedings{xu2025videoeraser,
title={VideoEraser: Concept Erasure in Text-to-Video Diffusion Models},
author={Xu, Naen and Zhang, Jinghuai and Li, Changjiang and Chen, Zhi and Zhou, Chunyi and Li, Qingming and Du, Tianyu and Ji, Shouling},
booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
pages={5965--5994},
year={2025}
}