VideoMaMa - ComfyUI Custom Nodes

ComfyUI custom node implementation of VideoMaMa for video matting with mask conditioning.

Original Research: VideoMaMa: Mask-Guided Video Matting via Generative Prior
Original Repository: cvlab-kaist/VideoMaMa

This is a ComfyUI custom node implementation. All credit goes to the original authors for their excellent research and open-source contribution.

Installation

1. Clone Repository

cd /path/to/ComfyUI/custom_nodes/
git clone https://github.com/okdalto/ComfyUI-VideoMaMa
cd ComfyUI-VideoMaMa
pip install -r requirements.txt

2. Download Models

Base Model (Auto-download)

The Stable Video Diffusion base model will be automatically downloaded on first use if not present.

To download manually:

huggingface-cli download stabilityai/stable-video-diffusion-img2vid-xt \
    --local-dir checkpoints/stabilityai/stable-video-diffusion-img2vid-xt

VideoMaMa Checkpoint (Auto-download)

The VideoMaMa UNet checkpoint will be automatically downloaded on first use if not present.

To download manually:

huggingface-cli download SammyLim/VideoMaMa \
    --local-dir checkpoints/VideoMaMa
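
For scripted setups, the two manual downloads above can be sketched in Python. This is a minimal sketch, not the node's own auto-download logic: it assumes the `huggingface_hub` package is installed and treats a missing or empty target directory as "not downloaded yet". The names `MODEL_DIRS`, `missing_models`, and `download_missing` are illustrative.

```python
from pathlib import Path

# Repo IDs and target directories taken from the manual download commands.
MODEL_DIRS = {
    "stabilityai/stable-video-diffusion-img2vid-xt":
        "checkpoints/stabilityai/stable-video-diffusion-img2vid-xt",
    "SammyLim/VideoMaMa": "checkpoints/VideoMaMa",
}


def missing_models(root="."):
    """Return {repo_id: target_dir} for models whose directory is absent or empty."""
    missing = {}
    for repo_id, rel_dir in MODEL_DIRS.items():
        target = Path(root) / rel_dir
        # Assumption: a missing or empty directory means "not downloaded yet".
        if not target.is_dir() or not any(target.iterdir()):
            missing[repo_id] = target
    return missing


def download_missing(root="."):
    """Fetch every missing model with huggingface_hub.snapshot_download."""
    # Imported lazily so the check above works without huggingface_hub.
    from huggingface_hub import snapshot_download

    for repo_id, target in missing_models(root).items():
        snapshot_download(repo_id=repo_id, local_dir=str(target))
```

Calling `download_missing()` from the custom node directory would fetch only the models that `missing_models()` reports.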

SAM2 (Optional - for mask generation)

# Install SAM2
git clone https://github.com/facebookresearch/sam2
cd sam2 && pip install -e .

# Download checkpoint
mkdir -p ../checkpoints/sam2
cd ../checkpoints/sam2
wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_large.pt

# Download config
mkdir -p ../../configs/sam2.1
cd ../../configs/sam2.1
wget https://raw.githubusercontent.com/facebookresearch/sam2/main/sam2/configs/sam2.1/sam2.1_hiera_l.yaml

3. Restart ComfyUI

The nodes will appear under the VideoMaMa category.

Available Nodes

1. VideoMaMa Pipeline Loader

Loads the inference pipeline with base SVD model and fine-tuned UNet.

Inputs:

base_model_path: Path to base SVD model (default: checkpoints/stabilityai/stable-video-diffusion-img2vid-xt)
unet_checkpoint_path: Path to fine-tuned UNet (default: checkpoints/VideoMaMa)
precision: fp16 or bf16 (default: fp16)

Outputs:

VIDEOMAMA_PIPELINE: Pipeline object
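
For readers unfamiliar with how ComfyUI nodes declare their sockets, the inputs and output listed above map onto the standard custom node class attributes roughly as follows. This is an illustrative sketch, not the code in nodes.py: the class name, the `load` method, and the placeholder dictionary it returns are assumptions.

```python
class VideoMaMaPipelineLoaderSketch:
    """Illustrative only; the real node lives in nodes.py and may differ."""

    CATEGORY = "VideoMaMa"
    RETURN_TYPES = ("VIDEOMAMA_PIPELINE",)
    FUNCTION = "load"  # name of the method ComfyUI calls

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "base_model_path": ("STRING", {
                    "default": "checkpoints/stabilityai/stable-video-diffusion-img2vid-xt",
                }),
                "unet_checkpoint_path": ("STRING", {"default": "checkpoints/VideoMaMa"}),
                # A list of strings is rendered as a dropdown in ComfyUI.
                "precision": (["fp16", "bf16"], {"default": "fp16"}),
            }
        }

    def load(self, base_model_path, unet_checkpoint_path, precision):
        # Placeholder: a real loader would build the SVD pipeline here.
        pipeline = {
            "base_model_path": base_model_path,
            "unet_checkpoint_path": unet_checkpoint_path,
            "precision": precision,
        }
        return (pipeline,)  # ComfyUI expects a tuple matching RETURN_TYPES


# ComfyUI discovers nodes through this module-level mapping.
NODE_CLASS_MAPPINGS = {"VideoMaMaPipelineLoaderSketch": VideoMaMaPipelineLoaderSketch}
```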

2. VideoMaMa Run

Runs video matting inference with mask conditioning.

Inputs:

pipeline: Pipeline from loader
images: Input video frames [N, H, W, C]
masks: Mask frames [N, H, W, C]
seed: Random seed (default: 42)
max_resolution: Longest axis resolution for processing (default: 1024, range: 256-2048). Aspect ratio is preserved and dimensions are aligned to multiples of 8.
fps: Frames per second (default: 7)
motion_bucket_id: Motion intensity (default: 127)
noise_aug_strength: Noise augmentation (default: 0.0)

Outputs:

MASK: Generated mask frames [N, H, W] (at original input resolution)

3. SAM2 Video Mask Generator

Generates masks using SAM2 video tracking (requires SAM2 installation).

Inputs:

images: Input video frames
checkpoint_path: SAM2 checkpoint path
config_file: SAM2 config path
user_input: Point coordinates from SAM2 Point Selector UI

Outputs:

IMAGE: Generated mask frames

SAM2 Point Selector UI

Click the user_input field to open the interactive point selector:

Controls:

Left click: Add positive point (+) - marks foreground/object to segment
Right click: Add negative point (-) - marks background to exclude
Middle click / Ctrl+click: Remove existing point
+ / - keys: Switch between positive and negative mode

Usage Tips:

Place positive points (green) on the object you want to extract
Place negative points (red) on background areas to exclude
Adding more points generally gives a more accurate segmentation
Click Save to confirm, Cancel to discard, Clear All to reset
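
SAM2 point prompts pair each (x, y) coordinate with a label: 1 for a positive point and 0 for a negative one. The sketch below shows how clicks could be converted into that form. The serialized layout of user_input is not documented here, so the `{"x", "y", "positive"}` JSON structure and the `parse_points` helper are assumptions made purely for illustration.

```python
import json


def parse_points(user_input):
    """Split a JSON list of clicks into SAM2-style coordinates and labels.

    Hypothetical input format: [{"x": 120, "y": 80, "positive": true}, ...]
    """
    clicks = json.loads(user_input) if user_input.strip() else []
    points = [[float(c["x"]), float(c["y"])] for c in clicks]
    # SAM2 convention: 1 = foreground (positive), 0 = background (negative).
    labels = [1 if c["positive"] else 0 for c in clicks]
    return points, labels
```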

Example Workflow

Example workflow files are available in the examples/ folder. Import these directly into ComfyUI to get started quickly.

Basic Steps:

Load video → Use VHS Video Loader or similar
Generate masks → Use SAM2 node or load existing masks
Load pipeline → VideoMaMa Pipeline Loader
Run inference → VideoMaMa Run (connect pipeline, images, masks)
Save output → VHS Video Combine or Preview Image

Tips

Resolution: max_resolution controls the longest axis. Aspect ratio is preserved and output is resized back to the original input resolution. For example, a 1920x1080 input with max_resolution=1024 is processed at 1024x576.
Motion Bucket: Lower motion_bucket_id values (50-100) suit subtle motion; higher values (150-200) suit more dynamic motion
VRAM: Higher max_resolution requires more VRAM
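
The resolution rule can be sketched as a small function. This is a minimal sketch under two assumptions that the text above does not confirm: inputs already smaller than max_resolution are not upscaled, and each dimension is rounded to the nearest multiple of 8. The name `processing_size` is illustrative.

```python
def processing_size(width, height, max_resolution=1024, multiple=8):
    """Return the (width, height) used for processing."""
    longest = max(width, height)
    # Assumption: inputs already below max_resolution are not upscaled.
    scale = min(1.0, max_resolution / longest)

    def align(value):
        # Assumption: round to the nearest multiple, never below one multiple.
        return max(multiple, round(value * scale / multiple) * multiple)

    return align(width), align(height)


print(processing_size(1920, 1080))  # (1024, 576), matching the example above
```

Since VRAM use grows with the processed resolution, checking the computed size before a run gives a rough idea of the cost.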

Troubleshooting

"SAM2 is not available"

git clone https://github.com/facebookresearch/sam2
cd sam2 && pip install -e .
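
To narrow down which part of the SAM2 setup is missing, a quick check like the one below may help. It is a sketch, not the node's own detection logic: `sam2_status` is a hypothetical helper, and the relative paths assume the script runs from the custom node directory with the files placed as in the installation steps.

```python
import importlib.util
from pathlib import Path


def sam2_status(checkpoint_path, config_file):
    """Report which SAM2 prerequisites are in place."""
    return {
        # True when the sam2 package is importable in this environment.
        "package": importlib.util.find_spec("sam2") is not None,
        "checkpoint": Path(checkpoint_path).is_file(),
        "config": Path(config_file).is_file(),
    }


status = sam2_status(
    "checkpoints/sam2/sam2.1_hiera_large.pt",
    "configs/sam2.1/sam2.1_hiera_l.yaml",
)
for name, ok in status.items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```

Run it with the same Python interpreter that ComfyUI uses; a package installed into a different environment will still show as missing.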

"Failed to load pipeline"

Check that the model paths are correct
Ensure all model files have finished downloading
Check that enough VRAM is available

"Frame count mismatch"

Ensure the image and mask sequences have the same number of frames
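
A mismatch can be caught before inference with a check on the leading dimension of both batches. The sketch below works on plain shape tuples so it runs without PyTorch; `check_frame_counts` is a hypothetical helper, not the node's own error handling.

```python
def check_frame_counts(image_shape, mask_shape):
    """Raise ValueError when images and masks differ in frame count.

    Both shapes follow the [N, H, W, C] layout listed for VideoMaMa Run.
    """
    n_images, n_masks = image_shape[0], mask_shape[0]
    if n_images != n_masks:
        raise ValueError(
            f"Frame count mismatch: {n_images} image frames vs {n_masks} mask frames"
        )
    return n_images
```

With tensors, this could be called as `check_frame_counts(tuple(images.shape), tuple(masks.shape))`.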

Requirements

Python 3.10+
PyTorch 2.0+ with CUDA
GPU with sufficient VRAM
ComfyUI

See requirements.txt for full dependencies.

Directory Structure

VideoMaMa/
├── __init__.py
├── nodes.py
├── pipeline_svd_mask.py
├── requirements.txt
├── checkpoints/
│   ├── stabilityai/stable-video-diffusion-img2vid-xt/
│   ├── VideoMaMa/unet/
│   └── sam2/ (optional)
└── configs/sam2.1/ (optional)

Acknowledgments

This ComfyUI implementation is based on the excellent work by the KAIST CVLab team:

VideoMaMa: Mask-Guided Video Matting via Generative Prior

Paper: https://arxiv.org/abs/2601.14255
Original Repository: https://github.com/cvlab-kaist/VideoMaMa
Authors: KAIST Computer Vision Lab

We are grateful to the authors for:

Their groundbreaking research in video matting
Making their code and models publicly available
Advancing the field of generative video processing

This custom node is simply a wrapper to make VideoMaMa accessible in ComfyUI. All model weights, training methods, and core algorithms are from the original research.

Citation

If you use VideoMaMa in your work, please cite the original paper:

@article{videomama2025,
  title={VideoMaMa: Mask-Guided Video Matting via Generative Prior},
  author={[Authors from KAIST CVLab]},
  journal={arXiv preprint arXiv:2601.14255},
  year={2026}
}

License

This project follows the original VideoMaMa license terms. Please refer to the original repository for licensing details.
