VibeVoice4macOS — VibeVoice on Apple Silicon (macOS)

This fork adds a one-file, self-contained setup & runner for the full VibeVoice-Large VibeVoice so you can launch the Gradio demo or do simple CLI inference locally on macOS Apple Silicon—no CUDA, no sudo, no global installs.

✅ Apple Silicon (arm64) + Python ≥ 3.10

✅ Everything sandboxed under ~/vibevoice_mac

✅ Uses PyTorch MPS when available (otherwise CPU)

✅ Resumes & verifies large sharded model downloads

✅ Optional Hugging Face token auto-loaded from ~/vibevoice_mac/.env or .hf_token

What’s included in this fork

vibevoice_mac_arm64.sh — the all-in-one installer/runner for macOS:
- Creates a local venv
- Clones the upstream repo
- Downloads the chosen model with resume & shard verification
- Provides a portable ffmpeg if needed (or --allow-brew)
- Runs the official Gradio demo via a bootstrap that:
  - forces SDPA attention (avoids FlashAttention)
  - never dispatches to CUDA
  - prefers MPS on Apple GPUs

Upstream model code, demos, and assets remain under their original directories; this script only orchestrates a Mac-friendly setup.

Requirements

macOS on Apple Silicon (arm64)
Python 3.10+ available as python3
Internet for first run (pip, git, model download)
Several GB of free disk (models can be large)

Quick start

# 1) Clone your fork (this repo)
git clone https://github.com/<your-username>/vi-voice4mac.git
cd vi-voice4mac

# 2) Make the script executable
chmod +x vibevoice_mac_arm64.sh

# 3) (Recommended) Add your Hugging Face token once
mkdir -p ~/vibevoice_mac
printf 'HF_TOKEN=hf_your_token_here\n' > ~/vibevoice_mac/.env

# 4) Run the Gradio demo (default model; local UI on port 7860)
bash vibevoice_mac_arm64.sh --demo

# 5) Share the demo publicly (Gradio share URL)
bash vibevoice_mac_arm64.sh --demo --share

# 6) Try a smaller model if the Large one is heavy
bash vibevoice_mac_arm64.sh --model microsoft/VibeVoice-1.5B --demo

Hugging Face token (for gated/private models)

The script automatically picks a token from:

Existing env (HF_TOKEN)
~/vibevoice_mac/.env (e.g. HF_TOKEN=hf_xxx)
~/vibevoice_mac/.hf_token (file containing only the token string)

Examples:

# One-time setup (preferred)
mkdir -p ~/vibevoice_mac
cat > ~/vibevoice_mac/.env <<'EOF'
HF_TOKEN=hf_your_token_here
EOF

# Or pass inline per run
HF_TOKEN=hf_your_token_here bash vibevoice_mac_arm64.sh --demo

Never commit your token. It lives outside the repo by default.

Basic usage

Launch Gradio demo

# Default model: aoi-ot/VibeVoice-Large
bash vibevoice_mac_arm64.sh --demo

# Public sharing
bash vibevoice_mac_arm64.sh --demo --share

# Different model
bash vibevoice_mac_arm64.sh --model microsoft/VibeVoice-1.5B --demo

CLI-style inference

bash vibevoice_mac_arm64.sh --infer
# Output: ~/vibevoice_mac/outputs/sample_out.wav

Clean everything

bash vibevoice_mac_arm64.sh --clean --force

Useful env/flags

# Change the demo port
PORT=7861 bash vibevoice_mac_arm64.sh --demo

# Allow Homebrew ffmpeg (if missing)
bash vibevoice_mac_arm64.sh --allow-brew --demo

Models (quick guide)

Model	Context	Generation	Link
VibeVoice-1.5B	64K	~90 min	https://huggingface.co/microsoft/VibeVoice-1.5B
VibeVoice-Large	32K	~45 min	https://huggingface.co/microsoft/VibeVoice-Large

Default in this script: aoi-ot/VibeVoice-Large (change via --model ...)
Some models are gated/private → you’ll need a valid HF token

What the script does (under the hood)

Creates ~/vibevoice_mac with:
- .venv/ (local Python virtualenv)
- _cache/ (HF/Torch/Transformers caches)
- models/ (downloaded model files)
- tools/ffmpeg/ffmpeg (portable binary if needed)
- VibeVoice/ (upstream repo)
- outputs/ (audio from CLI path)
Pins all HF caches inside the project folder (no global cache usage).
Verifies shard completeness from model.safetensors.index.json and resumes if any pieces are missing.
Bootstraps the demo to force SDPA and avoid CUDA on macOS.

Troubleshooting

401 / “Repository Not Found” / gated model Add a valid HF token (see token section above) and make sure the model grants your account access.
Missing shard error (e.g., model-00002-of-00010.safetensors) Re-run the script; downloads resume and shards are re-verified. Also check you have enough free disk space.
“Torch not compiled with CUDA enabled” Expected on macOS. The script never tries CUDA and forces SDPA/MPS/CPU.
Port already in use PORT=7861 bash vibevoice_mac_arm64.sh --demo
ffmpeg not found Script provides a portable ffmpeg; or pass --allow-brew to install via Homebrew.

Notes & tips

Performance: MPS is faster than CPU, but still slower than high-end NVIDIA GPUs. For smoother UX, try --model microsoft/VibeVoice-1.5B.
Attention backend: The demo code prefers FlashAttention on CUDA; this fork forces SDPA (safe on MPS/CPU). Audio quality may differ from CUDA+FA2 runs.

Responsible use (risks & limitations)

High-quality synthetic speech can be misused (impersonation, fraud, disinformation). Use responsibly and comply with all applicable laws and policies. Disclose AI-generated content where appropriate.

Other known limitations (from upstream notes):

English/Chinese are the strongest languages; others may be unstable.
Cross-lingual transfer can be inconsistent.
Spontaneous music/background sounds may appear depending on prompts.
No explicit support for overlapping speech.

This project is intended for research & experimentation.

Acknowledgements

Upstream: WhoPaidItAll/VibeVoice
Hugging Face ecosystem (transformers, accelerate, huggingface_hub)
PyTorch MPS on Apple Silicon

License

The vibevoice_mac_arm64.sh helper script in this fork: MIT (or match upstream—add a LICENSE file accordingly).
VibeVoice code/models: their respective upstream licenses & model cards apply.

Repository layout

.
├─ vibevoice_mac_arm64.sh    # ← this script (macOS setup & runner)
├─ VibeVoice/                # upstream repo code (cloned by the script at runtime)
└─ README.md                 # this file

The runtime sandbox (~/vibevoice_mac/) is created on first run and contains your venv, caches, models, token files, etc. It’s outside the repo so you don’t accidentally commit huge files or secrets.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
Figures		Figures
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
vibevoice_mac_arm64.sh		vibevoice_mac_arm64.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

VibeVoice4macOS — VibeVoice on Apple Silicon (macOS)

What’s included in this fork

Requirements

Quick start

Hugging Face token (for gated/private models)

Basic usage

Launch Gradio demo

CLI-style inference

Clean everything

Useful env/flags

Models (quick guide)

What the script does (under the hood)

Troubleshooting

Notes & tips

Responsible use (risks & limitations)

Acknowledgements

License

Repository layout

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

VibeVoice4macOS — VibeVoice on Apple Silicon (macOS)

What’s included in this fork

Requirements

Quick start

Hugging Face token (for gated/private models)

Basic usage

Launch Gradio demo

CLI-style inference

Clean everything

Useful env/flags

Models (quick guide)

What the script does (under the hood)

Troubleshooting

Notes & tips

Responsible use (risks & limitations)

Acknowledgements

License

Repository layout

About

Resources

License

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages