A team of agents that shoots your headshot explainer video, from script to final cut. Drop in your script, your headshot footage (or your voice + photo), and your supporting materials, and a team of specialised agents produces a finished A-roll (talking head) + B-roll (charts, animations, diagrams) cut, optimised for YouTube, Bilibili, 视频号, or TikTok.

Model-agnostic, plug-and-play, plugin-style. Runs on Claude Code, Antigravity, or any agent runtime that understands `.claude/agents` + `.claude/skills` + `.mcp.json`.
- ~30 agents organised by phase: orchestrator, script, voice, avatar, rough-cut, B-roll, assembly, quality, output, onboarding.
- ~60 skills: from single-step (`/generate-script`, `/clone-voice`, `/render-broll-animation`) to end-to-end workflows (`/full-pipeline`, `/yolo-go`).
- 9 MCP servers with stub fallback: ElevenLabs, HeyGen, Hyperframes, video-use, AssemblyAI, ffmpeg, YouTube Data, Bilibili, Google Drive (asset library).
- A `/onboard` command that asks the right questions in order and provisions every key, dependency, and MCP server, verified end-to-end.
- A YOLO mode for when you trust the agents to floor the throttle from script all the way to final render with no human in the loop, bounded by per-API and total USD caps you set up front.
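The "stub fallback" for MCP servers could work roughly as follows. This is a minimal sketch, not the repository's implementation: the `ELEVENLABS_API_KEY` variable name and the `StubVoiceClient` / `RealVoiceClient` classes are hypothetical and exist only to illustrate the idea of falling back to a network-free stand-in when no key is configured.

```python
import os
from dataclasses import dataclass


@dataclass
class StubVoiceClient:
    """Hypothetical stand-in that never touches the network."""
    name: str = "stub"

    def synthesize(self, text: str) -> bytes:
        # Return placeholder bytes so downstream steps can still run.
        return f"[stub audio for {len(text)} chars]".encode("utf-8")


@dataclass
class RealVoiceClient:
    """Hypothetical wrapper; a real one would call the vendor API here."""
    api_key: str
    name: str = "real"

    def synthesize(self, text: str) -> bytes:
        raise NotImplementedError("wire up the vendor SDK or HTTP call here")


def make_voice_client(env=None):
    """Pick the real client when a key is configured, else fall back to the stub."""
    env = os.environ if env is None else env
    key = env.get("ELEVENLABS_API_KEY", "").strip()  # assumed variable name
    return RealVoiceClient(api_key=key) if key else StubVoiceClient()
```

Passing `env` explicitly keeps the selection logic testable without touching the real environment.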
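| Phase | Stage | Tools | Notes |
| --- | --- | --- | --- |
| Phase 1 | Headshot Generation | ElevenLabs (voice clone) + HeyGen (avatar) | Each segment is cut to 45-60 s and rendered independently, then concatenated |
| Phase 2 | Rough Cut | AssemblyAI (disfluency detection) + ffmpeg; optional video-use / Descript | Removes filler words and compresses long pauses |
| Phase 3 | B-roll Design & Render | Hyperframes (HTML→MP4, HeyGen open-source); optional Manim / Motion Canvas / Remotion | |
| Phase 4 | Final Assembly | ffmpeg + moviepy + Auto-Editor | Timeline + audio mix + bilingual subtitles + chapter markers |
| Phase 5 | Platform Export | Per-platform specs for YouTube / Bilibili / 视频号 / TikTok | Resolution, aspect ratio, bitrate, and subtitle burn-in differ per platform |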
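Phase 2's filler-word removal and long-pause compression can be thought of as computing which time spans of the recording to keep. The sketch below assumes word-level timestamps are already available as `(text, start_sec, end_sec)` tuples; the filler list, the `max_pause` threshold, and the function name are illustrative assumptions, not the project's actual settings.

```python
FILLERS = {"um", "uh", "er", "hmm"}  # illustrative list, not the project's


def keep_intervals(words, max_pause=0.6, fillers=FILLERS):
    """Return merged (start, end) spans to keep.

    words: list of (text, start_sec, end_sec), sorted by start time.
    A new span starts after a removed filler or a gap longer than max_pause.
    """
    spans = []
    force_break = False
    for text, start, end in words:
        if text.lower().strip(".,!?") in fillers:
            force_break = True  # drop the filler and cut around it
            continue
        if spans and not force_break and start - spans[-1][1] <= max_pause:
            spans[-1] = (spans[-1][0], end)  # short gap: extend current span
        else:
            spans.append((start, end))  # long pause or filler: new span
        force_break = False
    return spans
```

Concatenating the kept spans removes the pauses entirely; a real cut would probably keep a little padding around each span so the speech does not sound clipped.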
If you provide your own headshot footage, Phase 1 is skipped automatically.
           ┌──────────────────────────────────────────────────────────┐
           │ /onboard → collects keys, budget, style, target plat.    │
           └──────────────────────────────────────────────────────────┘
                                         │
                                         ▼
                                ┌──────────────────┐
                                │   orchestrator   │
                                └────────┬─────────┘
                                         │
          ┌──────────────────────────────┼──────────────────────────────┐
          ▼                              ▼                              ▼
    ┌───────────┐                ┌───────────────┐               ┌──────────────┐
    │ script-*  │                │ voice-* /     │               │ broll-* /    │
    │ agents    │── 45-60s seg ─▶│ avatar-*      │               │ hyperframes  │
    └───────────┘                └───────┬───────┘               └──────┬───────┘
                                         │ A-roll segs                  │ B-roll
                                         ▼                              ▼
                                   ┌──────────────────────────────────────────┐
                                   │ assembly: timeline + audio + subtitles   │
                                   └──────────────────┬───────────────────────┘
                                                      ▼
                                        ┌───────────────────────────┐
                                        │ quality + output-platform │
                                        └───────────────────────────┘
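The "45-60s seg" hand-off from the script agents to the voice/avatar agents implies the script is split into short segments before rendering. One way to do that is to pack whole sentences greedily against an estimated speaking rate. The sketch below is an assumption-laden illustration: the 150 words-per-minute default, the whitespace word count (which suits English, not Chinese), and the `split_script` name are all hypothetical, and it enforces only the upper bound, not the 45 s lower bound.

```python
import re


def split_script(script, wpm=150, max_seconds=60):
    """Greedily pack sentences into segments estimated at or under max_seconds."""
    max_words = int(wpm * max_seconds / 60)
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]
    segments, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            segments.append(" ".join(current))  # flush the full segment
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        segments.append(" ".join(current))
    return segments
```

A single sentence longer than the budget still becomes its own segment rather than being cut mid-sentence.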
Before you run `/onboard`, make sure your machine has the following.
- Python 3.10+: check with `python3 --version`. macOS: `brew install python@3.11`. Ubuntu: `sudo apt install python3.11 python3.11-venv`. Windows: download from python.org.
- ffmpeg (and ffprobe): required for every video/audio operation.
  - macOS: `brew install ffmpeg`
  - Ubuntu/Debian: `sudo apt install ffmpeg`
  - Windows: `choco install ffmpeg`
- Node.js 18+: needed for the Hyperframes CLI and some MCP servers. Check with `node --version`.
- Git + GitHub CLI (`gh`): `gh auth status` should report logged in.
- An agent runtime: Claude Code (recommended) or Google Antigravity. Either works: both pick up the `.claude/` directory.
- API keys: get them ready before `/onboard` (the command will prompt):
  - ElevenLabs (voice clone + TTS): https://elevenlabs.io
  - HeyGen (avatar video): https://app.heygen.com/settings/api
  - AssemblyAI (transcription + filler detection): https://www.assemblyai.com
  - OpenAI or Anthropic (script writing): https://platform.openai.com / https://console.anthropic.com
  - (optional) YouTube Data API v3 OAuth client: https://console.cloud.google.com
  - (optional) Bilibili cookies (SESSDATA + bili_jct): copy them from your browser after logging in
Hyperframes and video-use are open-source; no API key is needed.
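Before running `/onboard`, the command-line prerequisites can be checked with a few lines of Python. This is a hedged sketch rather than part of the repository: the `missing_prerequisites` helper and the tool list are assumptions drawn from the prerequisites named in this README, and it only checks that each executable is on `PATH`, not its version (so Node.js 18+ still needs `node --version`).

```python
import shutil
import sys

# Executables named in the prerequisites; adjust to your setup.
REQUIRED_TOOLS = ["ffmpeg", "ffprobe", "node", "git", "gh"]


def missing_prerequisites(min_python=(3, 10), tools=REQUIRED_TOOLS, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required")
    problems += [f"{tool} not found on PATH" for tool in tools if which(tool) is None]
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("MISSING:", problem)
```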
    git clone https://github.com/huodebing-alt/Agentic-Headshot-Video-Studio.git
    cd Agentic-Headshot-Video-Studio

    # Top-level Python deps
    pip install -r requirements.txt

    # Per-MCP-server deps (isolated, optional but recommended)
    bash scripts/install-mcp-deps.sh

    # Optional: install Hyperframes CLI globally
    npm install -g @heygen/hyperframes || true

Open Claude Code or Antigravity at the repo root and run:

    /onboard

The onboarding agent will:
- Ask video type (explainer / tutorial / lecture / demo).
- Ask target platform (YouTube / Bilibili / 视频号 / TikTok / generic).
- Ask language (中文 / English / bilingual).
- Ask whether you'll upload your own headshot or use an AI avatar.
- Collect and verify every API key.
- Ask whether to enable YOLO mode, and if yes, confirm a per-API + total USD budget cap.
- Install MCP server dependencies and run `scripts/test-mcp.sh` end-to-end.
Pick a skill:

    /full-pipeline     # A-roll + B-roll, full assembly
    /headshot-only     # Just generate A-roll segments from script
    /broll-only        # Animate B-roll given an existing A-roll
    /rough-cut-only    # Clean filler words / pauses from user headshot
    /yolo-go           # Hands-off, runs every phase to final render
YOLO mode is opt-in (off by default), budget-capped, and logged. The orchestrator will not proceed if the estimated cost exceeds the cap you set at onboarding. Every API call is recorded in `logs/yolo-spend.jsonl`.

See `docs/YOLO_MODE.md` for the full authorization checklist.
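The cap check and spend log described for YOLO mode could be sketched as follows. This is an illustration under stated assumptions, not the orchestrator's actual code: the `SpendGuard` and `BudgetExceeded` names and the JSONL record fields (`ts`, `api`, `usd`) are hypothetical; only the `logs/yolo-spend.jsonl` path comes from this README.

```python
import json
import time
from pathlib import Path


class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past a cap."""


class SpendGuard:
    def __init__(self, per_api_caps, total_cap, log_path="logs/yolo-spend.jsonl"):
        self.per_api_caps = dict(per_api_caps)  # e.g. {"heygen": 20.0}
        self.total_cap = float(total_cap)
        self.log_path = Path(log_path)
        self.spent = {}  # running USD total per API

    def authorize(self, api, estimated_usd):
        """Refuse the call if it would exceed either cap; otherwise log it."""
        api_total = self.spent.get(api, 0.0) + estimated_usd
        grand_total = sum(self.spent.values()) + estimated_usd
        if api_total > self.per_api_caps.get(api, 0.0):
            raise BudgetExceeded(f"{api}: {api_total:.2f} USD exceeds per-API cap")
        if grand_total > self.total_cap:
            raise BudgetExceeded(f"total {grand_total:.2f} USD exceeds total cap")
        self.spent[api] = api_total
        self.log_path.parent.mkdir(parents=True, exist_ok=True)
        record = {"ts": time.time(), "api": api, "usd": estimated_usd}  # assumed fields
        with self.log_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
```

An API with no configured cap is treated as having a cap of zero, so any unlisted service is blocked rather than silently allowed.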
    Agentic-Headshot-Video-Studio/
    ├── .claude/
    │   ├── agents/          # ~30 agent definitions
    │   ├── skills/          # ~60 skill SKILL.md files
    │   └── commands/        # slash commands (/onboard, /full-pipeline, ...)
    ├── mcp-servers/         # 9 Python MCP servers (stub-able)
    ├── scripts/             # install-mcp-deps.sh, test-mcp.sh, ...
    ├── templates/           # script / storyboard / SRT / ffmpeg filter templates
    ├── examples/            # end-to-end worked examples
    ├── docs/                # ARCHITECTURE / AGENT_CATALOG / SKILL_CATALOG / ...
    ├── config/              # api-keys.example.yaml
    ├── .mcp.json            # MCP server registry
    ├── requirements.txt
    ├── package.json
    └── README.md            (this file)
- `docs/ARCHITECTURE.md`: agent topology, message contracts, state stores
- `docs/AGENT_CATALOG.md`: every agent, one line each
- `docs/SKILL_CATALOG.md`: every skill, one line each
- `docs/MCP_CATALOG.md`: every MCP server, env vars, tool names
- `docs/ONBOARDING.md`: the `/onboard` flow, step by step
- `docs/YOLO_MODE.md`: the YOLO authorization checklist & spend caps
- `docs/DOCUMENT_STYLE_GUIDE.md`: typography, captions, lower-thirds
- `docs/COMPLIANCE_DISCLAIMER.md`: what this template is and is not
- HKUDS/ViMax — Director / Screenwriter / Producer / Generator role split
- calesthio/OpenMontage — pipeline / skill / tool three-layer model
- heygen-com/skills — official HeyGen Claude Code skills, MCP OAuth + API key dual mode
If this template saved you time, please consider sponsoring continued work.

Every video you ship faster is one more idea reaching people who need it. Thank you for keeping open templates like this alive.
MIT. See LICENSE.