Agentic-Headshot-Video-Studio

A team of agents that shoots your headshot explainer video — script to final cut, end to end. Drop in your script, your headshot footage (or your voice + photo), your supporting materials, and a team of specialised agents produces a finished A-roll (talking head) + B-roll (charts, animations, diagrams) cut, optimised for YouTube, Bilibili, 视频号 or TikTok.

一队会拍 headshot 讲解片的 agent,从脚本到成片,自动跑通整条管线。投入脚本、你的头像视频(或仅声音 + 照片)、辅助材料,一队专业 agent 就会生成成片的 A-roll(讲解头像)+ B-roll(图表、动画、示意图)剪辑,并按 YouTube / Bilibili / 视频号 / TikTok 各自规格优化导出。

Model-agnostic, plug-and-play, plugin-style. Runs on Claude Code, Antigravity, or any agent runtime that understands .claude/agents + .claude/skills + .mcp.json.

模型无关、即插即用、插件式。可在 Claude Code、Antigravity 或任何理解 .claude/agents + .claude/skills + .mcp.json 的 agent runtime 中运行。


What this template gives you / 你将获得什么

  • ~30 agents organised by phase: orchestrator, script, voice, avatar, rough-cut, B-roll, assembly, quality, output, onboarding. 约 30 个 agent,按阶段分组:orchestrator / 脚本 / 配音 / 头像 / 粗剪 / B-roll / 拼接 / 质检 / 输出 / 引导。

  • ~60 skills: from single-step (/generate-script, /clone-voice, /render-broll-animation) to end-to-end workflows (/full-pipeline, /yolo-go). 约 60 个 skill:从单步(/generate-script、/clone-voice、/render-broll-animation)到端到端工作流(/full-pipeline、/yolo-go)。

  • 9 MCP servers with stub fallback: ElevenLabs, HeyGen, Hyperframes, video-use, AssemblyAI, ffmpeg, YouTube Data, Bilibili, Google Drive (asset library). 9 个 MCP server 带 stub 回退:ElevenLabs、HeyGen、Hyperframes、video-use、AssemblyAI、ffmpeg、YouTube Data、Bilibili、Google Drive(素材库)。

  • A /onboard command that asks the right questions and provisions every key, every dependency, every MCP — verified end-to-end. 一个 /onboard 命令会按顺序问对问题、装好每一个 key、每一个依赖、每一个 MCP,并端到端验证。

  • A YOLO mode for when you trust the agents to floor the throttle from script all the way to final render with no human in the loop — bounded by per-API and total USD caps you set up front. 一个 YOLO 模式,当你信任 agent 从脚本到成片一脚油门踩到底、无人干预 — 由你预先设定的单 API 美元上限和总上限控制。
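
The "stub fallback" idea in the MCP bullet above can be pictured roughly as follows. This is a minimal sketch and not the repository's actual code: the class names, the `ELEVENLABS_API_KEY` variable name, and the placeholder result are all hypothetical, chosen only to show how a server might keep working when no key is configured.

```python
import os
from dataclasses import dataclass


@dataclass
class SynthesisResult:
    audio_path: str
    stubbed: bool  # True when no real API call was made


class StubVoiceClient:
    """Hypothetical stand-in used when no API key is configured."""

    def synthesize(self, text: str) -> SynthesisResult:
        # Return a placeholder path so downstream steps can still be exercised.
        return SynthesisResult(audio_path="stub/silence.wav", stubbed=True)


class RealVoiceClient:
    """Hypothetical wrapper around a real text-to-speech provider."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    def synthesize(self, text: str) -> SynthesisResult:
        raise NotImplementedError("call the real provider here")


def make_voice_client(env=None):
    """Pick the real client when a key is present, otherwise the stub."""
    env = os.environ if env is None else env
    key = env.get("ELEVENLABS_API_KEY", "").strip()  # assumed variable name
    return RealVoiceClient(key) if key else StubVoiceClient()
```

With this shape, a dry run without any keys still flows through the whole pipeline, and the `stubbed` flag lets later stages tell placeholder assets from real ones.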


Pipeline overview / 管线总览

Phase 1  Headshot Generation     ElevenLabs (voice clone) + HeyGen (avatar)
         头像生成                  Script cut into 45-60s segments, each rendered independently, then concatenated

Phase 2  Rough Cut               AssemblyAI (disfluency detection) + ffmpeg
         粗剪                     Optional: video-use / Descript
                                  Removes filler words and compresses long pauses

Phase 3  B-roll Design & Render  Hyperframes (HTML→MP4, HeyGen open-source)
         B-roll 设计与渲染        Optional: Manim / Motion Canvas / Remotion

Phase 4  Final Assembly          ffmpeg + moviepy + Auto-Editor
         最终拼接                 Timeline + audio mix + bilingual subtitles + chapter markers

Phase 5  Platform Export         Per-platform specs for YouTube / Bilibili / 视频号 / TikTok
         平台导出                 Resolution / aspect ratio / bitrate / burned-in subtitles vary by platform

If you provide your own headshot footage, Phase 1 is skipped automatically. 如果你提供自己的头像视频,Phase 1 自动跳过。
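
Phase 1 works on 45-60s segments of the script. One way such a split could be done is sketched below, assuming English text, sentence-boundary splitting, and a fixed speaking rate. `WORDS_PER_SECOND` and both function names are illustrative assumptions, not values or APIs taken from this repository.

```python
import re

WORDS_PER_SECOND = 2.5   # assumed speaking rate; tune per speaker
MAX_SECONDS = 60.0       # upper bound of the 45-60s window


def estimate_seconds(text: str) -> float:
    """Rough duration estimate from a whitespace word count."""
    return len(text.split()) / WORDS_PER_SECOND


def split_script(script: str, max_seconds: float = MAX_SECONDS) -> list[str]:
    """Greedily pack whole sentences into segments of at most max_seconds."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]
    segments: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and estimate_seconds(candidate) > max_seconds:
            # Adding this sentence would overflow: close the current segment.
            segments.append(" ".join(current))
            current = [sentence]
        else:
            current.append(sentence)
    if current:
        segments.append(" ".join(current))
    return segments
```

A single sentence longer than the limit is kept whole rather than cut mid-sentence, and a whitespace word count does not work for Chinese scripts, which would need a character-based estimate instead.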


Pipeline diagram / 管线示意图

            ┌──────────────────────────────────────────────────────────┐
            │  /onboard  →  collects keys, budget, style, target plat. │
            └──────────────────────────────────────────────────────────┘
                                       │
                                       ▼
                              ┌──────────────────┐
                              │  orchestrator    │
                              └────────┬─────────┘
                                       │
        ┌──────────────────────────────┼──────────────────────────────┐
        ▼                              ▼                              ▼
  ┌───────────┐                ┌───────────────┐              ┌──────────────┐
  │ script-*  │                │  voice-* /    │              │ broll-* /    │
  │ agents    │── 45-60s seg ─▶│  avatar-*     │              │ hyperframes  │
  └───────────┘                └──────┬────────┘              └──────┬───────┘
                                      │ A-roll segs                  │ B-roll
                                      ▼                              ▼
                              ┌──────────────────────────────────────────┐
                              │ assembly: timeline + audio + subtitles   │
                              └──────────────────┬───────────────────────┘
                                                 ▼
                                  ┌───────────────────────────┐
                                  │ quality + output-platform │
                                  └───────────────────────────┘
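
The final output-platform stage differentiates exports by resolution, aspect ratio, and bitrate. The sketch below shows one way to turn a preset into an ffmpeg argument list. The preset numbers are placeholders for illustration only and are not this project's settings; `ExportPreset`, `PRESETS`, and `ffmpeg_export_args` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportPreset:
    width: int
    height: int
    video_bitrate: str


# Placeholder values; check each platform's current upload specs.
PRESETS = {
    "youtube": ExportPreset(1920, 1080, "8M"),   # landscape 16:9
    "bilibili": ExportPreset(1920, 1080, "6M"),  # landscape 16:9
    "tiktok": ExportPreset(1080, 1920, "6M"),    # portrait 9:16
}


def ffmpeg_export_args(src: str, dst: str, platform: str) -> list[str]:
    """Build an ffmpeg command that fits the source into the preset frame."""
    p = PRESETS[platform]
    # Scale down to fit inside the frame, then pad to the exact size (letterbox).
    vf = (
        f"scale={p.width}:{p.height}:force_original_aspect_ratio=decrease,"
        f"pad={p.width}:{p.height}:(ow-iw)/2:(oh-ih)/2"
    )
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", vf,
        "-c:v", "libx264", "-b:v", p.video_bitrate,
        "-c:a", "aac",
        dst,
    ]
```

The returned list can be passed to `subprocess.run(args, check=True)`. Letterboxing is only one option. A portrait export of a talking head may look better with a centre crop.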

Prerequisites / 必备环境

Before you run /onboard, make sure your machine has the following. 运行 /onboard 之前,请确保本机已安装:

  1. Python 3.10+ (check with python3 --version). macOS: brew install python@3.11. Ubuntu: sudo apt install python3.11 python3.11-venv. Windows: download from python.org.
  2. ffmpeg (and ffprobe) — required for every video/audio operation.
    • macOS: brew install ffmpeg
    • Ubuntu/Debian: sudo apt install ffmpeg
    • Windows: choco install ffmpeg
  3. Node.js 18+ (check with node --version): needed for the Hyperframes CLI and some MCP servers.
  4. Git + GitHub CLI (gh): gh auth status should report that you are logged in.
  5. An agent runtime: Claude Code (recommended) OR Google Antigravity. Either works — the .claude/ directory is picked up by both.
  6. API keys — get them ready before /onboard (the command will prompt):

Hyperframes and video-use are open-source; no API key needed. Hyperframes 和 video-use 都是开源的,不需要 API key。
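
Before running /onboard, the prerequisite list above can be checked in one pass. The following is a minimal sketch that assumes the executables are named `ffmpeg`, `ffprobe`, `node`, `git`, and `gh` on PATH. The function name is hypothetical and is not a script shipped with this repository.

```python
import shutil
import sys

REQUIRED_TOOLS = ["ffmpeg", "ffprobe", "node", "git", "gh"]
MIN_PYTHON = (3, 10)


def missing_prerequisites(which=shutil.which, version=sys.version_info[:2]):
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if tuple(version) < MIN_PYTHON:
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required")
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not on PATH.
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Calling `missing_prerequisites()` with no arguments inspects the current machine. The check only confirms that the tools exist. It does not verify the Node.js version or whether `gh` is authenticated.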


How to start / 如何开始

1. Install / 安装

git clone https://github.com/huodebing-alt/Agentic-Headshot-Video-Studio.git
cd Agentic-Headshot-Video-Studio

# Top-level Python deps
pip install -r requirements.txt

# Per-MCP-server deps (isolated, optional but recommended)
bash scripts/install-mcp-deps.sh

# Optional: install Hyperframes CLI globally
npm install -g @heygen/hyperframes  || true

2. Onboard / 引导

Open Claude Code or Antigravity at the repo root and run: 在仓库根目录打开 Claude Code 或 Antigravity,运行:

/onboard

The onboarding agent will: 引导 agent 会:

  • Ask video type (explainer / tutorial / lecture / demo). 询问视频类型(讲解 / 教程 / 课程 / 演示)。
  • Ask target platform (YouTube / Bilibili / 视频号 / TikTok / generic). 询问目标平台(YouTube / Bilibili / 视频号 / TikTok / 通用)。
  • Ask language (中文 / English / bilingual). 询问语言(中文 / English / 双语)。
  • Ask whether you'll upload your own headshot footage or use an AI avatar. 询问是否上传自己头像还是使用 AI Avatar。
  • Collect and verify every API key. 收集并验证每一个 API key。
  • Ask whether to enable YOLO mode, and if yes, confirm a per-API + total USD budget cap. 询问是否启用 YOLO 模式,若是,确认单 API + 总美元上限。
  • Install MCP server dependencies and run scripts/test-mcp.sh end-to-end. 安装 MCP server 依赖并端到端运行 scripts/test-mcp.sh。

3. Run a pipeline / 运行管线

Pick a skill: 选一个 skill:

/full-pipeline       # A-roll + B-roll, full assembly
/headshot-only       # Just generate A-roll segments from script
/broll-only          # Animate B-roll given an existing A-roll
/rough-cut-only      # Clean filler words / pauses from user headshot
/yolo-go             # Hands-off, runs every phase to final render
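
The rough-cut step removes filler words and compresses long pauses. The core of that logic can be sketched as interval arithmetic over word timestamps. The `Word` shape, the `FILLERS` list, and the `max_pause` default below are assumptions for illustration. A real transcription API will return its own field names and units.

```python
from dataclasses import dataclass

FILLERS = {"um", "uh", "er", "hmm"}  # assumed filler list


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def keep_intervals(words: list[Word], max_pause: float = 0.5) -> list[tuple[float, float]]:
    """Return (start, end) spans to keep: fillers cut, long pauses shortened."""
    spans: list[tuple[float, float]] = []
    cut_pending = False  # True when a filler was skipped since the last kept word
    for w in words:
        if w.text.lower().strip(".,!?") in FILLERS:
            cut_pending = True
            continue
        if spans and not cut_pending and w.start - spans[-1][1] <= max_pause:
            # Short natural gap: extend the current span through this word.
            spans[-1] = (spans[-1][0], w.end)
        elif spans and not cut_pending:
            # Long pause: keep max_pause of silence, split evenly around the cut.
            half = max_pause / 2
            spans[-1] = (spans[-1][0], spans[-1][1] + half)
            spans.append((w.start - half, w.end))
        else:
            # First word, or a filler was just removed: hard cut to a new span.
            spans.append((w.start, w.end))
        cut_pending = False
    return spans
```

The resulting spans would then be handed to an editor or to ffmpeg for trimming and concatenation. That step is left out here.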

YOLO mode / YOLO 模式

YOLO mode is opt-in, budget-capped, and logged. YOLO 模式默认关闭、有美元上限、全程记录日志。

The orchestrator will not proceed if estimated cost exceeds the cap you set at onboarding. Every API call is recorded in logs/yolo-spend.jsonl. 若预估费用超过你在引导时设定的上限,orchestrator 不会继续。每次 API 调用都记录在 logs/yolo-spend.jsonl。

See docs/YOLO_MODE.md for the full authorization checklist. 完整授权清单见 docs/YOLO_MODE.md。
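
The cap-then-log behaviour described above could look roughly like the sketch below. It assumes a per-API cap table and a total cap set at onboarding. The `SpendGuard` class and the fields of each log entry are hypothetical. Only the logs/yolo-spend.jsonl path comes from this README.

```python
import json
import time
from pathlib import Path


class BudgetExceeded(RuntimeError):
    """Raised when a planned call would exceed a configured cap."""


class SpendGuard:
    def __init__(self, total_cap, per_api_caps, log_path="logs/yolo-spend.jsonl"):
        self.total_cap = total_cap
        self.per_api_caps = dict(per_api_caps)
        self.spent = {}  # api name -> USD spent so far
        self.log_path = Path(log_path)

    def authorize(self, api, estimated_usd):
        """Raise before the call if it would push spend past either cap."""
        api_total = self.spent.get(api, 0.0) + estimated_usd
        grand_total = sum(self.spent.values()) + estimated_usd
        # APIs without a configured cap are treated as capped at 0 USD.
        if api_total > self.per_api_caps.get(api, 0.0):
            raise BudgetExceeded(f"{api}: {api_total:.2f} USD exceeds per-API cap")
        if grand_total > self.total_cap:
            raise BudgetExceeded(f"total {grand_total:.2f} USD exceeds total cap")

    def record(self, api, actual_usd):
        """Update running totals and append one JSON line to the spend log."""
        self.spent[api] = self.spent.get(api, 0.0) + actual_usd
        self.log_path.parent.mkdir(parents=True, exist_ok=True)
        entry = {"ts": time.time(), "api": api, "usd": actual_usd}
        with self.log_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry) + "\n")
```

A caller would run `authorize` with the estimate before each paid request and `record` with the actual cost afterwards, so that a rejected call never reaches the provider.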


Project structure / 项目结构

Agentic-Headshot-Video-Studio/
├── .claude/
│   ├── agents/           # ~30 agent definitions
│   ├── skills/           # ~60 skill SKILL.md files
│   └── commands/         # slash commands (/onboard, /full-pipeline, ...)
├── mcp-servers/          # 9 Python MCP servers (stub-able)
├── scripts/              # install-mcp-deps.sh, test-mcp.sh, ...
├── templates/            # script / storyboard / SRT / ffmpeg filter templates
├── examples/             # end-to-end worked examples
├── docs/                 # ARCHITECTURE / AGENT_CATALOG / SKILL_CATALOG / ...
├── config/               # api-keys.example.yaml
├── .mcp.json             # MCP server registry
├── requirements.txt
├── package.json
└── README.md (this file)

Documentation / 文档

  • docs/ARCHITECTURE.md — agent topology, message contracts, state stores
  • docs/AGENT_CATALOG.md — every agent, one line each
  • docs/SKILL_CATALOG.md — every skill, one line each
  • docs/MCP_CATALOG.md — every MCP server, env vars, tool names
  • docs/ONBOARDING.md — the /onboard flow, step by step
  • docs/YOLO_MODE.md — the YOLO authorization checklist & spend caps
  • docs/DOCUMENT_STYLE_GUIDE.md — typography, captions, lower-thirds
  • docs/COMPLIANCE_DISCLAIMER.md — what this template is and is not

Inspirations / 借鉴对象


Support the work / 支持本项目

If this template saved you time, please consider sponsoring continued work: 如果这个模板为你节省了时间,欢迎赞助持续开发:

Sponsor on GitHub

Every video you ship faster is one more idea reaching people who need it. Thank you for keeping open templates like this alive.

每多发布一条视频,就有更多需要的人能接触到一个好想法。感谢你让开源模板继续活着。


License

MIT. See LICENSE.

About

Multi-agent template for automated headshot/talking-head explainer video production. Script -> voice clone -> AI avatar -> rough cut -> B-roll -> final render. Model-agnostic, plug-and-play with /onboard.
