Tired of stacking conference lectures, TED talks, and deep 1-on-1 interviews into a Watch Later you'll never actually finish? Chorus is for you — drop in the URL, walk away with a 5-minute two-voice short where a host narrates and the speaker themselves is quoted verbatim.
Or hand it a name — "Dale Schuurmans", "Mo Gawdat" — and Chorus searches YouTube, picks 3–5 talks, and stitches a chronological audio biography that tracks how the person's views evolved era by era. Either mode also gives you themed ~10-minute long chapters alongside the short.
Claude Opus 4.6 agents write it; a critic agent runs grep -F on every quoted line against the raw transcript and blocks the pipeline if a single word drifts. Kokoro 82M renders it locally in two voices. Every quote in the script links back to the exact second on YouTube. Runs on your Claude Pro/Max subscription — no ANTHROPIC_API_KEY needed.
A 4.9-minute short the pipeline produced from a 64-minute RLC 2025 keynote by Dale Schuurmans, Language Models and Computation. No human edits inside the run.
demo.mp4
🎧 Download the MP3 · 📄 Read the annotated script
The .md is the human-readable script — every quote is a blockquote with a clickable YouTube timestamp. The MP3 is what Kokoro did with the underlying TTS-tagged source.
1. A critic agent audits every quote. Every line tagged [DALE] (or whatever the subject's first name is) must be a byte-for-byte substring of the raw transcript. The transcript-critic agent runs grep -F against transcript.txt and blocks the pipeline if even one word drifts. Same idea cross-source for name mode via corpus-script-critic.
2. The host and the subject sound different. Kokoro renders narration in one voice and quotes in another, paired by subject gender. "What the host says about Dale" and "what Dale actually said" are audibly distinct.
3. Source-linked Markdown is the published artifact. Every script ships as script.md — each verbatim quote becomes a blockquote with a deep-linked YouTube timestamp. Click → land on the exact second. The raw [HOST]/[DALE] script.txt is the TTS source; the .md is what humans read.
> [**12:37**](https://www.youtube.com/watch?v=YnMqbpdHcaY&t=757s) · _ICAPS 2024 Keynote_ · 2024-07-02
>
> there was this sense in which this started to look to me at least like an actual computer where you you were you know providing a problem...

4. No fabricated motivation. When a thinker's view shifts but they never said why, the era-aggregator marks it honestly and the corpus-script-critic blocks any chapter that invents a reason.
5. Runs off your Claude subscription. No API key, no billing dashboard. The Agent SDK shells out to your local claude binary — see Subscription auth below for the one-line trick that guarantees this.
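The verbatim check in point 1 boils down to fixed-string containment — the same semantics as `grep -F`. A minimal illustrative sketch (function names are ours, not Chorus's actual code):

```python
def quote_is_verbatim(quote: str, transcript: str) -> bool:
    """Fixed-string containment, like `grep -F`: no regex, no fuzziness."""
    return quote in transcript

def audit(quotes: list[str], transcript: str) -> list[str]:
    """Return the quotes that drift from the transcript (non-empty => block)."""
    return [q for q in quotes if not quote_is_verbatim(q, transcript)]
```

Because the match is exact, even a one-word paraphrase fails the audit and blocks the pipeline.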
uvx chorus "https://www.youtube.com/watch?v=yGLoWZP1MyA"

A transcript-investigator extracts (theme, thesis, verbatim_quote, timestamp) tuples. The transcript-critic greps each quote. The opinion-aggregator clusters them into 3–5 themes. The script-writer drafts a ~5-minute short plus one ~10-minute chapter per theme. The script-critic verifies. chorus-annotate produces script.md sidecars. Kokoro renders.
uvx chorus "Dale Schuurmans" --max-videos 5

This is the more ambitious mode. Chorus:

- Searches YouTube (`chorus-find`) across 6 keyword templates — `keynote`, `interview`, `podcast`, `talk`, `lecture`, `fireside` — applies a 20-min duration filter, dedupes, and enriches with upload dates → `candidates.json`.
- Filters to formal talks BY the subject (`interview-finder`) — drops commentary, clips, and wrong-person matches → `videos.json`.
- Investigates every video in parallel — `transcript-investigator` + `transcript-critic` loops fan out via `asyncio.gather`.
- Builds a chronological era structure (`era-aggregator`) — clusters opinions into 2–4 eras, identifies transitions with before/after quotes, and surfaces stable themes that span every era. The only agent that sees more than one talk at once. Refuses to invent why a view shifted.
- Writes a chronological script (`corpus-script-writer`) — a 5-minute overview plus one ~10-minute chapter per era. Each era chapter opens with a transition passage from the previous era.
- Cross-source critic (`corpus-script-critic`) — verbatim-greps against every per-video transcript and runs a hard `fabricated_motivation` check on every transition opening.
- Annotates — `script.md` + `sources.json` for every script (deterministic Python via `chorus-annotate`).
- Renders — Kokoro emits one MP3 per script.
You get an overview MP3 plus one MP3 per era. Re-run any time — --skip-search reuses the curated videos.json; finished stages resume from disk.
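The parallel investigation step above can be sketched with a dummy coroutine standing in for the real agent call (the `investigate` body here is illustrative, not Chorus code):

```python
# One investigate-then-critique loop per video, fanned out concurrently.
import asyncio

async def investigate(video_id: str) -> dict:
    await asyncio.sleep(0)  # real code awaits an Agent SDK query here
    return {"video": video_id, "opinions": []}

async def investigate_all(video_ids: list[str]) -> list[dict]:
    # gather preserves input order, so results line up with video_ids
    return await asyncio.gather(*(investigate(v) for v in video_ids))

results = asyncio.run(investigate_all(["a1", "b2", "c3"]))
```

Each coroutine only sees its own video; only the era-aggregator downstream sees the whole corpus.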
Pick whichever install style matches how you work:
brew install espeak-ng ffmpeg
npm install -g @anthropic-ai/claude-code && claude # one-time login
uvx chorus "https://www.youtube.com/watch?v=<ID>" # URL mode
uvx chorus "Dale Schuurmans" # name mode
uvx chorus-doctor                                # self-check deps

/plugin install zhuconv/chorus
/chorus https://www.youtube.com/watch?v=<ID>
/chorus "Dale Schuurmans" --max-videos 5
Ships all 9 agents plus the /chorus skill. Requires the chorus Python package (uv tool install chorus or pip install chorus).
brew install espeak-ng ffmpeg
npm install -g @anthropic-ai/claude-code && claude
git clone https://github.com/zhuconv/chorus && cd chorus
uv sync
uv run chorus "https://www.youtube.com/watch?v=<ID>"
open output/<Subject>/<date>__<id>__<slug>/short/short.mp3

Every box is one claude_agent_sdk.query() call. System prompts live in agents/<role>.md — no code edits required to tweak behavior. Critics can send work back upstream for one retry; after that the orchestrator surfaces the failure.
The 9 agents:
| Agent | Mode | Job |
|---|---|---|
| `transcript-investigator` | both | Extract (theme, thesis, verbatim_quote, timestamp) tuples. |
| `transcript-critic` | both | `grep -F` every quote against the transcript. Blocks on drift. |
| `opinion-aggregator` | URL | Cluster into 3–5 themes; assign `subject_gender`. |
| `script-writer` | URL | Draft short + per-theme chapter scripts. |
| `script-critic` | URL | Verbatim fidelity, TTS style, word counts. |
| `interview-finder` | name | Filter raw search to formal talks BY the subject. |
| `era-aggregator` | name | Build 2–4 chronological eras + transitions. Never fabricates motivation. |
| `corpus-script-writer` | name | Chronological overview + per-era chapters with transition openings. |
| `corpus-script-critic` | name | Cross-source verbatim grep + `fabricated_motivation` block. |
A single command pulled in 3 talks spanning 2019–2025 and produced 4 MP3s totaling ~35 minutes:
| File | Era | Source talk |
|---|---|---|
| `short/overview.mp3` | All three | Chronological overview |
| `long/era01_subbasement.mp3` | 2019 — Optimization in RL | DLRLSS 2019 |
| `long/era02_llms_as_computers.mp3` | 2024 — LLMs as a new kind of computer | ICAPS 2024 Keynote |
| `long/era03_hard_limits.mp3` | 2025 — Computational impossibility results | RLC 2025 keynote |
🎧 Listen to the chronological overview · 📄 Read the annotated script (every quote has a clickable YouTube timestamp)
The era-aggregator identified one stable theme running through all three eras — "Think computationally, not statistically" — supported by verbatim quotes from each talk. That theme becomes the through-line of the overview script.
| Source | Length | Kind | Result |
|---|---|---|---|
| Dale Schuurmans — RLC 2025 keynote, Language Models and Computation | 64 min | Academic lecture | 5 MP3s, ~42 min total. Zero human edits inside the pipeline. Listen. |
| Mo Gawdat — interview on AI, UBI, and the job market | 40 min | Pop interview | 5 MP3s, ~29 min total. Validated end-to-end. |
Two genres, same pipeline, coherent output.
output/
└── Dale_Schuurmans/
└── 2025-08-25__yGLoWZP1MyA__dale_schuurmans_language_models/
├── metadata.json
├── transcript.srt / transcript.txt
├── opinions.raw.json # investigator
├── transcript_critic_report.json
├── opinions.json # aggregator — themes, subject_gender
├── script_critic_report.json
├── short/
│ ├── script.txt # raw [HOST]/[DALE] tagged TTS source
│ ├── script.md # human-readable, source-linked
│ ├── sources.json
│ └── short.mp3
└── long/
├── ch01_<slug>_script.txt # ~1500 words / ~10 min per chapter
├── ch01_<slug>.md # source-linked sidecar
├── ch01_<slug>_sources.json
├── ch01_<slug>.mp3
└── ...
output/
└── Dale_Schuurmans/
├── _corpus/
│ ├── candidates.json # chorus-find raw search
│ ├── videos.json # interview-finder verdict
│ ├── opinions_index.json
│ ├── evolution.json # era-aggregator — eras, transitions, stable themes
│ ├── script_critic_report.json
│ └── <date>__<video_id>__<slug>/ # per-video artifacts
├── short/
│ ├── script.txt
│ ├── script.md # chronological overview, source-linked
│ ├── sources.json
│ └── overview.mp3
└── long/
├── era01_<slug>_script.txt # one chapter per era
├── era01_<slug>.md
├── era01_<slug>_sources.json
├── era01_<slug>.mp3
└── ...
Every run produces a script.md next to script.txt plus sources.json. Each verbatim quote is a blockquote with a deep-linked YouTube timestamp:
> [**1:03:19**](https://www.youtube.com/watch?v=yGLoWZP1MyA&t=3799s) · _Dale Schuurmans, Language Models and Computation_ · 2025-08-25
>
> machine learning is awesome. Reinforcement learning even more so, but computer science matters. Especially when you're trying to train LLMs to to serve a whole range of problem instances. You are now confronted with the laws of computation.

Click the timestamp → YouTube jumps to that exact second. Each [HOST] paragraph carries the nearest preceding quote's video as a "near" attribution in sources.json, so downstream tools can highlight which talk a paraphrased passage draws from.
Run chorus-annotate on any existing run to regenerate sidecars without touching the agent pipeline.
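The deep links follow YouTube's `&t=<seconds>s` convention: an `H:MM:SS` (or `M:SS`) label is converted to total seconds. A minimal sketch of that conversion (helper names are illustrative):

```python
def to_seconds(stamp: str) -> int:
    """Convert 'H:MM:SS' or 'M:SS' to total seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

def deep_link(video_id: str, stamp: str) -> str:
    """Build a YouTube URL that lands on the exact second."""
    return f"https://www.youtube.com/watch?v={video_id}&t={to_seconds(stamp)}s"
```

For example, the `1:03:19` stamp shown above resolves to `t=3799s`.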
The Agent SDK shells out to your local claude binary, which inherits Pro/Max OAuth from claude login. To make sure that path is taken even if you have an ANTHROPIC_API_KEY exported in your shell, every query() call uses:
ClaudeAgentOptions(env={"ANTHROPIC_API_KEY": ""})

The SDK merges this dict last when building the subprocess env. The clobbered key is empty, so the claude CLI falls back to its OAuth-stored subscription auth. The orchestrator prints a note at startup if a key was detected — so you know it was bypassed, not used.
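Why the empty string wins is plain dict-merge semantics: the options env is merged after the inherited shell env, so the later value clobbers the earlier one. An illustrative sketch (the merge itself happens inside the SDK; this function is ours):

```python
def build_subprocess_env(shell_env: dict, options_env: dict) -> dict:
    """Later dict wins on key collisions — the options env is merged last."""
    return {**shell_env, **options_env}

env = build_subprocess_env(
    {"ANTHROPIC_API_KEY": "sk-ant-...", "PATH": "/usr/bin"},  # your shell
    {"ANTHROPIC_API_KEY": ""},  # what ClaudeAgentOptions passes
)
# env["ANTHROPIC_API_KEY"] is now "", so the claude CLI falls back to OAuth.
```

Everything else in the shell env (PATH, HOME, etc.) passes through untouched.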
[HOST] Narration spoken by the host voice.
[DALE] "A verbatim direct quote in quotation marks."
[HOST] More narration.
- `[HOST]` — narrator voice. All framing and paraphrase.
- `[<FIRSTNAME>]` — subject voice. Verbatim quotes only, wrapped in `"…"`. Mo Gawdat → `[MO]`. Naval Ravikant → `[NAVAL]`.
- Blank line = paragraph break (longer audio pause).
This .txt is the TTS source Kokoro reads. Humans read the .md sidecar that chorus-annotate generates next to it.
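A tagged script like this splits cleanly into (speaker, text) segments before synthesis. A hypothetical sketch of such a parser — not Chorus's actual code:

```python
import re

def parse_script(text: str) -> list[tuple[str, str]]:
    """Split a [HOST]/[NAME]-tagged script into (speaker, text) segments."""
    segments = []
    for line in text.splitlines():
        m = re.match(r"\[([A-Z]+)\]\s*(.*)", line)
        if m:  # blank lines (paragraph breaks) carry no speaker tag
            segments.append((m.group(1), m.group(2)))
    return segments
```

Each segment can then be routed to the host voice or the quote voice by its tag.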
Two Kokoro voices, paired by subject gender so the narrator and subject are always audibly different:
| Subject gender | Host voice | Quote voice |
|---|---|---|
| Male | `af_heart` | `am_puck` |
| Female | `am_puck` | `af_heart` |
Override with --host-voice / --quote-voice on chorus-tts.
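The pairing rule in the table reduces to a small lookup. A sketch (the voice IDs are the real Kokoro voices from the table; the function itself is illustrative):

```python
def pick_voices(subject_gender: str) -> tuple[str, str]:
    """Return (host_voice, quote_voice), always audibly distinct."""
    if subject_gender == "male":
        return ("af_heart", "am_puck")  # female host, male subject
    return ("am_puck", "af_heart")      # male host, female subject
```

Pairing the host opposite the subject's gender guarantees narration and quotes never share a voice.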
Every stage is a standalone command:
uv run chorus-fetch <URL> -o output/ # transcript only
uv run chorus-find "<name>" -o output/<Subject>/_corpus/ # YouTube search only
uv run chorus-tts path/to/script.txt -o out.mp3 \ # TTS from any tagged script
--subject-gender male
uv run chorus-annotate output/<Subject>/ # regenerate .md sidecars
uv run chorus-doctor                                 # dependency self-check

uv run chorus <URL_or_name> [--model claude-opus-4-6]
# URL mode:
[--skip-fetch] # reuse existing transcript
# name mode:
[--max-videos 5] # cap on videos selected
[--min-duration 1200] # min seconds per candidate
[--skip-search] # reuse candidates.json + videos.json
# both:
  [--skip-synth]            # stop before Kokoro

Expand for prerequisites, Python, auth
System (macOS tested):
brew install espeak-ng ffmpeg

Claude Code CLI — the Agent SDK spawns this and inherits its login. No ANTHROPIC_API_KEY is used anywhere in the orchestrator. Even if you have one exported, the orchestrator clobbers it inside the subprocess env via ClaudeAgentOptions(env={"ANTHROPIC_API_KEY": ""}), so every agent turn bills against your Pro/Max subscription. A note is printed at run start if a key is detected.
npm install -g @anthropic-ai/claude-code
claude   # one-time login; Pro/Max covers all agent calls

Python 3.11–3.12:
uv sync   # installs yt-dlp, kokoro, claude-agent-sdk, torch, etc.

First Kokoro run downloads ~330 MB of model weights from Hugging Face. After that it runs fully offline on CPU or Apple Silicon MPS.
Agent behavior lives in markdown under agents/ at the repo root (with a .claude/agents symlink so in-repo Claude Code sessions also see them). The YAML frontmatter names the role and declares allowed tools; the body is the system prompt. Next run picks up your changes — no code edit required.
Useful tweaks:
- Loosen `script-critic` / `corpus-script-critic` acronym expansion for technical audiences (they currently nag about "AI" and "LLM" on every occurrence).
- Change the chapter-count target in `opinion-aggregator` (defaults to 3–5 based on content density).
- Tighten `transcript-investigator` density targets for longer sources.
- Change the search keyword menu in `src/chorus/youtube_search/find.py::DEFAULT_QUERIES`, or raise `DEFAULT_PER_QUERY` if the finder needs a bigger pool.
- Tune the era boundaries in `era-aggregator.md` — 2–4 eras is the default envelope; widen it if you're processing someone with a 30-year corpus.
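As a rough illustration of the spec shape, an agent file under `agents/` might look like the following — the frontmatter field names here are hypothetical; check any existing file in `agents/` for the real schema:

```markdown
---
name: transcript-critic       # hypothetical field names — see agents/ for the real ones
tools: [Bash, Read]
---

You are the transcript critic. For every quoted line in opinions.raw.json,
run `grep -F` against transcript.txt. Report a block verdict if any quote
is not a byte-for-byte substring of the transcript.
```

Edits to the body take effect on the next run; no Python changes needed.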
- Name-mode runtime is long. A 5-video corpus takes ~20 min of yt-dlp work plus ~20–30 min of parallel agent wall-clock per video. Expect 1–2 hours end-to-end.
- URL-mode runtime is also long. A 60-min source takes ~45–60 min of agent wall-clock plus ~6 min of Kokoro synth on Apple Silicon. The investigator alone runs 8–15 min because it grep-verifies every quote.
- Critic warnings are noisy. Many `warn`-level issues (acronyms, homographs) don't actually degrade audio. Treat the verdict as advisory unless it's `block`.
- YouTube search quality is coarse. For less-indexed academics or people with common names, the `interview-finder` agent excludes aggressively — you'll get 3 videos when you asked for 5. Pass a more specific name ("Dale Schuurmans University of Alberta") to tighten matches.
- No motivation fabrication. The `era-aggregator` and `corpus-script-critic` jointly refuse to invent why a view changed when the subject never said. Honest, but the script will sometimes say "he has not said publicly why this shifted."
- English only. Kokoro supports other languages, but the voice conventions and script rules are tuned for English.
- macOS-tested. Dependencies (`espeak-ng`, `ffmpeg`, `torch`) exist on Linux and WSL, but I haven't verified the full pipeline there.
- `--subject-name-hint` / `--institution` flags to disambiguate same-name people at the finder stage.
- Relax `script-critic` / `corpus-script-critic` acronym strictness automatically for technical audiences.
- An optional `--style` preset for the script writers (editorial / narrative / lecture-notes).
- Parallel chapter synthesis in `chorus-tts` (currently sequential per script).
- Cached `candidates.json` warmup — fall back to a stale cache with a warning when `--skip-search` is set but no cache exists.
- Publish to PyPI so `uvx chorus` works without cloning.
chorus/
├── pyproject.toml # deps + 6 CLI entry points
├── .claude-plugin/plugin.json # Claude Code plugin manifest
├── agents/ # 9 markdown agent specs (editable)
│ ├── transcript-investigator.md
│ ├── transcript-critic.md
│ ├── opinion-aggregator.md
│ ├── script-writer.md
│ ├── script-critic.md
│ ├── interview-finder.md # name mode: filter search to formal talks
│ ├── era-aggregator.md # name mode: chronological clustering + transitions
│ ├── corpus-script-writer.md # name mode: per-era chapters with transition openings
│ └── corpus-script-critic.md # name mode: cross-source verbatim + fabrication checks
├── skills/chorus/SKILL.md # /chorus slash-command spec
├── .claude/ # symlinks to agents/ and skills/ for in-repo dev
├── src/chorus/
│ ├── orchestrator.py # `chorus` CLI — URL + name mode driver
│ ├── youtube_fetcher/fetch.py # `chorus-fetch` — yt-dlp transcript fetcher
│ ├── youtube_search/find.py # `chorus-find` — yt-dlp name-based search
│ ├── kokoro_tts/synthesize.py # `chorus-tts` — two-voice Kokoro wrapper
│ ├── annotate/markdown.py # `chorus-annotate` — .md sidecar generator
│ └── doctor.py # `chorus-doctor` — dependency self-check
├── examples/ # Dale Schuurmans short (mp3 + md + txt)
└── output/ # per-video + per-subject artifacts (gitignored)
MIT for the Chorus code. Kokoro model weights are Apache-2.0. You are responsible for complying with YouTube's Terms of Service for any transcripts you fetch.