YouTube Channel Transcript Fetcher

Export LLM-ready plain-text transcripts for an entire YouTube channel — with filtering, a local web UI, and CLI tooling.

Built on top of youtube-transcript-api (MIT). This repo adds channel scraping (yt-dlp), optional YouTube Data API enrichment, transcript fetching with proxy support, and export formatting.

Features

Web UI — start exports from the browser, watch progress, preview, and download
CLI — scriptable batch exports for automation
Smart filtering — age gate, view-count percentile cutoff, optional view floor, max video cap
Scrape fallbacks — yt-dlp listing with optional YOUTUBE_API_KEY batch metadata enrichment
Proxy support — Webshare residential or generic HTTP proxies for YouTube IP blocks
Compact export format — token-efficient output with grouped failure summaries

Requirements

Python 3.8+ (3.11+ recommended)
ffmpeg — required by yt-dlp for some metadata paths (install guide)
Optional: YouTube Data API v3 key — helps when yt-dlp cannot enrich video metadata
Optional: Residential proxy — recommended when transcript requests hit YouTube IP blocks

Installation

Option A — Poetry (recommended)

git clone https://github.com/kadinsolaiman8-spec/YouTube-Channel-Transcript-Fetcher.git
cd YouTube-Channel-Transcript-Fetcher

poetry install --with channel,test

Option B — pip (editable)

git clone https://github.com/kadinsolaiman8-spec/YouTube-Channel-Transcript-Fetcher.git
cd YouTube-Channel-Transcript-Fetcher

pip install -e ".[channel]"

For development and tests:

pip install -e ".[channel,test,dev]"

Environment setup

Copy the example env file and fill in values:

cp .env.example .env.local

Variable	Required	Purpose
`YOUTUBE_API_KEY`	Optional	YouTube Data API v3 — used when yt-dlp cannot list or enrich channel uploads. Create at Google Cloud Console and enable YouTube Data API v3.
`WEBSHARE_PROXY_USERNAME`	Optional	Webshare residential proxy username (server-side only)
`WEBSHARE_PROXY_PASSWORD`	Optional	Webshare residential proxy password
`WEBSHARE_PROXY_RETRIES`	Optional	Retries per transcript on HTTP 429 (default 2; core library default is 10)
`HTTP_PROXY` / `HTTPS_PROXY`	Optional	Generic proxy URLs for transcript fetches

Load order: variables already set in the shell take precedence. Otherwise .env is read first, then .env.local, whose values override .env.
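The precedence rule can be pictured as a plain dictionary merge. The sketch below is an illustration only, not the project's loader: `parse_env_text` and `resolve_env` are hypothetical names, and the parser handles only simple KEY=VALUE lines.

```python
from typing import Dict, Mapping


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_env(shell: Mapping[str, str], dot_env: str, dot_env_local: str) -> Dict[str, str]:
    """Merge so that .env.local overrides .env and the shell overrides both."""
    merged = parse_env_text(dot_env)
    merged.update(parse_env_text(dot_env_local))
    merged.update(shell)  # shell environment wins
    return merged


# Hypothetical values, for illustration only.
resolved = resolve_env(
    shell={"HTTP_PROXY": "http://shell-proxy:3128"},
    dot_env="YOUTUBE_API_KEY=from-env\nHTTP_PROXY=http://env-proxy:3128",
    dot_env_local="YOUTUBE_API_KEY=from-env-local",
)
print(resolved)
```

Here `YOUTUBE_API_KEY` comes from .env.local because it overrides .env, while `HTTP_PROXY` comes from the shell because the shell overrides both files.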

Both .env and .env.local are gitignored. Never commit API keys or proxy credentials.

Restart the web server after changing env files — the worker loads env at process start.

Web UI

Start the local server (binds to 127.0.0.1:8080 only):

youtube_channel_web

Or directly with uvicorn:

uvicorn youtube_transcript_api.channel.web.app:app --host 127.0.0.1 --port 8080

Open http://127.0.0.1:8080.

Web workflow

Paste a channel URL (@handle, /channel/UC..., or uploads playlist list=UU...)
Adjust filters (percentile cutoff, min age, sort order, languages)
Open Advanced for scrape cookies, fetch workers, delay, and max videos
Start export — progress, ETA, and status log update live
Preview or download the .txt export when complete
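The three accepted URL shapes from the first step can be told apart with the standard library. This is a rough sketch, not the project's parser: `classify_channel_url` and its return labels are hypothetical, and real URLs may have variants it does not cover.

```python
import re
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlparse


def classify_channel_url(url: str) -> Tuple[str, Optional[str]]:
    """Return (kind, identifier) for the URL shapes listed in the workflow."""
    parsed = urlparse(url)

    # Uploads playlist: ...?list=UU...
    playlist_ids = parse_qs(parsed.query).get("list", [])
    if playlist_ids and playlist_ids[0].startswith("UU"):
        return "uploads_playlist", playlist_ids[0]

    # Channel ID: /channel/UC...
    match = re.match(r"^/channel/(UC[\w-]+)", parsed.path)
    if match:
        return "channel_id", match.group(1)

    # Handle: /@handle
    match = re.match(r"^/(@[^/]+)", parsed.path)
    if match:
        return "handle", match.group(1)

    return "unknown", None


print(classify_channel_url("https://www.youtube.com/@ChannelName"))
```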

Advanced settings

Setting	Default	Notes
Fetch workers	2	Use 1 when proxies are configured or after 429 errors
Delay between fetches	0.5s	Use 1.0–2.0+ with proxies or rate limits
Scrape workers	2	Parallel metadata enrichment during scrape
Max videos	0 (unlimited)	Set to 3 for a smoke test before full export
Cookies from browser	—	e.g. `brave:Default` — fully quit the browser first
Cookie file	—	Netscape-format cookies.txt path on the server machine

Proxy credentials are read from .env.local on the server — they are not sent from the browser.
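To make the interaction between fetch workers and delay concrete, here is a minimal sketch of paced concurrent fetching. It assumes the delay is applied per worker after each fetch, which this README does not confirm. `fetch_all` and `fetch_one` are hypothetical names, with `fetch_one` standing in for the real transcript download.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fetch_all(
    video_ids: List[str],
    fetch_one: Callable[[str], str],
    max_workers: int = 2,
    delay: float = 0.5,
) -> Dict[str, str]:
    """Fetch every ID with a bounded worker pool, pausing after each fetch."""

    def paced(video_id: str) -> str:
        try:
            return fetch_one(video_id)
        finally:
            time.sleep(delay)  # assumed pacing: each worker rests after a request

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(paced, video_ids))  # map preserves input order
    return dict(zip(video_ids, results))


# Stand-in fetcher for illustration; no network access.
demo = fetch_all(["a", "b", "c"], lambda vid: vid.upper(), max_workers=1, delay=0.0)
print(demo)
```

With one worker and a longer delay, requests go out strictly one after another, which is the conservative setting the table suggests for proxies and rate limits.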

CLI

youtube_channel_export "https://www.youtube.com/@ChannelName" -o export.txt

Common flags

Flag	Default	Description
`--percentile`	10	Drop bottom N% of videos by view count
`--min-age-days`	14	Exclude videos newer than N days
`--min-views-floor`	0	Hard minimum view count (0 = off)
`--sort`	asc	`asc` or `desc` by publish date
`--languages`	en	Comma-separated transcript language codes
`--output`, `-o`	channel_export.txt	Output file path
`--max-workers`	2	Concurrent transcript fetch workers
`--sleep`	0.5	Seconds between transcript requests
`--scrape-workers`	2	Metadata enrichment workers during scrape
`--max-videos`	0	Cap videos after filtering (0 = no limit)
`--export-density`	compact	`compact` or `verbose`
`--cookies-browser`	—	yt-dlp browser cookies (e.g. `chrome:Default`)
`--cookiefile`	—	Path to Netscape cookie file
`--webshare-proxy-username` / `--webshare-proxy-password`	—	Override Webshare env vars
`--http-proxy` / `--https-proxy`	—	Override generic proxy env vars
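The filter flags can be read as a small pipeline over the scraped video list. The sketch below is one plausible reading, not the project's implementation: the `Video` class and `filter_videos` are hypothetical, the order of the steps is assumed, and the percentile cutoff is assumed to round down to whole videos.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class Video:
    video_id: str
    views: int
    published: datetime


def filter_videos(
    videos: List[Video],
    now: datetime,
    percentile: int = 10,
    min_age_days: int = 14,
    min_views_floor: int = 0,
    sort: str = "asc",
    max_videos: int = 0,
) -> List[Video]:
    # Age gate: exclude videos newer than min_age_days.
    cutoff = now - timedelta(days=min_age_days)
    kept = [v for v in videos if v.published <= cutoff]

    # Drop the bottom N% by view count (assumed: rounded down).
    drop = len(kept) * percentile // 100
    kept = sorted(kept, key=lambda v: v.views)[drop:]

    # Optional hard view floor (0 = off).
    if min_views_floor > 0:
        kept = [v for v in kept if v.views >= min_views_floor]

    # Sort by publish date, then cap after filtering (0 = no limit).
    kept.sort(key=lambda v: v.published, reverse=(sort == "desc"))
    if max_videos > 0:
        kept = kept[:max_videos]
    return kept


now = datetime(2026, 5, 20, tzinfo=timezone.utc)
sample = [Video(f"v{i}", i * 100, now - timedelta(days=100 + i)) for i in range(1, 11)]
sample.append(Video("new", 5000, now - timedelta(days=1)))  # too recent
selected = filter_videos(sample, now, max_videos=3)
print([v.video_id for v in selected])
```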

Example commands

# Smoke test — 3 videos, slower pacing
youtube_channel_export "https://www.youtube.com/@ChannelName" \
  --max-videos 3 --max-workers 1 --sleep 2.0 -o smoke.txt

# With browser cookies for bot-blocked metadata
youtube_channel_export "https://www.youtube.com/@ChannelName" \
  --cookies-browser "brave:Default" -o export.txt

# With explicit Webshare proxy
youtube_channel_export "https://www.youtube.com/@ChannelName" \
  --webshare-proxy-username USER --webshare-proxy-password PASS \
  --max-workers 1 --sleep 1.5 -o export.txt
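For automation from Python, the same commands can be assembled as an argument list. This is a minimal sketch using only flags from the table above; `build_export_command` is a hypothetical helper, not part of the package.

```python
import shlex
from typing import List


def build_export_command(
    channel_url: str,
    output: str = "channel_export.txt",
    max_videos: int = 0,
    max_workers: int = 2,
    sleep: float = 0.5,
) -> List[str]:
    """Build an argv list for the youtube_channel_export entry point."""
    return [
        "youtube_channel_export",
        channel_url,
        "--max-videos", str(max_videos),
        "--max-workers", str(max_workers),
        "--sleep", str(sleep),
        "-o", output,
    ]


# Mirrors the smoke-test example: 3 videos, one worker, slower pacing.
command = build_export_command(
    "https://www.youtube.com/@ChannelName",
    output="smoke.txt",
    max_videos=3,
    max_workers=1,
    sleep=2.0,
)
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would run the export and raise if the CLI exits with a non-zero status.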

Export format

Compact exports start with a one-line header:

# Channel Name | export:2026-05-20T12:00:00+00:00 | transcripts:2/105 | scraped:119 filtered_out:14 | sort:oldest | filter:...

Field	Meaning
`transcripts:X/Y`	X transcripts retrieved, Y videos attempted after filtering
`scraped`	Videos found before filtering
`filtered_out`	Videos removed by filters
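A script that post-processes exports may want these fields as data. The sketch below parses the example header shown above. `parse_header` is a hypothetical helper, and it assumes every segment after the channel name is made of space-separated key:value tokens, which may not hold for every filter description.

```python
from typing import Dict


def parse_header(line: str) -> Dict[str, object]:
    """Split a compact export header into a dictionary of fields."""
    segments = [s.strip() for s in line.lstrip("#").strip().split(" | ")]
    fields: Dict[str, object] = {"channel": segments[0]}
    for segment in segments[1:]:
        for token in segment.split():
            # Split on the first colon only, so timestamps stay intact.
            key, sep, value = token.partition(":")
            if sep:
                fields[key] = value
    retrieved, _, attempted = str(fields.get("transcripts", "0/0")).partition("/")
    fields["retrieved"] = int(retrieved)
    fields["attempted"] = int(attempted)
    return fields


header = parse_header(
    "# Channel Name | export:2026-05-20T12:00:00+00:00 | transcripts:2/105 | "
    "scraped:119 filtered_out:14 | sort:oldest | filter:..."
)
print(header["channel"], header["retrieved"], header["attempted"])
```

In the example header, scraped minus filtered_out equals the attempted count (119 - 14 = 105), which matches the field meanings in the table.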

Failed videos appear under # FAILED with short one-line reasons. Exports with 10+ failures include a grouped summary.

Troubleshooting

Transcript IP blocks (`IpBlocked`, `transcripts:0/N`)

YouTube is blocking caption downloads from your IP.

Add WEBSHARE_PROXY_* or HTTP_PROXY to .env.local and restart the server
Set fetch workers to 1 and the delay to 1.0 s or higher (Advanced settings in the web UI, or --max-workers 1 --sleep 2.0 on the CLI)
Run a small --max-videos 3 test before a full export

See the upstream Working around IP bans guide.

HTTP 429 / rate limits

Reduce concurrency and increase delay. Check the error report for proxy_configured: true to confirm the worker loaded proxy env vars.
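As a generic illustration of retrying after a rate limit, here is a sketch with a bounded retry count and a growing wait. It is not the project's retry logic: `RateLimited` and `fetch_with_retries` are hypothetical, and exponential backoff is an assumption rather than documented behaviour. The default of 2 retries mirrors the documented `WEBSHARE_PROXY_RETRIES` default.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 failure."""


def fetch_with_retries(
    fetch: Callable[[], T],
    retries: int = 2,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch, retrying up to `retries` times with doubling waits."""
    attempt = 0
    while True:
        try:
            return fetch()
        except RateLimited:
            if attempt >= retries:
                raise  # out of retries: surface the failure
            sleep(base_delay * (2 ** attempt))
            attempt += 1


# Demonstration with a fake fetcher that fails twice, then succeeds.
calls = {"count": 0}
waits = []


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] <= 2:
        raise RateLimited()
    return "transcript text"


result = fetch_with_retries(flaky, retries=2, base_delay=1.0, sleep=waits.append)
print(result, waits)
```

Injecting `sleep` keeps the demonstration instant; in real use the default `time.sleep` applies the waits.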

Scrape works but all metadata fails (`enrich_failures` = video count)

Confirm YOUTUBE_API_KEY is set and YouTube Data API v3 is enabled with quota
Try browser cookies in Advanced (cookies_from_browser or cookie file)
Update yt-dlp: pip install -U yt-dlp

Bot check during scrape

Use browser cookies — fully quit the browser before auto-read, or export a Netscape cookie file.

High memory usage

Exports run in a child process; the web server stays lightweight. See docs/channel-memory.md for profiling.

Testing

# All tests
pytest youtube_transcript_api

# Channel feature tests only
pytest youtube_transcript_api/test/test_channel_export.py \
  youtube_transcript_api/test/test_channel_web.py \
  youtube_transcript_api/test/test_channel_pipeline.py

Project layout

youtube_transcript_api/
  channel/
    cli.py              # youtube_channel_export entry point
    web/app.py          # FastAPI web UI (youtube_channel_web)
    pipeline.py         # Scrape → filter → fetch → export
    scraper.py          # yt-dlp + YouTube Data API
    fetcher.py          # Transcript download with proxy support
    export.py           # Plain-text formatting
    proxy.py            # Env/CLI proxy configuration
    worker.py           # Isolated subprocess for web exports
docs/
  channel-memory.md     # Memory profiling and advanced troubleshooting

Security notes

The web UI listens on 127.0.0.1 only — intended for local use
API keys and proxy passwords stay in .env.local on the server, never in the browser
Cookie files grant session access — treat them like passwords
Do not expose the web server to the public internet without adding authentication

License

MIT — see LICENSE.

This project extends youtube-transcript-api by Jonas Depoix (MIT).
