Transcribe and summarize videos from YouTube, Instagram, TikTok, Twitter, Reddit, Facebook, Google Drive, Dropbox, and local files.
Works with any OpenAI-compatible LLM provider, including locally hosted endpoints.
| Interface | Command |
|---|---|
| CLI | `python -m summarizer --source <source>` |
| Streamlit GUI | `python -m streamlit run app.py` |
| Docker | `docker compose up -d` -> http://localhost:8501 |
| Agent skill | `.agent/skills/summarize/SKILL.md` for agent access to the CLI |
```
            +--------------------+
            |   Video URL/Path   |
            +---------+----------+
                      |
                      v
            +---------+----------+
            |    Source Type?    |
            +---------+----------+
                      |
   +------------------+-------------------+
   |                  |                   |
YouTube            X.com/IG           Local File
   |                TikTok           Google Drive
   |                 etc.              Dropbox
   |                  |                   |
   v             +----+-----+             |
+--+-----------+ |  Cobalt  |             |
|Captions      | +----+-----+             |
|Exist?        |      |                   |
+--+-------+---+      |                   |
  Yes      No         |                   |
   |       +----------+---------+---------+
   |                            |
   |                            v
   |                   +--------+--------+
   |                   |     Whisper     |
   |                   |    endpoint?    |
   |                   +--------+--------+
   |                            |
   |                +-----------+-----------+
   |                |                       |
   |          Cloud Whisper           Local Whisper
   |                |                       |
   |                +-----------+-----------+
   |                            |
   +-----------------------+----+
                           |
                       Transcript
                           |
                           v
                  +--------+--------+
summarizer.yaml ->|  Prompt + LLM   |
prompts.json   -> |      Merge      |
.env           -> +--------+--------+
                           |
                           v
                  +--------+--------+
                  |     Output      |
                  +-----------------+
```
- `summarizer.yaml`: Provider settings (`base_url`, `model`, `chunk-size`) and defaults
- `.env`: API keys matched by URL keyword
- `prompts.json`: Summary style templates
Notes:
- Cloud Whisper uses Groq Cloud API and requires a Groq API key
- The Docker image does not include Local Whisper and is aimed at lightweight VPS deployment
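The routing in the diagram above can be compressed into a toy decision function. This is an illustration only: the value names mirror the CLI's `--type` and `--transcription` options, but the real pipeline has more branches (Cobalt fallbacks, caption languages, errors).

```python
def choose_transcript_path(source_type: str, has_captions: bool, transcription: str) -> str:
    """Toy sketch of the flow chart: captions first, otherwise Whisper.

    Hypothetical helper, not the project's actual code.
    """
    if source_type == "YouTube Video" and has_captions:
        return "captions"
    # Non-YouTube platforms are downloaded (via Cobalt) and transcribed.
    if transcription == "Local Whisper":
        return "local whisper"
    return "cloud whisper"
```

For example, a YouTube video with captions never touches Whisper, while an Instagram reel always goes through download and transcription.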
Step 0 - CLI installation:

```shell
git clone https://github.com/martinopiaggi/summarize.git
cd summarize
pip install -e .
```

Step 1 - Run the CLI:

```shell
python -m summarizer --source "https://youtube.com/watch?v=VIDEO_ID"
```

The summary is saved to `summaries/watch_YYYYMMDD_HHMMSS.md`.

To run the Streamlit GUI instead:

```shell
python -m streamlit run app.py
```

Then visit port 8501.
```shell
git clone https://github.com/martinopiaggi/summarize.git
cd summarize
# Create .env with your API keys, then:
docker compose up -d
```

Open http://localhost:8501 for the GUI. Summaries are saved to `./summaries/`.
The CLI and GUI both read the same `summarizer.yaml`.

CLI via Docker: `docker compose run --rm summarizer python -m summarizer --source "URL"`

Cobalt standalone: `docker compose -f docker-compose.cobalt.yml up -d`
Providers (summarizer.yaml)
Define your LLM providers and defaults. CLI flags override everything.
```yaml
default_provider: gemini

providers:
  gemini:
    base_url: https://generativelanguage.googleapis.com/v1beta/openai
    model: gemini-2.5-flash-lite
    chunk-size: 128000
  groq:
    base_url: https://api.groq.com/openai/v1
    model: openai/gpt-oss-20b
  ollama:
    base_url: http://localhost:11434/v1
    model: qwen3:8b
  openrouter:
    base_url: https://openrouter.ai/api/v1
    model: google/gemini-2.0-flash-001

defaults:
  prompt-type: Questions and answers
  chunk-size: 10000
  parallel-calls: 30
  max-tokens: 4096
  audio-speed: 1.0
  use-proxy: false
  output-dir: summaries
```

API keys (.env)

```
# Required for Cloud Whisper transcription
groq = gsk_YOUR_KEY

# LLM providers (choose one or more)
openai = sk-proj-YOUR_KEY
generativelanguage = YOUR_GOOGLE_KEY
deepseek = YOUR_DEEPSEEK_KEY
openrouter = YOUR_OPENROUTER_KEY
perplexity = YOUR_PERPLEXITY_KEY
hyperbolic = YOUR_HYPERBOLIC_KEY

# Optional: Webshare credentials
# Used only when `defaults.use-proxy: true` or `--use-proxy` is enabled
WEBSHARE_PROXY_USERNAME = YOUR_WEBSHARE_USERNAME
WEBSHARE_PROXY_PASSWORD = YOUR_WEBSHARE_PASSWORD
```

If you pass an endpoint URL with `--base-url`, the API key is matched from `.env` by URL keyword. For example, `https://generativelanguage.googleapis.com/...` matches `generativelanguage`.
Prompts (prompts.json)
Select a style with `--prompt-type` in the CLI or from the dropdown in the web interface.
Add custom styles by editing `prompts.json`. Use `{text}` as the transcript placeholder.
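A custom entry might look like this. The style name and prompt wording below are made up for illustration; only the `{text}` placeholder convention comes from the docs.

```json
{
  "Bullet Points": "Summarize the following transcript as concise bullet points:\n\n{text}"
}
```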
With a configured `summarizer.yaml`, the CLI is simple:

```shell
# Uses the default provider from YAML
python -m summarizer --source "https://youtube.com/watch?v=VIDEO_ID"

# Specify a provider
python -m summarizer --source "https://youtube.com/watch?v=VIDEO_ID" --provider groq

# Fact-check claims with Perplexity (use the Summarize skill for AI agents)
python -m summarizer \
  --source "https://youtube.com/watch?v=VIDEO_ID" \
  --base-url "https://api.perplexity.ai" \
  --model "sonar-pro" \
  --prompt-type "Fact Checker"

# Extract key insights
python -m summarizer \
  --source "https://youtube.com/watch?v=VIDEO_ID" \
  --provider gemini \
  --prompt-type "Distill Wisdom"

# Generate a Mermaid diagram
python -m summarizer \
  --source "https://youtube.com/watch?v=VIDEO_ID" \
  --provider openrouter \
  --prompt-type "Mermaid Diagram"

# Multiple videos
python -m summarizer --source "URL1" "URL2" "URL3"

# Local files
python -m summarizer --type "Local File" --source "./lecture.mp4"

# Speed up audio before Whisper (faster, may reduce accuracy)
python -m summarizer --source "URL" --force-download --audio-speed 2.0

# Aggressive speed-up (supported)
python -m summarizer --source "URL" --force-download --audio-speed 5.0

# Force YouTube audio download and show detailed progress
python -m summarizer \
  --source "https://youtube.com/watch?v=VIDEO_ID" \
  --force-download \
  -v

# Non-YouTube URL (requires Cobalt)
python -m summarizer --type "Video URL" --source "https://www.instagram.com/reel/..."

# Let captions/transcription choose the language automatically (default)
python -m summarizer --source "URL" --language "auto"

# Lock YouTube captions or transcription to a specific language
python -m summarizer --source "URL" --prompt-type "Distill Wisdom" --language "it"
```

Without YAML, pass `--base-url` and `--model` explicitly:
```shell
python -m summarizer \
  --source "https://youtube.com/watch?v=VIDEO_ID" \
  --base-url "https://generativelanguage.googleapis.com/v1beta/openai" \
  --model "gemini-2.5-flash-lite"
```

| Flag | Description | Default |
|---|---|---|
| `--source` | Video URLs or file paths (multiple allowed) | Required |
| `--provider` | Provider name from YAML | `default_provider` |
| `--base-url` | API endpoint (overrides provider) | From YAML |
| `--model` | Model identifier (overrides provider) | From YAML |
| `--api-key` | API key (overrides `.env`) | - |
| `--type` | `YouTube Video`, `Video URL`, `Local File`, `Google Drive Video Link`, `Dropbox Video Link`, `TXT` | `YouTube Video` |
| `--prompt-type` | Summary style | `Questions and answers` |
| `--chunk-size` | Input text chunk size in characters | 10000 |
| `--force-download` | Skip captions and download audio instead | False |
| `--transcription` | `Cloud Whisper` (Groq API) or `Local Whisper` (local) | `Cloud Whisper` |
| `--whisper-model` | `tiny`, `base`, `small`, `medium`, `large` | `tiny` |
| `--audio-speed` | Pre-transcription playback speed | 1.0 |
| `--language` | `auto` picks the first available YouTube caption track and lets Whisper detect the language; explicit codes stay strict | `auto` |
| `--parallel-calls` | Concurrent API requests | 30 |
| `--max-tokens` | Max output tokens per chunk | 4096 |
| `--cobalt-url` | Cobalt base URL for non-YouTube platforms and fallback downloads | http://localhost:9000 |
| `--output-dir` | Output directory | `summaries` |
| `--no-save` | Print only, no file output | False |
| `--verbose`, `-v` | Detailed output | False |
Use `--verbose` to see detailed status output during config loading, downloads, transcription, and summarization.
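The behaviour behind `--chunk-size` and `--parallel-calls` can be sketched as character-based chunking with a concurrency cap. This is a hypothetical illustration, not the project's implementation; `chunk_text`, `summarize_all`, and `call_llm` are invented names.

```python
import asyncio

def chunk_text(text: str, chunk_size: int = 10000) -> list:
    """Split a transcript into fixed-size character chunks."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

async def summarize_all(chunks, call_llm, parallel_calls: int = 30):
    """Summarize all chunks, holding at most `parallel_calls` requests in flight."""
    sem = asyncio.Semaphore(parallel_calls)

    async def one(chunk):
        async with sem:
            return await call_llm(chunk)

    # gather preserves chunk order, so partial summaries merge in sequence
    return await asyncio.gather(*(one(c) for c in chunks))
```

Per-chunk summaries come back in transcript order, which matches the final prompt-and-merge step in the pipeline diagram.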
Local Whisper runs transcription on your machine instead of using Groq Cloud Whisper. This removes the Groq API requirement, but CPU-only runs are much slower.
```shell
# Add Local Whisper support
pip install -e .[whisper]

# Optional: install CUDA-enabled PyTorch for GPU acceleration
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

# Use it
python -m summarizer --source "URL" --force-download --transcription "Local Whisper" --whisper-model "small"
```

If you only need CPU transcription, `pip install -e .[whisper]` is enough.
Why not in Docker? The Docker image installs the core app only. It does not include `openai-whisper` or GPU-oriented PyTorch because this project targets lightweight VPS deployments, where GPUs are usually unavailable. In Docker, Cloud Whisper is the practical default. Use Local Whisper on the host machine if you have the hardware for it.

Model sizes: `tiny` (fastest) / `base` / `small` / `medium` / `large` (most accurate). GPU detection is automatic when PyTorch can see a CUDA device.
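A note on the `--audio-speed` examples above: speed factors beyond 2.0 are commonly realised with ffmpeg's `atempo` filter, which in classic builds caps a single stage at 2.0 and therefore chains stages. How this project implements the speed-up internally is not documented here; the helper below is a hypothetical sketch of such chaining.

```python
def atempo_chain(speed: float) -> str:
    """Build an ffmpeg -filter:a expression for a given speed-up factor.

    Hypothetical sketch: classic ffmpeg caps one atempo stage at 2.0, so
    larger factors are expressed as a chain of stages.
    """
    stages = []
    while speed > 2.0:
        stages.append("atempo=2.0")
        speed /= 2.0
    stages.append(f"atempo={speed:g}")
    return ",".join(stages)

atempo_chain(5.0)  # returns "atempo=2.0,atempo=2.0,atempo=1.25"
```

The resulting string would be passed as `ffmpeg -i in.mp3 -filter:a "<chain>" out.mp3`.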
Proxy support matters in two separate places:
- The Python app, when fetching YouTube transcripts or downloading YouTube audio with `pytubefix`
- The Cobalt container, when it connects to upstream CDNs/providers

For the Python app, this repo expects Webshare credentials:

- Add credentials to `.env`:

```
WEBSHARE_PROXY_USERNAME = YOUR_WEBSHARE_USERNAME
WEBSHARE_PROXY_PASSWORD = YOUR_WEBSHARE_PASSWORD
```

- If you want `pytubefix` audio downloads to use that proxy, enable it in `summarizer.yaml`:

```yaml
defaults:
  use-proxy: true
```

Notes:

- YouTube transcript fetching uses Webshare automatically when those credentials are present.
- `defaults.use-proxy: true` affects `pytubefix` audio downloads.
For the Cobalt container, the proxy is configured separately. That sits outside the Python app, but this repo includes a working example:

- `docker-compose.yml` is the default full-stack setup
- `docker-compose.cobalt.yml` runs only Cobalt
- `docker-compose.proxy.yml` adds `./cobalt.proxy.env` to the `cobalt` service
- `cobalt.proxy.env.example` is the template
- `cobalt.proxy.env` is your local, ignored secrets file
Docker examples:
```shell
# Cobalt only, no proxy
docker compose -f docker-compose.cobalt.yml up -d

# Cobalt only, with proxy
docker compose -f docker-compose.cobalt.yml -f docker-compose.proxy.yml up -d

# Full stack, no proxy
docker compose up -d

# Full stack, with proxy
docker compose -f docker-compose.yml -f docker-compose.proxy.yml up -d
```

This project is licensed under the MIT License. See the LICENSE file for details.