Automatically transcribe and split audio & video libraries into their individual stories.
Point StoryEngine at a folder of audio or video files — podcasts, YouTube downloads, lectures, interviews, recorded meetings, films — and it will:
- Transcribe every file with Whisper (local, private, no API costs)
- Detect stories — ask a local LLM to identify where topics change, producing a titled, summarised story for each segment
- Present results in a web UI so you can browse your entire library by story rather than by file
- Split clips — cut each story into its own file, losslessly, using ffmpeg
- Find duplicates — identify when the same topic appears across many different files
| Type | Formats |
|---|---|
| Audio | .mp3 .m4a .opus .flac .wav |
| Video | .mp4 .mkv .webm .avi .mov |
Mix audio and video freely in the same library folder. Both are processed identically.
| Feature | Description |
|---|---|
| Media browser | Browse all processed files, filter by status, search by title |
| Story detection | LLM identifies topic boundaries, generates titles and summaries |
| In-browser player | Stream the original file with a click-to-seek story timeline |
| Transcript search | Full-text search across every transcript in your library |
| Clip splitting | Lossless ffmpeg -c copy cuts — no re-encoding, instant |
| Story editing | Correct LLM-generated titles, summaries, and timestamps |
| Thumbnails | JPEG frame extracted at each story's midpoint |
| SRT subtitles | Download subtitles for each clip |
| NFO metadata | Jellyfin/Kodi compatible <episodedetails> export |
| Sponsor detection | SponsorBlock (YouTube) + LLM-based detection for any file |
| Deduplication | Semantic embeddings + HNSW index to find repeated stories |
| Channel reports | Per-folder breakdowns with scoped dedup analysis |
| Playlist export | M3U8 (VLC, mpv) and JSON playlists |
| Bulk ZIP download | Download multiple clips as a ZIP archive |
| YouTube upload | OAuth2 clip upload with automatic playlist management |
| Webhooks | HTTP notifications with HMAC-SHA256 signing |
| Batch reprocess | Multi-select to re-run the pipeline on many files at once |
| File removal | Remove a file and all its stories/jobs from the database |
| Settings reset | Reset any individual setting back to its environment/default value |
- Docker and Docker Compose (Docker Desktop, OrbStack, or any compatible runtime)
- Ollama running on your local network with at least one LLM model pulled
  - Story detection: `llama3.1:8b` or any capable model (`ollama pull llama3.1:8b`)
  - Embeddings (for dedup): `nomic-embed-text` (`ollama pull nomic-embed-text`)
- A folder of audio and/or video files
No GPU required. CPU works fine. A GPU makes Whisper transcription significantly faster but is entirely optional.
```bash
git clone https://github.com/wittedinit/StoryEngine.git
cd StoryEngine
```

Edit `docker-compose.yml` and add a volume mount to both the `backend` and `worker` services:
```yaml
services:
  backend:
    volumes:
      - /your/host/media:/media:ro   # :ro = read-only, StoryEngine never writes here
  worker:
    volumes:
      - /your/host/media:/media:ro
```

Using UYTDownloader? Mount its downloads volume directly:

```yaml
volumes:
  - uytdownloader_downloads:/media:ro
```
```bash
docker compose up -d
```

First boot downloads images and may take a minute. Six containers will start: PostgreSQL, Redis, Ollama, the backend API, the Celery worker, and the Next.js frontend.
Visit http://localhost:3100 — a setup wizard will ask for two settings:
- Ollama URL — e.g. `http://192.168.1.60:11434` (or `http://ollama:11434` if using the bundled container). Click Test Connection to verify and pick a model.
- Media Library Path — the path inside the container where your files are mounted, e.g. `/media`
Once both are saved, StoryEngine scans your library automatically every 5 minutes. Trigger an immediate scan from the Dashboard.
```bash
docker compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
```

This passes all NVIDIA GPUs to both the worker (Whisper/CUDA) and Ollama containers.
Docker Desktop cannot expose Apple Metal to containers. Run the GPU worker natively:
```bash
cd backend
pip install ".[worker]"
SE_WHISPER_DEVICE=metal celery -A app.celery_app:celery worker -Q gpu --concurrency=1
```

In Settings → Transcription, set Compute Device:
- `auto` — detects CUDA → Metal → CPU automatically (default)
- `cuda` — force NVIDIA GPU
- `metal` — force Apple GPU
- `cpu` — force CPU
Changing this setting requires a worker restart.
| Service | Default | Change via |
|---|---|---|
| Web UI | http://localhost:3100 | `SE_FRONTEND_PORT` in `.env` |
| API | http://localhost:8100 | `SE_PORT` in `.env` |
| API docs (OpenAPI) | http://localhost:8100/docs | — |
These are intentionally offset from UYTDownloader's 3000/8000 so both can run side by side.
To change ports: copy .env.default to .env, set the values, then restart.
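For example, a minimal `.env` (values illustrative) that moves both ports clear of other services and sets a local timezone:

```ini
# .env — copied from .env.default
SE_FRONTEND_PORT=3200
SE_PORT=8200
TZ=Europe/London
```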
StoryEngine infers a channel name from the immediate parent folder of each file. This is how podcast feeds, YouTube channels, or series are grouped automatically:
| File path | Channel |
|---|---|
| `/media/Lex Fridman/ep123.mp4` | Lex Fridman |
| `/media/My Podcast/S01E04.mp3` | My Podcast |
| `/media/lecture.mp4` | (none — root level) |
Channel statistics appear in the Reports section.
These are set in .env (copy from .env.default) or passed directly to Docker Compose. They override the built-in defaults but can themselves be overridden by values saved in Settings.
| Variable | Default | Description |
|---|---|---|
| `SE_PORT` | `8100` | Host port for the API |
| `SE_FRONTEND_PORT` | `3100` | Host port for the web UI |
| `SE_WORKER_CONCURRENCY` | `2` | Number of parallel workers for the scan/pipeline/llm queues |
| `TZ` | `UTC` | Timezone for all services (e.g. `Europe/London`, `America/New_York`) |
| `SE_WORK_DIR` | `/work` | Container path for temporary processing files |
| `SE_DATA_DIR` | `/data` | Container path for persistent data |
All settings are live-editable in the Settings page (no restart required unless noted).
Priority order: value saved in the Settings UI > environment variable > built-in default. To reset a setting back to its environment/default value, use the reset button in the Settings UI or `DELETE /api/v1/settings/{key}`.
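That priority order amounts to a three-step lookup chain; the function below is a hedged sketch of the rule, not StoryEngine's implementation:

```python
import os

def resolve_setting(key: str, saved: dict, defaults: dict):
    """Resolve a setting using the priority order:
    Settings-UI value > environment variable > built-in default."""
    if key in saved:                  # 1. value saved via the Settings UI
        return saved[key]
    env_val = os.environ.get(key)     # 2. variable exported via .env
    if env_val is not None:
        return env_val
    return defaults.get(key)          # 3. built-in default

os.environ["SE_PORT"] = "8200"
assert resolve_setting("SE_PORT", saved={"SE_PORT": "9000"}, defaults={"SE_PORT": "8100"}) == "9000"
assert resolve_setting("SE_PORT", saved={}, defaults={"SE_PORT": "8100"}) == "8200"
```

Resetting a setting (the `DELETE` endpoint) is equivalent to removing its entry from `saved`, so resolution falls through to the environment or default.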
| Setting | Default | Description |
|---|---|---|
| Ollama Endpoint | (empty) | URL of your Ollama instance |
| LLM Model | `llama3.1:8b` | Model for story detection — click Test Connection to pick from your available models |
| Embed Model | `nomic-embed-text` | Model for semantic embeddings (dedup). Pull it first: `ollama pull nomic-embed-text` |
| Setting | Default | Description |
|---|---|---|
| Whisper Model | `base` | `tiny` / `base` / `small` / `medium` / `large-v3` / `distil-large-v3`. Larger = more accurate but slower |
| Compute Device | `auto` | `auto` (CUDA → Metal → CPU), `cuda`, `metal`, `cpu` |
| Compute Precision | `auto` | `float16` (GPU), `int8` (CPU), `float32` (safe fallback) |
Changing Whisper settings requires a worker restart to reload the model.
| Setting | Default | Description |
|---|---|---|
| Scan Interval | `300` | Seconds between automatic library scans |
| Auto-Split Clips | `false` | Automatically split every story into a clip after detection |
| Auto-Embed Stories | `false` | Automatically generate embeddings after detection (enables automatic dedup) |
| Sponsor Detection | `disabled` | `sponsorblock` (YouTube files only), `llm` (any file), `both`, or `disabled` |
| Sponsor Action | `mark` | `mark` (tag only), `skip` (exclude from splits), `split_out` (save as separate files) |
| Dedup Threshold | `0.85` | Cosine similarity threshold for duplicate detection (0–1) |
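The Dedup Threshold compares story embeddings by cosine similarity. A minimal sketch of the metric itself (illustrative only — in practice StoryEngine queries a USearch HNSW index rather than comparing vectors pairwise):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors, in [-1, 1].
    Two stories scoring above the Dedup Threshold (default 0.85)
    would be flagged as duplicates."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

assert cosine_similarity([1.0, 0.0], [1.0, 0.0]) == 1.0   # identical direction
assert cosine_similarity([1.0, 0.0], [0.0, 1.0]) == 0.0   # orthogonal
```

Raising the threshold toward 1.0 yields fewer, stricter matches; lowering it groups more loosely related stories together.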
| Setting | Default | Description |
|---|---|---|
| Media Library Path | `/data/downloads` | Where StoryEngine reads your audio/video files from (read-only) |
| Output Directory | `/segments` | Where split clip files are saved (must be writable) |
| Setting | Default | Description |
|---|---|---|
| Google OAuth Client ID | (empty) | From Google Cloud Console |
| Google OAuth Client Secret | (empty) | From Google Cloud Console |
| Default Privacy | `private` | `public`, `unlisted`, or `private` |
| Playlist Mode | `per_video` | `per_video` (one playlist per source file), `per_channel`, or `none` |
| Auto-Upload After Split | `false` | Automatically upload newly split clips to YouTube |
- In Google Cloud Console: create a project, enable the YouTube Data API v3, and create OAuth 2.0 credentials (type: Web Application)
- Add this exact URL as an authorised redirect URI: `http://localhost:8100/api/v1/youtube/oauth/callback`
- In Settings → YouTube: enter your Client ID and Client Secret, then save
- On the YouTube page: click Connect YouTube — you'll be redirected to Google to grant access, then sent back automatically
YouTube upload is purely additive — original files and local clips are never deleted. You control your local library and YouTube channel independently.
SponsorBlock (YouTube files only)
- Reads the YouTube video ID from the filename (e.g. `My Video [dQw4w9WgXcQ].mp4`)
- Queries the public SponsorBlock API for crowdsourced timestamps
- Categories: `sponsor`, `selfpromo`, `interaction`, `intro`, `outro`, `preview`, `filler`
LLM detection (any audio or video file)
- Sends the transcript to your LLM and asks it to identify promotional language
- Works for podcasts, radio recordings, and any file without a YouTube ID
Sponsor actions:
- `mark` — yellow bars on the timeline, tagged in lists, no files changed
- `skip` — excluded when auto-splitting (clips contain only story content)
- `split_out` — saved as separate clip files in `segments/sponsors/`
Configure HTTP notifications on the Webhooks page. Supported events:
| Event | When |
|---|---|
| `job_completed` | Pipeline completes successfully for a file |
| `job_failed` | A pipeline stage fails |
| `story_detected` | Story detection finishes (includes story count) |
| `thumbnail_generated` | Thumbnail generated for a story |
| `youtube_uploaded` | Clip uploaded to YouTube |
All webhook calls POST JSON. Set a secret per webhook to enable HMAC-SHA256 request signing (X-StoryEngine-Signature: sha256=...).
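A receiver can verify a signed webhook by recomputing the HMAC over the raw request body and comparing it to the header in constant time. A sketch assuming the `sha256=<hexdigest>` header format shown above:

```python
import hashlib
import hmac

def verify_signature(secret: str, body: bytes, header: str) -> bool:
    """Recompute HMAC-SHA256 of the raw body and compare it against
    the X-StoryEngine-Signature header using a constant-time check."""
    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, header)

body = b'{"event": "job_completed"}'
sig = "sha256=" + hmac.new(b"my-secret", body, hashlib.sha256).hexdigest()
assert verify_signature("my-secret", body, sig)
assert not verify_signature("wrong-secret", body, sig)
```

Always hash the raw bytes as received — re-serialising the parsed JSON can reorder keys and break verification.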
Backend API at http://localhost:8100/api/v1. Full OpenAPI docs at http://localhost:8100/docs.
```text
# Media files
GET    /api/v1/videos                           List all files (paginated, filterable by status/search)
GET    /api/v1/videos/{id}                      File detail + stream_url for in-browser player
GET    /api/v1/videos/{id}/transcript           Full transcript with timestamped segments
DELETE /api/v1/videos/{id}                      Remove a file and its stories/jobs from the database

# Stories
GET    /api/v1/stories                          List all stories (paginated, searchable)
GET    /api/v1/stories/{id}                     Story detail with transcript excerpt
PATCH  /api/v1/stories/{id}                     Edit title, summary, or timestamps

# Pipeline
POST   /api/v1/pipeline/scan                    Trigger an immediate library scan
POST   /api/v1/pipeline/reprocess/{id}          Re-run full pipeline on one file
POST   /api/v1/pipeline/reprocess-batch         Re-run pipeline on multiple file IDs

# Jobs
GET    /api/v1/jobs                             List recent processing jobs (paginated)

# Clips & exports
POST   /api/v1/export/stories/{id}/split        Split one story into a clip
POST   /api/v1/export/videos/{id}/split         Split all stories in a file
GET    /api/v1/export/stories/{id}/clip         Download the clip file
POST   /api/v1/export/stories/{id}/thumbnail    Generate a thumbnail (video files only)
GET    /api/v1/export/stories/{id}/thumbnail    Download the thumbnail JPEG
GET    /api/v1/export/stories/{id}/srt          Download SRT subtitle file
GET    /api/v1/export/stories/{id}/nfo          Download NFO metadata file
GET    /api/v1/export/videos/{id}/playlist      Export M3U8 or JSON playlist for a file
GET    /api/v1/export/stories/playlist?ids=…    Export playlist for selected story IDs
POST   /api/v1/export/zip                       Queue a bulk ZIP of clips
GET    /api/v1/export/zip/{task_id}/status      Poll ZIP build progress
GET    /api/v1/export/zip/{task_id}/download    Download completed ZIP

# Search
GET    /api/v1/search/transcripts?q=…           Full-text search across all transcripts

# Dedup
POST   /api/v1/dedup/embed                      Embed all un-embedded stories
GET    /api/v1/dedup/clusters                   Find duplicate story clusters
GET    /api/v1/dedup/similar/{id}               Stories similar to a given story

# Reports
GET    /api/v1/reports/channels                 Channel list with aggregate stats
GET    /api/v1/reports/channels/{name}/dedup    Per-channel dedup report
GET    /api/v1/reports/channels/{name}/videos   Files in a channel

# Webhooks
GET    /api/v1/webhooks                         List webhooks
POST   /api/v1/webhooks                         Create webhook
PUT    /api/v1/webhooks/{id}                    Update webhook
DELETE /api/v1/webhooks/{id}                    Delete webhook
POST   /api/v1/webhooks/{id}/test               Send a test call

# YouTube
GET    /api/v1/youtube/status                   OAuth connection status
GET    /api/v1/youtube/oauth/authorize          Get Google OAuth redirect URL
GET    /api/v1/youtube/oauth/callback           OAuth2 redirect handler (set this as your redirect URI)
DELETE /api/v1/youtube/oauth/revoke             Disconnect YouTube and clear stored tokens
POST   /api/v1/youtube/upload/{story_id}        Queue a clip for upload
POST   /api/v1/youtube/upload-all               Queue all un-uploaded clips
GET    /api/v1/youtube/upload/{task_id}/status  Check upload progress

# Settings & health
GET    /api/v1/settings/setup                   Setup wizard status (validates Ollama + media path)
GET    /api/v1/settings                         All settings with current values
PUT    /api/v1/settings/{key}                   Update a setting
DELETE /api/v1/settings/{key}                   Reset a setting to its environment/default value
GET    /api/v1/settings/ollama/models           List models available on the connected Ollama instance
GET    /api/v1/stats                            Dashboard statistics (file counts, story counts, queue depth)
GET    /health                                  Health check (DB, Redis, Ollama, ffmpeg)
```
```text
StoryEngine/
├── backend/                 # Python FastAPI + Celery
│   ├── app/
│   │   ├── api/             # REST endpoints
│   │   │   ├── videos.py    # Media file list/detail + stream_url
│   │   │   ├── stories.py   # Story list/detail/patch
│   │   │   ├── export.py    # Clips, thumbnails, SRT, NFO, playlists, ZIP
│   │   │   ├── search.py    # Full-text transcript search
│   │   │   ├── pipeline.py  # Scan, reprocess, reprocess-batch
│   │   │   ├── reports.py   # Channel reports + dedup
│   │   │   ├── webhooks.py  # Webhook CRUD + test
│   │   │   ├── youtube.py   # OAuth + upload
│   │   │   ├── dedup.py     # Embedding + clusters
│   │   │   └── jobs.py      # Processing queue
│   │   ├── models/          # SQLAlchemy ORM models
│   │   ├── schemas/         # Pydantic response schemas
│   │   └── services/        # scanner, transcriber, story_detector, splitter,
│   │                        # embedder, dedup, sponsorblock, playlist,
│   │                        # thumbnail, srt, nfo, webhook_fire, youtube_upload
│   └── alembic/versions/    # Database migrations
├── frontend/                # Next.js App Router + Tailwind
│   └── src/app/
│       ├── page.tsx         # Dashboard + setup wizard
│       ├── videos/          # Media list + detail (player, timeline, stories)
│       ├── stories/         # Story list + detail (edit, thumbnail, upload)
│       ├── search/          # Full-text transcript search
│       ├── dedup/           # Duplicate cluster browser
│       ├── reports/         # Channel reports + CSV export
│       ├── youtube/         # YouTube OAuth + bulk upload
│       ├── webhooks/        # Webhook CRUD UI
│       ├── jobs/            # Processing queue viewer
│       ├── manual/          # Full in-app user manual
│       └── settings/        # Runtime settings editor
└── docker/                  # Dockerfiles + entrypoints
```
| Component | Technology |
|---|---|
| Transcription | faster-whisper (CTranslate2) + Silero VAD |
| Story detection | Ollama REST API — local LLM, JSON mode, no cloud |
| Embeddings | Ollama (nomic-embed-text or any embed model) |
| Dedup index | USearch HNSW |
| Media splitting | ffmpeg -c copy — lossless, keyframe-snapped |
| Thumbnails | ffmpeg -frames:v 1 -q:v 2 at story midpoint |
| Sponsor detection | SponsorBlock API + optional LLM fallback |
| YouTube | Google YouTube Data API v3 + OAuth2 |
| Webhooks | httpx + HMAC-SHA256 signing |
| Full-text search | PostgreSQL tsvector / tsquery + GIN index |
| Backend | FastAPI + Pydantic + SQLAlchemy async |
| Task queue | Celery + Redis |
| Database | PostgreSQL 16 |
| Frontend | Next.js 15 App Router + Tailwind CSS |
| Deploy | Docker Compose |
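The lossless split in the table reduces to a single `ffmpeg -c copy` invocation. A hedged sketch of how such a command might be assembled (the function and defaults are illustrative, not StoryEngine's actual code; `-t` duration is used to avoid `-to`'s seek-relative timestamp subtleties):

```python
import subprocess

def build_split_cmd(src: str, dst: str, start: float, end: float) -> list[str]:
    """Assemble a lossless ffmpeg cut: -c copy stream-copies the input,
    so the cut snaps to the nearest keyframe and needs no re-encoding."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",        # seek to story start (input option: fast seek)
        "-i", src,
        "-t", f"{end - start:.3f}",   # duration of the story
        "-c", "copy",                 # stream copy: no re-encode, no quality loss
        dst,
    ]

cmd = build_split_cmd("/media/show/ep1.mp4", "/segments/ep1-story2.mp4", 61.5, 245.0)
# subprocess.run(cmd, check=True)   # requires ffmpeg on PATH
```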
| Service | Role |
|---|---|
| `backend` | FastAPI API server |
| `worker` | Celery worker (all queues) |
| `beat` | Celery Beat periodic task scheduler — triggers library scans on interval |
| `frontend` | Next.js web UI |
| `postgres` | PostgreSQL 16 database |
| `redis` | Redis message broker + result backend |
| Queue | Concurrency | What runs here |
|---|---|---|
| `scan`, `pipeline` | `SE_WORKER_CONCURRENCY` (default: 2) | File scanning, audio extraction, clip splitting, thumbnails, ZIP, YouTube uploads |
| `gpu` | 1 (always serialised) | Whisper transcription — serialised to prevent VRAM contention |
| `llm` | 2 | Ollama story detection, sponsor detection, embeddings |
```bash
# Backend (requires postgres + redis running)
cd backend
pip install -e ".[server,worker]"
uvicorn app.main:app --reload --port 8100

# Frontend
cd frontend
npm install
npm run dev   # http://localhost:3000

# Celery worker
celery -A app.celery_app:celery worker -Q scan,pipeline,gpu,llm --loglevel=info

# Run migrations
alembic upgrade head
```

- v1.0 — Transcription + story detection + web UI
- v1.1 — Lossless clip splitting + semantic dedup + sponsor detection
- v1.2 — Playlist export (M3U8/JSON), bulk ZIP download, channel reports
- v1.3 — In-browser player, transcript search, story editing, thumbnails, SRT/NFO export, webhooks, YouTube integration, batch reprocess