Transform PDF presentations into cinematic narrated videos with AI-generated scripts, browser-based TTS, and local rendering.
- Overview
- Why Origami?
- Key Features
- Getting Started
- Requirements
- How It Works
- Configuration
- Project Backup and Restore
- Troubleshooting
- Tech Stack
- Notes
- Support
- Credits
Origami AI is a web application that converts static PDF presentations into polished video content with AI-generated narration, background music, and transitions. Processing happens locally in your browser using WebGPU-accelerated models and FFmpeg.wasm.
Traditional video creation from presentations is often a choice between tedious manual labor and expensive AI subscriptions. Origami AI offers a third way: a fully automated, local-first studio that lives in your browser.
- 🎬 Static to Cinematic: Don't just show slides; tell a story. Origami automatically extracts context from your PDFs and crafts a narrative script that flows naturally.
- 🔒 Privacy First (Local-Only): Your data stays on your machine. By leveraging WebGPU and WebLLM, your scripts and audio are generated locally without ever sending sensitive presentation data to a third-party server.
- 🎙️ The "No-Mic" Solution: Perfect for creators who prefer not to use their own voice. With integrated Kokoro.js TTS, you get high-quality, human-like narration without needing a recording studio.
- ⚙️ Zero Infrastructure: No complex Python environments or CUDA drivers to wrestle with. If you have a modern browser, you have a professional-grade video editor.
- 💸 Cost Effective: Avoid "per-minute" AI generation fees. Use your own hardware to run inference and rendering for free.
| Feature | Traditional Editors | Cloud AI Video Tools | Origami AI |
|---|---|---|---|
| Effort | High (Manual) | Low | Minimal (Automated) |
| Privacy | Local | Cloud-Based (Risk) | Local-First |
| Cost | One-time / Free | Monthly Subscription | Free & Open Source |
| Voice | Your own / Pro Talent | Credits-based TTS | Unlimited Local TTS |
- Drag-and-drop PDF upload
- Automatic text extraction from each slide with PDF.js
- High-resolution image conversion (2x scale)
- Local AI processing with MLC-WebLLM
- Remote API support with OpenAI-compatible providers
- Customizable prompts for script behavior
- Multiple voices (af_heart, af_bella, am_adam, and more)
- Browser TTS via Kokoro.js
- Remote TTS support
- Automatic audio duration calculation for timing
- Drag-and-drop slide ordering
- Per-slide script editing with highlighting
- Transitions: fade, slide, wipe, blur, zoom
- Background music with volume and auto-ducking
- Per-slide or full-project audio generation
- Analyze uploaded Slide Media MP4 clips with Gemini to produce timestamped scenes
- Produces structured scene plans: step number, start timestamp, on-screen action, narration text, and duration
- Adds a full-screen Scene Alignment Editor for timeline-locked scene review and editing
- Supports per-scene TTS generation and full scene-batch TTS generation
- Automatically stretches the effective timeline when narration audio exceeds scene duration
- Stores raw Gemini JSON output for debugging
- Browser rendering using FFmpeg.wasm
- 720p and 1080p export
- Real-time progress tracking
- Render cancellation support
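The timeline-stretch rule listed above (narration that runs longer than its scene extends that scene) can be sketched as a pair of small helpers. The field names (`duration`, `narrationSeconds`) are illustrative, not Origami's actual data model:

```javascript
// Effective duration of one scene: the planned duration, stretched when the
// generated narration audio runs longer than the scene itself.
function effectiveDuration(scene) {
  return Math.max(scene.duration, scene.narrationSeconds ?? 0);
}

// The slide's effective timeline is the sum of its scenes' effective durations.
function effectiveTimeline(scenes) {
  return scenes.reduce((total, scene) => total + effectiveDuration(scene), 0);
}
```

A scene planned for 4 seconds with 6.5 seconds of narration contributes 6.5 seconds to the timeline; scenes whose narration fits are unchanged.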
Visit https://origami.techmitten.com. No install required.
- Clone the repository:

  ```bash
  git clone https://github.com/IslandApps/Origami-AI.git
  cd Origami-AI
  ```

- Install dependencies (Node.js >= 20.19.0):

  ```bash
  npm install
  ```

- Start the development server:

  ```bash
  npm run dev
  ```

  Open http://localhost:3000.
The development server is required because it sets `Cross-Origin-Opener-Policy` and `Cross-Origin-Embedder-Policy`, which FFmpeg.wasm and SharedArrayBuffer need. Opening `index.html` directly will not work.
- Build production assets:

  ```bash
  npm run build
  ```

- Preview the production build:

  ```bash
  npm run preview
  ```

Containerized deployment is supported via the included Docker files:

```bash
docker compose up --build
```

App URL: http://localhost:3000.
- `npm run dev` - Start the Express + Vite development server with HMR
- `npm run build` - Create a production build
- `npm run preview` - Preview the production build locally
- `npm run lint` - Run lint checks
- Node.js >= 20.19.0
- WebGPU-compatible browser for local AI inference
- Stable internet connection for first-time model downloads
- Docker Desktop or Docker Engine (optional, for container deployment)
| Browser | Minimum Version |
|---|---|
| Chrome / Chromium | 113+ |
| Microsoft Edge | 113+ |
| Firefox | Nightly (enable dom.webgpu.enabled) |
| Safari | 18+ (macOS Sonoma) |
If WebGPU is unavailable, you can still use remote OpenAI-compatible APIs from Settings.
Minimum
- 4-core CPU
- 8GB RAM
- Integrated GPU with WebGPU support
Recommended
- 8-core CPU
- 16GB RAM
- Dedicated GPU with WebGPU support
- SSD for faster model/model-cache operations
- Upload a PDF.
- Extract text and convert pages to slide images.
- Generate narration scripts with AI.
- Generate speech audio from scripts.
- Edit scripts, voice, timing, transitions, and music.
- Render final MP4 with FFmpeg.wasm.
- Download the video.
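The steps above can be sketched as a plain async composition. Every function name here is a placeholder standing in for a pipeline stage, not Origami's actual API:

```javascript
// Illustrative end-to-end pipeline matching the steps above.
// `stages` bundles the four processing stages as async functions.
async function pdfToVideo(pdfFile, stages) {
  const slides = await stages.extractSlides(pdfFile);    // text + slide images
  for (const slide of slides) {
    slide.script = await stages.generateScript(slide);   // AI narration text
    slide.audio = await stages.synthesize(slide.script); // TTS audio blob
  }
  return stages.render(slides);                          // FFmpeg.wasm MP4
}
```

The editing step (scripts, voices, transitions, music) happens interactively between script/audio generation and the final render, which is why each stage is exposed separately rather than run as one opaque batch.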
Settings are grouped under General, API, TTS Model, WebLLM, and AI Prompt.
- Enable Global Defaults for new uploads
- Intro Fade In and Intro Fade Length (seconds)
- Post-Audio Delay (seconds)
- Audio Normalization toggle
- Recording Countdown toggle
- Default Transition (Fade, Slide, Zoom, None)
- Default Music upload and volume
- TTS quantization selection: `q4` or `q8`
- Enable/disable local WebLLM
- Select model to load
- Precision filter (f16, f32, all)
- Configure Base URL and API Key for OpenAI-compatible providers (Gemini, OpenRouter, Ollama, etc.)
- Fetch models from provider
- Customize Script Fix System Prompt
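"OpenAI-compatible" means the standard endpoint paths hang off whatever Base URL you configure. A small illustrative helper (not Origami's code) showing how the configured Base URL combines with the two endpoints the Settings screen relies on, `GET {base}/models` (the model fetch) and `POST {base}/chat/completions` (script generation):

```javascript
// Join a user-configured base URL with a standard OpenAI-style endpoint path,
// tolerating stray slashes on either side.
function endpointUrl(baseUrl, path) {
  return baseUrl.replace(/\/+$/, "") + "/" + path.replace(/^\/+/, "");
}

// Example: Gemini's OpenAI-compatibility base URL from the docs below.
const modelsUrl = endpointUrl(
  "https://generativelanguage.googleapis.com/v1beta/openai/",
  "models"
);
```

The same joining rule applies whether the provider is Gemini, OpenRouter, or a local Ollama instance; only the base URL and API key change.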
The slide editor includes five tabs:
- Overview
  - Script edit/focus modes
  - AI Fix Script
  - Copy/Revert, preview, select/delete, reorder, list/grid
- Voice Settings
  - Global voice preview and apply-all
  - Per-slide voice, TTS generation/regeneration, voice recording
  - Per-slide delay and apply-all delay
- Audio Mixing
  - Default music and volume
  - Per-slide music playback, seek, loop, visualizer
  - Video music toggle for video slides
- Batch Tools
  - Generate All Audio, Fix All Scripts, Revert All Scripts, Find & Replace
  - Batch progress/cancel support
- Slide Media
  - Replace slide image/media (PDF/JPG/PNG)
  - Upload MP4/GIF slides (duration auto-detected)
  - Media preview and duration-aware export behavior
  - Analyze Video (silent MP4 only) to generate editable scene narration plans
  - Open Scene Alignment Editor to edit timestamps, durations, and narration per scene
  - Generate TTS per scene or all scenes with timeline stretch recalculation
AI actions require either a configured API provider or a loaded WebLLM model.
Use this workflow after uploading a Slide Media video when you want scene-aware narration.
- Upload a video slide as MP4 in Slide Media.
- Click Analyze Video on that slide.
- Wait for progress stages (upload, processing, JSON generation, parsing).
- Open Edit Scenes to review in the full-screen Scene Alignment Editor.
- Adjust scene timestamps (`MM:SS`), durations, and narration text.
- Generate scene TTS (single scene or all scenes).
- Render MP4 normally; slide timeline uses the effective stretched duration.
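Scene timestamps in the editor use the `MM:SS` form, so timeline math needs them converted to seconds. A parser like this is one way to do it (an illustrative helper, not Origami's own):

```javascript
// Convert an "MM:SS" timestamp to seconds; rejects anything malformed
// (e.g. seconds >= 60, missing colon) rather than guessing.
function parseTimestamp(mmss) {
  const match = /^(\d{1,2}):([0-5]\d)$/.exec(mmss.trim());
  if (!match) throw new Error(`Invalid MM:SS timestamp: ${mmss}`);
  const [, minutes, seconds] = match;
  return Number(minutes) * 60 + Number(seconds);
}
```

So `"01:30"` becomes 90 seconds, which can then be compared against narration durations for the stretch calculation.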
- Requires a configured Gemini API key in Settings.
- Video file analysis requires the Google Gemini base URL (`https://generativelanguage.googleapis.com/v1beta/openai/`).
- Analyze Video only supports Slide Media silent MP4 uploads.
- GIF/image media is not supported for analysis.
- MP4 files with embedded audio tracks are rejected for this workflow.
- If model output JSON is malformed, Origami automatically retries with a repair prompt.
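The retry-with-repair behavior in the last point can be sketched as a small loop. Here `model` stands for any async `(prompt) => string` call; the repair prompt wording and retry count are illustrative, not Origami's actual values:

```javascript
// Parse model output as JSON; on failure, feed the malformed output back to
// the model with a repair instruction and try again, up to maxRetries times.
async function analyzeWithRepair(model, prompt, maxRetries = 1) {
  let raw = await model(prompt);
  for (let attempt = 0; ; attempt++) {
    try {
      return JSON.parse(raw);
    } catch (err) {
      if (attempt >= maxRetries) throw err; // give up, surface the parse error
      raw = await model(`Repair the following into valid JSON only:\n${raw}`);
    }
  }
}
```

Keeping the raw model output around (as Origami does for debugging) makes failures of this loop much easier to diagnose.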
If WebGPU is unavailable:
- Enable hardware acceleration in browser settings.
- Update browser to latest version.
- In Firefox Nightly, enable `dom.webgpu.enabled`.
Use .origami archives from the Actions menu to move projects between devices.
- Export Project: Saves slides, media/audio blobs, music settings, and project metadata.
- Import Project: Validates archive and replaces the current project.
Notes:
- Import strictly validates the archive format version.
- Global defaults in Settings are not changed by project import/export.
- WebGPU not detected: Enable hardware acceleration, update GPU drivers, and use a supported browser.
- Dev server or FFmpeg.wasm errors: Start via `npm run dev`; do not open `index.html` directly.
- SharedArrayBuffer / COOP/COEP warnings: Ensure responses include `Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: credentialless`.
- Model download or TTS failures: Verify internet stability, clear site data, and check browser storage permissions.
- Out of memory during local inference: Use smaller/quantized models, close background apps, or switch to remote API.
- FFmpeg.wasm slow/high memory: Lower resolution, reduce project size, or run via Docker.
- Audio/video sync or export failures: Rebuild with `npm run build`, then retry with `npm run preview`.
- Analyze Video fails or stays unavailable: Verify the Gemini API key, the Google Gemini base URL, and that the slide media is a silent MP4.
- Analyze Video rejects your MP4 for audio: Remove the clip audio track, then re-upload and analyze again.
- Docker issues: Confirm Docker is installed/running and has enough disk space/permissions.
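For the audio-track rejection above, ffmpeg's `-an` flag (drop audio) combined with `-c copy` (copy streams without re-encoding) strips a clip's audio track losslessly. This helper only builds the argument list; running ffmpeg itself (CLI or wasm) is up to you, and the filenames are examples:

```javascript
// Build ffmpeg arguments that drop the audio track without re-encoding video:
//   -i <input>   read the source clip
//   -an          discard all audio streams
//   -c copy      copy the remaining (video) streams as-is
function stripAudioArgs(input, output) {
  return ["-i", input, "-an", "-c", "copy", output];
}
// Equivalent CLI: ffmpeg -i clip.mp4 -an -c copy clip-silent.mp4
```

The resulting silent MP4 can then be re-uploaded to Slide Media and analyzed.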
Frontend
- React 19.2.0 with TypeScript
- Vite 7.2.4
- Tailwind CSS 4.1.18
- React Router DOM 7.13.0
Core Libraries
- `@mlc-ai/web-llm` for local LLM inference
- `@ffmpeg/ffmpeg` and `@ffmpeg/util` for video rendering
- `pdfjs-dist` for PDF rendering and extraction
- `kokoro-js` for text-to-speech
- `@dnd-kit` for drag-and-drop UI
Backend (Dev Server)
- Express.js 5.2.1
- TypeScript
- AI workflows can run locally in-browser; model downloads are cached after first use.
- First-time setup can take several minutes based on network speed.
- Rendering performance depends on available CPU/GPU/memory.
Report issues at: https://github.com/IslandApps/Origami-AI/issues
When reporting, include:
- Browser and version
- OS
- Node version (`node -v`)
- Reproduction steps
- Relevant console logs
- WebLLM: https://github.com/mlc-ai/web-llm
- Kokoro.js: https://github.com/Kokoro-js
- ffmpeg.wasm: https://github.com/ffmpegwasm/ffmpeg.wasm
