IslandApps/Origami-AI

Transform PDF presentations into cinematic narrated videos with AI-generated scripts, browser-based TTS, and local rendering.

Overview

Origami AI is a web application that converts static PDF presentations into polished video content with AI-generated narration, background music, and transitions. Processing happens locally in your browser using WebGPU-accelerated models and FFmpeg.wasm.

Why Origami?

Traditional video creation from presentations is often a choice between tedious manual labor and expensive AI subscriptions. Origami AI offers a third way: a fully automated, local-first studio that lives in your browser.

  • 🎬 Static to Cinematic: Don't just show slides; tell a story. Origami automatically extracts context from your PDFs and crafts a narrative script that flows naturally.
  • 🔒 Privacy First (Local-Only): Your data stays on your machine. By leveraging WebGPU and WebLLM, your scripts and audio are generated locally without ever sending sensitive presentation data to a third-party server.
  • 🎙️ The "No-Mic" Solution: Perfect for creators who prefer not to use their own voice. With integrated Kokoro.js TTS, you get high-quality, human-like narration without needing a recording studio.
  • ⚙️ Zero Infrastructure: No complex Python environments or CUDA drivers to wrestle with. If you have a modern browser, you have a professional-grade video editor.
  • 💸 Cost Effective: Avoid "per-minute" AI generation fees. Use your own hardware to run inference and rendering for free.

Feature | Traditional Editors   | Cloud AI Video Tools | Origami AI
------- | --------------------- | -------------------- | -------------------
Effort  | High (Manual)         | Low                  | Minimal (Automated)
Privacy | Local                 | Cloud-Based (Risk)   | Local-First
Cost    | One-time / Free       | Monthly Subscription | Free & Open Source
Voice   | Your own / Pro Talent | Credits-based TTS    | Unlimited Local TTS

Key Features

PDF Processing

  • Drag-and-drop PDF upload
  • Automatic text extraction from each slide with PDF.js
  • High-resolution image conversion (2x scale)

AI-Powered Narration

  • Local AI processing with MLC-WebLLM
  • Remote API support with OpenAI-compatible providers
  • Customizable prompts for script behavior

Text-to-Speech

  • Multiple voices (af_heart, af_bella, am_adam, and more)
  • Browser TTS via Kokoro.js
  • Remote TTS support
  • Automatic audio duration calculation for timing
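The duration calculation above can be sketched as a small pure helper. This is an illustrative sketch, not Origami's actual internals; the function and field names are assumptions.

```typescript
// Illustrative sketch: derive a slide's display time from its narration
// audio plus a configurable post-audio delay. Names are assumptions,
// not Origami's actual code.
interface SlideTiming {
  audioSeconds: number;   // decoded narration length
  postAudioDelay: number; // silence appended after narration (seconds)
}

// A slide stays on screen for its narration plus delay, with a floor
// so audio-less slides still get visible screen time.
function slideDurationSeconds(t: SlideTiming, minSeconds = 2): number {
  return Math.max(minSeconds, t.audioSeconds + t.postAudioDelay);
}
```

For example, a 7.4-second narration with a 0.5-second post-audio delay yields a 7.9-second slide, while a slide with no audio falls back to the 2-second floor.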

Video Editor

  • Drag-and-drop slide ordering
  • Per-slide script editing with highlighting
  • Transitions: fade, slide, wipe, blur, zoom
  • Background music with volume and auto-ducking
  • Per-slide or full-project audio generation
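The auto-ducking behavior above amounts to a gain computation: background music is attenuated while narration is audible and restored otherwise. A minimal sketch, assuming a fixed duck factor (the names and the 0.25 default are illustrative, not Origami's actual values):

```typescript
// Illustrative auto-ducking sketch: attenuate background music while
// narration plays. The duck factor and names are assumptions.
function musicGain(
  baseVolume: number,       // user-set music volume, 0..1
  narrationActive: boolean, // is narration audible right now?
  duckFactor = 0.25         // fraction of base volume kept while ducked
): number {
  return narrationActive ? baseVolume * duckFactor : baseVolume;
}
```

So music set to 0.8 would play at 0.2 under narration and return to 0.8 in the gaps between slides.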

Analyze Video and Scene Alignment

  • Analyzes uploaded Slide Media MP4 clips and segments them into timestamped scenes using Gemini
  • Produces structured scene plans: step number, start timestamp, on-screen action, narration text, and duration
  • Adds a full-screen Scene Alignment Editor for timeline-locked scene review and editing
  • Supports per-scene TTS generation and full scene-batch TTS generation
  • Automatically stretches the effective timeline when narration audio exceeds scene duration
  • Stores raw Gemini JSON output for debugging
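The structured scene plan and the timeline-stretch rule described above can be sketched as follows. The interface fields mirror the description (step number, start timestamp, on-screen action, narration, duration), but the exact names are assumptions for illustration.

```typescript
// Illustrative shape of one parsed scene from the Gemini analysis.
// Field names are assumptions based on the feature description.
interface Scene {
  step: number;           // step number
  startSeconds: number;   // start timestamp within the clip
  action: string;         // on-screen action
  narration: string;      // narration text
  durationSeconds: number;
}

// Stretch rule: if a scene's narration audio outruns the scene itself,
// the effective timeline grows to fit the audio.
function effectiveSceneDuration(scene: Scene, audioSeconds: number): number {
  return Math.max(scene.durationSeconds, audioSeconds);
}

function effectiveTimelineSeconds(
  scenes: Scene[],
  audioSecondsByStep: Map<number, number>
): number {
  return scenes.reduce(
    (total, s) =>
      total + effectiveSceneDuration(s, audioSecondsByStep.get(s.step) ?? 0),
    0
  );
}
```

For instance, a 5-second scene with 6 seconds of narration contributes 6 seconds to the effective timeline.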

Video Rendering

  • Browser rendering using FFmpeg.wasm
  • 720p and 1080p export
  • Real-time progress tracking
  • Render cancellation support

Getting Started

Option A - Hosted (No Setup)

Visit https://origami.techmitten.com. No install required.

Option B - Run Locally

  1. Clone the repository:
     git clone https://github.com/IslandApps/Origami-AI.git
     cd Origami-AI
  2. Install dependencies (Node.js >= 20.19.0):
     npm install
  3. Start the development server:
     npm run dev

Open http://localhost:3000.

The development server is required because it sets Cross-Origin-Opener-Policy and Cross-Origin-Embedder-Policy, which FFmpeg.wasm and SharedArrayBuffer need. Opening index.html directly will not work.
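The two headers in question can be sketched as a small helper; this is a minimal sketch of what any server hosting Origami must send, and the actual Express setup in the repo may differ.

```typescript
// Minimal sketch of the cross-origin isolation headers FFmpeg.wasm and
// SharedArrayBuffer require. Origami's actual server code may differ.
function crossOriginIsolationHeaders(): Record<string, string> {
  return {
    "Cross-Origin-Opener-Policy": "same-origin",
    "Cross-Origin-Embedder-Policy": "credentialless",
  };
}

// In an Express app this would be applied roughly as:
//   app.use((req, res, next) => {
//     for (const [k, v] of Object.entries(crossOriginIsolationHeaders())) {
//       res.setHeader(k, v);
//     }
//     next();
//   });
```

Any static host serving a production build must send these same headers, which is why opening index.html from disk cannot work.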

  4. Build production assets:
     npm run build
  5. Preview the production build:
     npm run preview

Option C - Docker

Containerized deployment is supported via the included Docker files:

docker compose up --build

App URL: http://localhost:3000.

Available Scripts

  • npm run dev - Start Express + Vite development server with HMR
  • npm run build - Create production build
  • npm run preview - Preview production build locally
  • npm run lint - Run lint checks

Requirements

Prerequisites

  • Node.js >= 20.19.0
  • WebGPU-compatible browser for local AI inference
  • Stable internet connection for first-time model downloads
  • Docker Desktop or Docker Engine (optional, for container deployment)

Browser Compatibility

Browser           | Minimum Version
----------------- | -----------------------------------
Chrome / Chromium | 113+
Microsoft Edge    | 113+
Firefox           | Nightly (enable dom.webgpu.enabled)
Safari            | 18+ (macOS Sonoma)

If WebGPU is unavailable, you can still use remote OpenAI-compatible APIs from Settings.

System Requirements

Minimum

  • 4-core CPU
  • 8GB RAM
  • Integrated GPU with WebGPU support

Recommended

  • 8-core CPU
  • 16GB RAM
  • Dedicated GPU with WebGPU support
  • SSD for faster model downloads and cache access

How It Works

  1. Upload a PDF.
  2. Extract text and convert pages to slide images.
  3. Generate narration scripts with AI.
  4. Generate speech audio from scripts.
  5. Edit scripts, voice, timing, transitions, and music.
  6. Render final MP4 with FFmpeg.wasm.
  7. Download the video.

Configuration

Settings

Settings are grouped under General, API, TTS Model, WebLLM, and AI Prompt.

General

  • Enable Global Defaults for new uploads
  • Intro Fade In and Intro Fade Length (seconds)
  • Post-Audio Delay (seconds)
  • Audio Normalization toggle
  • Recording Countdown toggle
  • Default Transition (Fade, Slide, Zoom, None)
  • Default Music upload and volume

TTS Model

  • TTS quantization selection: q4 or q8

WebLLM

  • Enable/disable local WebLLM
  • Select model to load
  • Precision filter (f16, f32, all)

API

  • Configure Base URL and API Key for OpenAI-compatible providers (Gemini, OpenRouter, Ollama, etc.)
  • Fetch models from provider
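The Base URL setting works because OpenAI-compatible providers share the same chat-completions request shape: the configured base is joined with the standard path and the key goes in a Bearer header. A sketch of that request construction (function and field names are assumptions, not Origami's code):

```typescript
// Illustrative sketch of calling an OpenAI-compatible provider:
// join the configured Base URL with the standard chat-completions path.
// Names are assumptions for illustration, not Origami's internals.
function chatCompletionsRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  prompt: string
): { url: string; headers: Record<string, string>; body: string } {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: prompt }],
    }),
  };
}
```

This is why providers as different as Gemini, OpenRouter, and Ollama can all be configured with just a Base URL and API Key.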

AI Prompt

  • Customize Script Fix System Prompt

Configure Slides (In-App)

The slide editor includes five tabs:

  • Overview
    • Script edit/focus modes
    • AI Fix Script
    • Copy/Revert, preview, select/delete, reorder, list/grid
  • Voice Settings
    • Global voice preview and apply-all
    • Per-slide voice, TTS generation/regeneration, voice recording
    • Per-slide delay and apply-all delay
  • Audio Mixing
    • Default music and volume
    • Per-slide music playback, seek, loop, visualizer
    • Video music toggle for video slides
  • Batch Tools
    • Generate All Audio, Fix All Scripts, Revert All Scripts, Find & Replace
    • Batch progress/cancel support
  • Slide Media
    • Replace slide image/media (PDF/JPG/PNG)
    • Upload MP4/GIF slides (duration auto-detected)
    • Media preview and duration-aware export behavior
    • Analyze Video (silent MP4 only) to generate editable scene narration plans
    • Open Scene Alignment Editor to edit timestamps, durations, and narration per scene
    • Generate TTS per scene or all scenes with timeline stretch recalculation

AI actions require either a configured API provider or a loaded WebLLM model.

Analyze Video Workflow (In-App)

Use this workflow after uploading a Slide Media video when you want scene-aware narration.

  1. Upload a video slide as MP4 in Slide Media.
  2. Click Analyze Video on that slide.
  3. Wait for progress stages (upload, processing, JSON generation, parsing).
  4. Open Edit Scenes to review in the full-screen Scene Alignment Editor.
  5. Adjust scene timestamps (MM:SS), durations, and narration text.
  6. Generate scene TTS (single scene or all scenes).
  7. Render MP4 normally; slide timeline uses the effective stretched duration.
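The MM:SS timestamps edited in step 5 convert to and from seconds straightforwardly; a sketch with illustrative names:

```typescript
// Illustrative MM:SS helpers for Scene Alignment Editor timestamps.
// Names are assumptions, not Origami's actual code.
function parseTimestamp(mmss: string): number {
  const m = /^(\d{1,2}):([0-5]\d)$/.exec(mmss.trim());
  if (!m) throw new Error(`Invalid MM:SS timestamp: ${mmss}`);
  return Number(m[1]) * 60 + Number(m[2]);
}

function formatTimestamp(totalSeconds: number): string {
  const mins = Math.floor(totalSeconds / 60);
  const secs = Math.floor(totalSeconds % 60);
  return `${String(mins).padStart(2, "0")}:${String(secs).padStart(2, "0")}`;
}
```

For example, parseTimestamp("01:30") returns 90, and formatTimestamp(90) returns "01:30".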

Analyze Video Requirements and Limits

  • Requires a configured Gemini API key in Settings.
  • Video file analysis requires Google Gemini base URL (https://generativelanguage.googleapis.com/v1beta/openai/).
  • Analyze Video only supports Slide Media silent MP4 uploads.
  • GIF/image media is not supported for analysis.
  • MP4 files with embedded audio tracks are rejected for this workflow.
  • If model output JSON is malformed, Origami automatically retries with a repair prompt.

WebGPU Setup

If WebGPU is unavailable:

  1. Enable hardware acceleration in browser settings.
  2. Update browser to latest version.
  3. In Firefox Nightly, enable dom.webgpu.enabled.

Project Backup and Restore

Use .origami archives from the Actions menu to move projects between devices.

  • Export Project: Saves slides, media/audio blobs, music settings, and project metadata.
  • Import Project: Validates archive and replaces the current project.

Notes:

  • Import is strict by archive format version.
  • Global defaults in Settings are not changed by project import/export.

Troubleshooting

  • WebGPU not detected: Enable hardware acceleration, update GPU drivers, and use a supported browser.
  • Dev server or FFmpeg.wasm errors: Start via npm run dev; do not open index.html directly.
  • SharedArrayBuffer / COOP/COEP warnings: Ensure responses include Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: credentialless.
  • Model download or TTS failures: Verify internet stability, clear site data, and check browser storage permissions.
  • Out of memory during local inference: Use smaller/quantized models, close background apps, or switch to remote API.
  • FFmpeg.wasm slow/high memory: Lower resolution, reduce project size, or run via Docker.
  • Audio/video sync or export failures: Rebuild with npm run build, then retry with npm run preview.
  • Analyze Video fails or stays unavailable: Verify Gemini API key, Google Gemini base URL, and that the slide media is a silent MP4.
  • Analyze Video rejects your MP4 for audio: Remove the clip audio track, then re-upload and analyze again.
  • Docker issues: Confirm Docker is installed/running and has enough disk space/permissions.

Tech Stack

Frontend

  • React 19.2.0 with TypeScript
  • Vite 7.2.4
  • Tailwind CSS 4.1.18
  • React Router DOM 7.13.0

Core Libraries

  • @mlc-ai/web-llm for local LLM inference
  • @ffmpeg/ffmpeg and @ffmpeg/util for video rendering
  • pdfjs-dist for PDF rendering and extraction
  • kokoro-js for text-to-speech
  • @dnd-kit for drag-and-drop UI

Backend (Dev Server)

  • Express.js 5.2.1
  • TypeScript

Notes

  • AI workflows can run locally in-browser; model downloads are cached after first use.
  • First-time setup can take several minutes based on network speed.
  • Rendering performance depends on available CPU/GPU/memory.

Support

Report issues at: https://github.com/IslandApps/Origami-AI/issues

When reporting, include:

  • Browser and version
  • OS
  • Node version (node -v)
  • Reproduction steps
  • Relevant console logs
