FrameFlow is a high-fidelity multimedia platform that bridges the gap between raw video/image assets and creative intelligence. By fusing Google Gemini's multimodal brain with precise FFmpeg engineering, FrameFlow transforms how you consume, extract, and generate media.
FrameFlow is built on three core intelligence layers, designed for creators, researchers, and developers.
Transform long-form content into concise, meaningful highlights.
- Academic Precision: Condense 2-hour technical lectures into 5-minute study guides.
- Meeting Recap: Rapidly navigate long webinars for specific insights.
- Narrative Awareness: AI understands scene transitions and audio context simultaneously.
Extract and generate high-fidelity visual assets from any video source.
- Auto-Enrichment: AI analyzes scene quality to extract the most representative frames.
- Professional Thumbnails: Generate YouTube-ready or presentation-grade thumbnails with AI-driven composition.
- Batch Processing: Extract hundreds of scene-indexed images in seconds.
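As a rough illustration of scene-indexed batch extraction, the sketch below builds an FFmpeg argument list that writes one image per detected scene change. It uses FFmpeg's built-in `select` filter with the `scene` score, which stands in here for FrameFlow's AI-driven frame selection and is not necessarily how the project does it. The function name `buildSceneFrameArgs` and the `0.4` threshold are assumptions made for this example.

```typescript
// Hypothetical helper (not part of FrameFlow's documented API).
// Builds FFmpeg arguments that emit one image per detected scene change.
function buildSceneFrameArgs(
  inputPath: string,
  outputPattern: string,
  sceneThreshold: number = 0.4, // illustrative default, tune per video
): string[] {
  if (sceneThreshold <= 0 || sceneThreshold >= 1) {
    throw new RangeError("sceneThreshold must be between 0 and 1 (exclusive)");
  }
  return [
    "-i", inputPath,
    // Keep only frames whose scene-change score exceeds the threshold.
    "-vf", `select='gt(scene,${sceneThreshold})'`,
    // Variable frame rate output, so skipped frames are not duplicated.
    // Newer FFmpeg builds prefer the equivalent "-fps_mode vfr".
    "-vsync", "vfr",
    // Numbered pattern such as "frames/scene-%04d.png".
    outputPattern,
  ];
}
```

The returned array can be passed to a process launcher such as Node's `child_process.spawn("ffmpeg", args)`. Returning arguments as an array rather than a shell string avoids quoting problems with file names.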
Leverage multimodal prompts to transform existing images or generate new ones from scratch.
- Visual Continuity: Use existing frames as structural references for new generations.
- Prompt-Driven Flow: Refine images using natural language within a unified chat-graph interface.
- Multimodal Fusion: Combine video context with external image uploads for hybrid creativity.
FrameFlow handles a wide range of media formats and sources:
- Video Formats: Native support for `.mp4`, `.avi`, `.mov`, and `.webm`.
- Online Sources: YouTube, Google Drive, and direct media URLs (via `yt-dlp`).
- Images: High-fidelity `.jpg`, `.png`, and `.webp` for structural reference and multimodal generation.
- Optimization: High-res videos are automatically downscaled (480p) to ensure lightning-fast AI analysis without losing metadata.
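The format list and the 480p optimization step could be sketched roughly as follows. This is a minimal illustration, not FrameFlow's actual code: `classifySource` and `buildDownscaleArgs` are hypothetical names, and treating every `http(s)` URL as a remote source is an assumption.

```typescript
type MediaKind = "video" | "image" | "remote" | "unsupported";

// Extensions taken from the supported-format list above.
const VIDEO_EXTENSIONS = [".mp4", ".avi", ".mov", ".webm"];
const IMAGE_EXTENSIONS = [".jpg", ".png", ".webp"];

function hasExtension(name: string, extensions: string[]): boolean {
  const lower = name.toLowerCase();
  return extensions.some((ext) => lower.slice(-ext.length) === ext);
}

// Hypothetical classifier: decides how an input should be handled.
function classifySource(source: string): MediaKind {
  // Assumption: any http(s) URL is treated as a remote source for yt-dlp.
  if (/^https?:\/\//i.test(source)) return "remote";
  if (hasExtension(source, VIDEO_EXTENSIONS)) return "video";
  if (hasExtension(source, IMAGE_EXTENSIONS)) return "image";
  return "unsupported";
}

// Hypothetical helper: FFmpeg arguments for a 480p analysis copy.
// Intended only for sources taller than 480 px, since it would upscale others.
function buildDownscaleArgs(inputPath: string, outputPath: string): string[] {
  return [
    "-i", inputPath,
    // Height 480; width -2 preserves aspect ratio and keeps it even.
    "-vf", "scale=-2:480",
    // Copy global metadata from the first input to the output.
    "-map_metadata", "0",
    // Leave the audio stream untouched.
    "-c:a", "copy",
    outputPath,
  ];
}
```

For example, `classifySource("lecture.MOV")` yields `"video"`, while a YouTube link yields `"remote"` and would be downloaded first.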
FrameFlow isn't just a tool; it's an iterative workspace:
- Vue Flow Graph Interface: Manage parallel tasks and version branches visually.
- Live Metrics: Monitor AI token usage and processing costs in real time.
- Zero-Config Preprocessing: Automatic scene detection and transcript extraction.
- Ambient Design: A sleek, dark-mode-first interface with glassmorphism and smooth animations.
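To make the Live Metrics idea concrete, here is a minimal sketch of a running token and cost counter. The `UsageMeter` class, its field names, and the per-million-token prices are all hypothetical. Real Gemini pricing varies by model and should be taken from Google's current price list.

```typescript
interface UsageEvent {
  inputTokens: number;
  outputTokens: number;
}

// Prices in your billing currency per one million tokens (placeholders).
interface PriceTable {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Hypothetical accumulator that a UI could poll after each AI call.
class UsageMeter {
  private inputTokens = 0;
  private outputTokens = 0;

  constructor(private readonly prices: PriceTable) {}

  // Add the token counts reported for one request.
  record(event: UsageEvent): void {
    if (event.inputTokens < 0 || event.outputTokens < 0) {
      throw new RangeError("token counts must be non-negative");
    }
    this.inputTokens += event.inputTokens;
    this.outputTokens += event.outputTokens;
  }

  totalTokens(): number {
    return this.inputTokens + this.outputTokens;
  }

  // Estimated spend so far, based on the supplied price table.
  estimatedCost(): number {
    return (
      (this.inputTokens / 1000000) * this.prices.inputPerMillion +
      (this.outputTokens / 1000000) * this.prices.outputPerMillion
    );
  }
}
```

Keeping the price table as a constructor argument means the same meter works across models with different rates.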
| Section | Link | Purpose |
|---|---|---|
| 🏗 Architecture | Deep-Dive | Pipeline logic, intent nodes, and iterative generation. |
| 🚀 Installation | Setup Guide | Node.js, Gemini API, FFmpeg, and yt-dlp setup. |
| 🎨 UI/UX | Design Overview | Frontend components and interaction flow. |
FrameFlow is licensed under the MIT License. Created by navidshad and his classmates as part of a high-fidelity AI engineering initiative at Vilnius Gediminas Technical University (VGTU).
Tip
Pro Choice: Check the Architecture Deep-Dive to see how we handle multimodal intent recognition and technical "diffs" for consistency.

