
AI Editor

AI-assisted video editing pipeline — from reference analysis to rendered output, with full-stack orchestration, Shorts conversion, and one-click publishing.

Python FastAPI React Shotstack License: MIT


It can:

  • analyze a reference video
  • build a timeline from scenes or OCR text
  • assemble clips from YouTube or Google Drive sources
  • render a master output
  • optionally post-process a Shorts preview
  • upload the approved render to YouTube

Demo

See docs/assets/ for the demo GIF and screenshots.
What It Does

AI Editor is a multi-stage media pipeline that accepts a reference video and a set of source clips, uses AI analysis to understand structure and style, builds a stage-based edit plan, renders a polished output via the Shotstack API, and optionally converts the result to a vertical Short and publishes to YouTube.

It is not a simple wrapper around an LLM. It integrates:

  • computer-vision based video & scene analysis (EasyOCR, PaddleOCR, SceneDetect)
  • AI-driven edit planning via Groq / conversational brief builder
  • a structured multi-stage pipeline runner with per-job artifact storage
  • a Shotstack rendering integration with timeline assembly logic
  • a React frontend with job status tracking and Google Drive/YouTube OAuth

Key Features

Feature Detail
🎬 Reference video analysis Scene detection, OCR extraction, structure parsing
🤖 AI edit planning Groq-powered conversational brief → structured edit plan
🗂 Stage-based pipeline Ordered stages with state persistence per job
🎞 Shotstack rendering Timeline assembly → cloud render → artifact storage
✂️ Shorts conversion 16:9 → 9:16 crop, reframe, and post-process
📤 YouTube upload OAuth 2.0 integration, metadata, direct publish
🗄 Google Drive ingestion Service-account or OAuth-based asset retrieval
🧪 Unit tests Coverage for normalization, overlay policy, text segments

Architecture

graph TD
    User(["User / Browser"])
    FE["React Frontend\nVite + REST"]
    API["FastAPI Backend\napp.py"]
    CHAT["Chatbot Interface\nGroq LLM"]
    BRIEF["Edit Brief JSON"]
    ANA["Analyzer\nEasyOCR · PaddleOCR · SceneDetect"]
    PLAN["Edit Plan JSON"]
    RUNNER["Pipeline Runner\npipeline/runner.py"]
    DL["Downloader\nyt-dlp · Google Drive"]
    EDITOR["Editor Builder\nShotstack Timeline"]
    OVERLAY["Overlay Planner"]
    SHORTS["Shorts Converter"]
    SHOTSTACK["Shotstack Render API"]
    ARTIFACTS["Artifact Storage\ntmp/jobs/job_id/"]
    UPLOAD["YouTube Uploader\nGoogle OAuth"]
    GDRIVE["Google Drive"]

    User -->|"chat brief + clips"| FE
    FE -->|"REST calls"| API
    API --> CHAT
    CHAT --> BRIEF
    API --> ANA
    ANA --> PLAN
    BRIEF --> RUNNER
    PLAN --> RUNNER
    API --> RUNNER
    RUNNER --> DL
    RUNNER --> EDITOR
    RUNNER --> OVERLAY
    RUNNER --> SHORTS
    DL --> GDRIVE
    EDITOR -->|"render job"| SHOTSTACK
    SHOTSTACK -->|"video URL"| ARTIFACTS
    SHORTS --> UPLOAD
    ARTIFACTS --> FE
    UPLOAD --> FE

Request Flow

  1. User submits a brief via the React chat interface → Groq LLM refines it into a structured edit plan.
  2. Reference video is analyzed — scenes are detected, text overlays are OCR-extracted, structure is mapped.
  3. Pipeline runner executes ordered stages: asset download → edit assembly → overlay planning → render submission.
  4. Shotstack renders the timeline; the backend polls for completion and stores the artifact.
  5. Optional post-processing converts the render to a 9:16 Short and uploads to YouTube.

See docs/architecture.md for a full module breakdown.
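The stage-ordered execution in step 3 can be sketched in a few lines. This is an illustrative sketch only — `Stage`, `JobState`, and `run_pipeline` are hypothetical names, not the actual pipeline/runner.py API:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class JobState:
    job_id: str
    completed: list = field(default_factory=list)
    artifacts: dict = field(default_factory=dict)

@dataclass
class Stage:
    name: str
    run: Callable[[JobState], dict]  # returns artifacts to persist

def run_pipeline(job: JobState, stages: list) -> JobState:
    for stage in stages:
        if stage.name in job.completed:
            continue  # skipping finished stages lets a job resume mid-run
        job.artifacts.update(stage.run(job))
        job.completed.append(stage.name)
    return job

job = run_pipeline(
    JobState("demo-job"),
    [
        Stage("download", lambda j: {"clips": ["clip1.mp4"]}),
        Stage("assemble", lambda j: {"timeline": {"tracks": []}}),
        Stage("render", lambda j: {"render_url": "https://example.com/out.mp4"}),
    ],
)
print(job.completed)  # → ['download', 'assemble', 'render']
```

Persisting `completed` per job (as the real runner does under tmp/jobs/) is what makes resumable, per-job artifact isolation possible.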


Tech Stack

Layer Technology
Backend API Python 3.10+, FastAPI, Uvicorn
AI / Analysis EasyOCR, PaddleOCR, SceneDetect, OpenCV, Groq API
Edit Planning Custom planner + LLM-assisted brief builder
Rendering Shotstack SDK (cloud video rendering)
Asset Ingestion yt-dlp, Google Drive API (service account + OAuth)
Export YouTube Data API v3, Google Auth OAuthlib
Frontend React + Vite
Tests pytest
Containerization Docker

Repository Structure

AI_Editor/
├── app.py                    # FastAPI entrypoint — all HTTP routes
├── Dockerfile                # Container build
├── requirements.txt          # Python dependencies
├── .env.example              # Environment variable reference
│
├── ai_editor/                # Core AI & media logic
│   ├── analyzer.py           # Scene detection, OCR, video analysis
│   ├── chatbot_interface.py  # Groq-powered brief builder
│   ├── downloader.py         # yt-dlp + Google Drive asset fetching
│   ├── editor.py             # Shotstack timeline assembly
│   ├── overlay_planner.py    # Text/graphic overlay scheduling
│   ├── youtube_clipper.py    # Clip extraction and trimming
│   ├── youtube_uploader.py   # YouTube OAuth upload flow
│   └── google_auth.py        # Google credential management
│
├── pipeline/                 # Orchestration layer
│   ├── runner.py             # Stage runner (main orchestrator — ~60 KB)
│   ├── state.py              # Per-job state machine
│   ├── artifacts.py          # Artifact path resolution and storage
│   ├── plans/                # Edit plan schemas and planners
│   └── storage/              # Job storage helpers
│
├── frontend/                 # React UI (Vite)
│
├── docs/                     # Documentation
│   ├── assets/               # Screenshots and demo GIF
│   ├── releases/             # Release note drafts
│   ├── API_EXAMPLES.md
│   ├── DEPLOYMENT.md
│   ├── PROJECT_STRUCTURE.md
│   ├── SETUP_GUIDE.md
│   ├── TROUBLESHOOTING.md
│   ├── architecture.md
│   └── pipeline_state.md
│
└── tests/
    ├── test_editor_normalization.py
    ├── test_overlay_policy.py
    └── test_text_segments.py

Quick Start

Prerequisites

  • Python 3.10+
  • Node.js 18+
  • A Shotstack API key (Stage key is free for development)
  • Optionally: Google Cloud service account for Drive ingestion, Groq API key

1 — Clone and install

git clone https://github.com/CarlAmine/AI_Editor.git
cd AI_Editor
python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt

2 — Configure environment

cp .env.example .env
# Edit .env with your API keys (see Configuration section below)

3 — Run the backend

python app.py
# API available at http://localhost:8000
# Interactive docs at http://localhost:8000/docs

4 — Run the frontend

cd frontend
npm install
npm run dev
# UI available at http://localhost:5173

Docker (optional)

docker build -t ai-editor .
docker run -p 8000:8000 --env-file .env ai-editor

Configuration

All configuration is via environment variables. Copy .env.example to .env and fill in:

Variable Required Description
SHOTSTACK_KEY ✅ Shotstack API key (Stage or Production)
GROQ Optional Groq API key for the conversational brief builder
GOOGLE_APPLICATION_CREDENTIALS Optional Path to service account JSON for Drive access
VIDEO_FOLDER Optional Google Drive folder ID for source assets
MUSIC_URL Optional Default background music track URL
DEEPSEEK_KEY Optional Reserved for future LLM integration

See docs/SETUP_GUIDE.md for full configuration details.
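Loading the table above might look like the following. This is a sketch, assuming plain `os.environ` access — the actual app may use a settings library instead:

```python
import os

def load_config() -> dict:
    # SHOTSTACK_KEY is the one hard requirement; fail fast if absent.
    required = ["SHOTSTACK_KEY"]
    missing = [k for k in required if not os.environ.get(k)]
    if missing:
        raise RuntimeError(f"Missing required env vars: {missing}")
    return {
        "shotstack_key": os.environ["SHOTSTACK_KEY"],
        "groq_key": os.environ.get("GROQ"),          # optional: chat brief builder
        "video_folder": os.environ.get("VIDEO_FOLDER"),  # optional: Drive folder ID
        "music_url": os.environ.get("MUSIC_URL"),        # optional: default music
    }
```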


API Examples

The FastAPI backend exposes a REST API. Interactive docs are at http://localhost:8000/docs.

# Start a new edit job
curl -X POST http://localhost:8000/jobs \
  -H 'Content-Type: application/json' \
  -d '{"reference_url": "https://...", "brief": "60s highlight reel, energetic style"}'

# Poll job status
curl http://localhost:8000/jobs/{job_id}/status

# Get rendered artifact
curl http://localhost:8000/jobs/{job_id}/artifact

See docs/API_EXAMPLES.md for full request/response examples.
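Because Shotstack renders asynchronously, a client typically polls the status endpoint until the job settles. A minimal polling helper with exponential back-off might look like this — `fetch_status` is injectable (in real use it could be `lambda: requests.get(status_url).json()`), and the `state` values shown are assumptions, not the documented response schema:

```python
import time
from typing import Callable

def wait_for_render(fetch_status: Callable[[], dict],
                    timeout: float = 600.0,
                    initial_delay: float = 1.0,
                    max_delay: float = 30.0) -> dict:
    """Poll with exponential back-off until the job is done or failed."""
    delay, waited = initial_delay, 0.0
    while waited < timeout:
        status = fetch_status()
        if status.get("state") in ("done", "failed"):
            return status
        time.sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # back off, capped at max_delay
    raise TimeoutError("render did not finish in time")
```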


Screenshots

Chat Interface

Job Status

Timeline Plan

Render Flow


Testing

# Run all tests
pytest tests/ -v

# Run a specific suite
pytest tests/test_editor_normalization.py -v

Test coverage:

  • test_editor_normalization.py — timeline normalization and clip boundary logic
  • test_overlay_policy.py — overlay scheduling and policy enforcement
  • test_text_segments.py — text segment parsing and validation
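A test in the spirit of the normalization suite might check that clip in/out points are clamped to the source duration. Both `clamp_clip` and the test are illustrative, not the repository's actual API:

```python
def clamp_clip(start: float, end: float, duration: float) -> tuple:
    """Clamp a clip's in/out points to the source duration."""
    start = max(0.0, min(start, duration))
    end = max(start, min(end, duration))
    return start, end

def test_clip_is_clamped_to_source_bounds():
    assert clamp_clip(-1.0, 5.0, 10.0) == (0.0, 5.0)   # negative start
    assert clamp_clip(8.0, 15.0, 10.0) == (8.0, 10.0)  # end past duration
    assert clamp_clip(12.0, 15.0, 10.0) == (10.0, 10.0)  # fully out of range
```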

Technical Highlights

  • Multi-stage pipeline orchestration — pipeline/runner.py coordinates ordered stages with state transitions, retry logic, and per-job artifact isolation.
  • AI-assisted edit planning — Groq LLM powers the conversational brief builder; output is structured into a machine-readable edit plan JSON.
  • Scene-aware video analysis — SceneDetect-based shot boundary detection combined with EasyOCR and PaddleOCR for text extraction from frames.
  • Shotstack timeline assembly — ai_editor/editor.py programmatically constructs Shotstack render specs from clip lists, overlays, and timing metadata.
  • Overlay planning layer — overlay_planner.py schedules text/graphic elements respecting duration constraints and scene boundaries.
  • Shorts conversion flow — automatic 16:9 → 9:16 reframe and post-processing for vertical delivery.
  • Full-stack architecture — FastAPI backend + React frontend, communicating over REST, with Docker support.
  • Google ecosystem integration — OAuth 2.0 for YouTube upload, service account support for Drive ingestion.
  • Unit test coverage — pytest suites covering normalization edge cases, overlay policy, and segment logic.
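The geometry behind the 16:9 → 9:16 reframe is a centered crop that keeps the full frame height. This sketch assumes a simple center crop — the real converter may reframe on detected subjects rather than the frame center:

```python
def vertical_crop_box(width: int, height: int) -> tuple:
    """Return (x, y, w, h) of a centered 9:16 crop of a landscape frame."""
    crop_w = int(height * 9 / 16)  # keep full height, narrow the width
    x = (width - crop_w) // 2      # center horizontally
    return x, 0, crop_w, height

print(vertical_crop_box(1920, 1080))  # → (656, 0, 607, 1080)
```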

Performance & Benchmarks

🚧 Benchmarking data to be added. Run the pipeline on representative inputs and open a PR to fill in the table.

Metric Value Notes
Average job duration — End-to-end, reference → rendered artifact
Shotstack render turnaround — Dependent on clip count and resolution
OCR extraction latency — Per frame, GPU vs CPU
Scene detection latency — Per minute of video
Shorts conversion time — Post-render
Pipeline success rate — Under normal load

Limitations & Known Issues

  • Shotstack rendering is asynchronous; long videos may require extended polling.
  • PaddleOCR has a large install footprint; a lighter OCR backend is on the roadmap.
  • Google Drive OAuth tokens require manual refresh in some environments.
  • The frontend does not yet support drag-and-drop clip reordering.
  • No built-in queue/worker system; concurrent jobs run in-process threads.

See docs/TROUBLESHOOTING.md for workarounds.


Roadmap

Short-term

  • Structured logging and per-stage timing metrics
  • Shotstack polling with exponential back-off
  • Asset validation before pipeline start
  • Expand test coverage to pipeline runner stages
  • CI workflow (GitHub Actions)

Medium-term

  • Lighter OCR backend option
  • Richer timeline editing UI (drag-and-drop, waveform preview)
  • Additional rendering backends (Creatomate, Remotion)
  • Smarter shot selection via visual similarity scoring
  • Automated caption generation (Whisper)
  • Task queue (Celery / RQ) for concurrent job isolation

Deployment

See docs/DEPLOYMENT.md for Docker-based deployment, reverse proxy setup, and production key configuration.


Contributing

See CONTRIBUTING.md for development setup, coding conventions, and how to submit changes.


  • app.py - FastAPI application entrypoint
  • ai_editor/ - media utilities, downloaders, auth helpers, editor builders, and upload integrations
  • pipeline/ - stage runner, planners, state management, artifact handling, and render orchestration
  • frontend/ - React user interface
  • docs/ - setup, deployment, troubleshooting, operations, and architecture notes
  • tests/ - unit tests for timing and overlay behavior

See docs/PROJECT_STRUCTURE.md for a more detailed map of the repository and tmp/jobs/<job_id>/ layout.

Credential Files

The server may use the following local files depending on which features you enable:

  • .env

    • required for SHOTSTACK_KEY
    • also stores optional runtime configuration
  • drive-oauth-client-secret.json

    • optional
    • used when you want users to connect their own Google Drive account through the UI
  • drive-token.json

    • optional
    • generated automatically after a successful Google Drive OAuth login
  • service-account.json

    • optional
    • used only if you choose DRIVE_AUTH_MODE=service_account
  • youtube-client-secret.json

    • optional
    • required only if you want to upload approved renders to YouTube
  • youtube-token.json

    • optional
    • generated automatically after the first successful YouTube OAuth login

None of these files should be committed. They are ignored by .gitignore.

Local Setup

  1. Create a Python virtual environment and install dependencies:
    • pip install -r requirements.txt
  2. Copy .env.example to .env and fill in at least:
    • SHOTSTACK_KEY
    • GROQ if you use the chat brief builder
  3. If you want Google Drive OAuth in the UI:
    • place drive-oauth-client-secret.json in the project root
  4. If you want YouTube uploads:
    • place youtube-client-secret.json in the project root
  5. Start the backend:
    • python -m uvicorn app:app --host 0.0.0.0 --port 10000 --reload
  6. Start the frontend:
    • cd frontend && npm install && npm run dev

Default local URLs:

  • Frontend: http://localhost:5173
  • Backend API: http://localhost:10000
  • Swagger docs: http://localhost:10000/docs

Operating Manual

For day-to-day operation, see docs/OPERATIONS.md.

That document covers:

  • how to start and stop the server
  • what goes in .env
  • when to use Drive OAuth vs service account
  • where to place Google and YouTube JSON files
  • how tokens are created
  • how to operate the UI end to end

License

MIT — see LICENSE.
