Skip to content

aaurelions/short-video-maker

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

3 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

🎬 AI to create short vertical videos optimized for platforms such as TikTok, YouTube Shorts, and Instagram Reels πŸ€–

A fully automated, stateful pipeline that generates short-form vertical videos for language education from a single text prompt. This agent uses the Google Gemini API for all creative tasks and meticulously logs every action in a SQLite database, ensuring full traceability and recoverability.

Example 1 Example 2 Example 3
final_video.mp4
final_video.mp4
final_video.mp4

✨ Key Features

  • πŸš€ End-to-End Automation: Go from a single prompt to a final .mp4 video with one command.
  • 🧠 Intelligent Content Planning: The AI detects source/target languages, generates a script, and creates custom prompts for a perfectly themed background image and music.
  • πŸ—„οΈ Persistent & Auditable: Every run is a "project" with its plan and detailed log history stored in a central SQLite database.
  • πŸ”„ Stateful & Recoverable: Automatically tracks the status of each project. If a job fails, you can resume from the exact point of failure.
  • πŸ“± Social-Media Ready: Generates video titles, descriptions, and hashtags in the audience's native language.
  • πŸ”‘ Rate Limit Aware: Includes an API key rotator to gracefully handle free-tier API rate limits by switching keys automatically.
  • 🎨 High-Quality Video: Features improved text placement, dynamic animations, darkened backgrounds for legibility, and professional, language-specific typography.
  • πŸ”§ Granular Control: Regenerate the entire video or just specific assets (like the background or music) for any project.

πŸ›οΈ Architecture & Workflow

The agent operates as a multi-stage pipeline, where each component has a single responsibility. The entire process is orchestrated by main.py and centrally tracked in a SQLite database.

Workflow Diagram

sequenceDiagram
    participant User as πŸ‘€ User (CLI)
    participant Main as πŸš€ main.py
    participant DB as πŸ—„οΈ DatabaseManager
    participant Planner as 🧠 ContentPlanner
    participant Assets as πŸ› οΈ AssetGenerator
    participant Composer as 🎞️ VideoComposer
    participant Gemini as ✨ Google Gemini API

    User->>Main: python main.py --prompt "..."
    Main->>DB: create_preliminary_project()
    Main->>Planner: generate_plan(prompt)
    Planner->>Gemini: Generate JSON plan (via gemini-pro)
    Gemini-->>Planner: VideoPlan object
    Planner-->>Main: Returns VideoPlan
    Main->>DB: save_plan_and_finalize_project()

    loop Asset Generation & Composition
        Main->>DB: update_project_status("Generating Assets")
        Main->>Assets: generate_core_assets(plan)
        Assets->>Gemini: Generate TTS Audio & BG Image
        Gemini-->>Assets: .wav & .png files

        Main->>Composer: calculate_video_duration()
        Composer-->>Main: final_duration

        Main->>Assets: generate_music_asset(duration)
        Assets->>Gemini: Generate Music (.wav)
        Gemini-->>Assets: .wav file

        Main->>Composer: create_video(plan, duration)
        Composer-->>Main: final_video.mp4
    end

    Main->>DB: update_project_status("Completed")
    Main-->>User: βœ… Success! Shows final video path & metadata.
Loading

Component Breakdown

The core logic is encapsulated within the agent/ package:

  • πŸš€ main.py (The Conductor): The main entry point. Parses command-line arguments (--prompt, --resume, --regenerate-*), initializes all managers, and orchestrates the project workflow from start to finish.
  • πŸ—„οΈ agent/database.py (The State Manager): Manages all interactions with the projects.sqlite database. It creates, retrieves, and updates project records and statuses, making the entire pipeline stateful.
  • πŸ“ agent/logger.py (The Auditor): A singleton logger that provides clean, high-level console output using rich while simultaneously writing verbose, structured logs (including AI prompts and errors) to the database for full auditability.
  • πŸ”‘ agent/api_manager.py (The Diplomat): Manages a pool of Google Gemini API keys from your .env file. If one key hits a rate limit, it automatically and seamlessly switches to the next available key.
  • 🧠 agent/planner.py (The Creative Director): Takes the initial user prompt and uses the Gemini API to generate a comprehensive VideoPlan. This plan is a structured JSON object containing everything from the script and word pairs to social media copy and AI prompts for other assets.
  • πŸ› οΈ agent/asset_generator.py (The Production Crew): Executes the VideoPlan by calling the appropriate Gemini models to generate the background image, all text-to-speech audio files, and the background music track.
  • 🎞️ agent/composer.py (The Editor): Uses MoviePy to assemble all the generated image and audio assets into a final, polished .mp4 video, applying animations, text overlays, and audio mixing.
  • πŸŽ›οΈ agent/config.py (The Control Panel): A centralized file for all static configuration: model names, API delays, video dimensions, font paths, music volume, and more. This is the first place to look for customization.

πŸ› οΈ Setup

  1. Clone the repository:

    git clone https://github.com/aaurelions/short-video-maker
    cd short-video-maker
  2. Create and activate a virtual environment:

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Set up Google Gemini API Keys:

    • Get API keys from Google AI Studio.
    • Create a .env file in the project root.
    • Add your keys, comma-separated. The system will rotate them if one hits a rate limit.
    GOOGLE_API_KEYS="YOUR_API_KEY_1,YOUR_API_KEY_2"
  5. Install FFmpeg and ImageMagick (for MoviePy):

  6. Install Fonts (Recommended for best quality): On macOS/Linux, you can clone the Google Fonts repository.

    # Example for macOS
    cd ~/Library/Fonts/
    git clone https://github.com/google/fonts.git google-fonts

    Note: Font paths are configured in agent/config.py and may need to be adjusted for your OS.


πŸš€ How to Run

Create a New Video

python main.py --prompt "Create a video for English speakers to learn 5 essential Japanese words for a ramen shop"

Resume a Failed Project

If a project fails, you can resume it.

# Resume the very last project that failed
python main.py --resume

# Resume a specific project by name
python main.py --resume "japanese-ramen-shop-words-20231027103000"

Regenerate Assets

You can regenerate assets for any existing project without starting over. This is useful for tweaking visuals, audio, or fixing a failed music track.

If you don't provide a project name, it will target the last modified project.

# Regenerate EVERYTHING for the last project
python main.py --regenerate

# Regenerate only the final video for a specific project
python main.py --regenerate-video "project-name-to-fix"

# Regenerate just the background image for the last project
python main.py --regenerate-background

# Regenerate all spoken word audio files
python main.py --regenerate-words

# Regenerate only the music track
python main.py --regenerate-music

Full list of Regeneration Flags

  • -r, --regenerate
  • -rv, --regenerate-video
  • -rb, --regenerate-background
  • -ri, --regenerate-intro
  • -rm, --regenerate-music
  • -rw, --regenerate-words
  • -rw0, --regenerate-word-0 (and other specific word indices)

🎨 Customization

The easiest way to customize the output is by editing agent/config.py:

  • Voices: Set TTS_RANDOM_VOICE = False and change TTS_DEFAULT_VOICE to use a consistent voice.
  • Fonts: Modify the FONT_MAPPINGS dictionary to change fonts for different languages or scenes. You'll need to provide the correct path to the .ttf file on your system.
  • Timings & Style: Adjust values like CHALLENGE_DURATION_S, MUSIC_VOLUME, or BACKGROUND_DARKEN_OPACITY to change the pacing and look of the video.

πŸ“¦ Output

A successful run will produce a clear summary in your terminal and a neatly organized project folder in output/.

Terminal Summary:

✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨
βœ… SUCCESS Project 'japanese-ramen-shop-words-20231027103000' completed successfully!
πŸŽ₯ Final video archived in: output/japanese-ramen-shop-words-20231027103000/
--------------------
βœ… Title (English): 5 Essential Japanese Words for the Ramen Shop!
βœ… Description (English): This video will teach you 5 key Japanese words you need to know when visiting a ramen shop. Perfect for your next trip to Japan!
βœ… Hashtags: #LearnJapanese #JapaneseLesson #RamenShop #JapanTravel #ζ—₯本θͺžε‹‰εΌ· #ラーパン
✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨

Project Directory: The output/ directory contains everything:

output/
β”œβ”€β”€ japanese-ramen-shop-words-20231027103000/
β”‚   β”œβ”€β”€ background.png
β”‚   β”œβ”€β”€ intro_audio.wav
β”‚   β”œβ”€β”€ word_0.wav
β”‚   β”œβ”€β”€ word_1.wav
β”‚   β”œβ”€β”€ ...
β”‚   β”œβ”€β”€ music.wav
β”‚   └── final_video.mp4
└── projects.sqlite  <-- The central database for ALL projects

Releases

No releases published

Packages

 
 
 

Contributors

Languages