danmotoc94/Generative-Story-Engine-

🎬 Generative Story Engine (Luxury Edition)

A broadcast-quality AI pipeline for cinematic storytelling. Flux Image Sequences • CogVideoX Motion • Generative Audio • 4K Upscaling

🎥 Demos & Results

1. The Full Experience (4K Video + Audio)

demo_small.mp4

2. Flux

Demo Animation


📖 Overview

This repository houses a modular generative AI engine designed for high-end content creation on consumer hardware. It orchestrates multiple state-of-the-art models to produce broadcast-quality assets.

It features a custom memory management system (aggressive_cleanup) that allows heavy models like Flux.1-Schnell and CogVideoX-2b to run sequentially on 8GB VRAM without crashing.
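The sequential pattern described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual `aggressive_cleanup`: the function body, the `run_stages` helper, and the `(name, load_fn, run_fn)` stage format are all assumptions made for the example. PyTorch is imported optionally so the sketch also runs on a machine without it.

```python
import gc

try:
    import torch  # optional: the sketch still runs without PyTorch installed
except ImportError:
    torch = None


def aggressive_cleanup() -> int:
    """Free Python garbage and, when CUDA is available, cached GPU memory.

    Returns the number of unreachable objects the garbage collector found.
    """
    collected = gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()  # hand cached, unused blocks back to the driver
        torch.cuda.ipc_collect()  # release memory held by dead CUDA IPC handles
    return collected


def run_stages(stages):
    """Run (name, load_fn, run_fn) stages one at a time, cleaning up in between.

    Hypothetical helper: only one model is alive at any moment, which is the
    property that lets several large models share a small VRAM budget.
    """
    results = {}
    for name, load_model, run_model in stages:
        model = load_model()
        results[name] = run_model(model)
        del model  # drop the only reference before cleaning up
        aggressive_cleanup()
    return results
```

The key design point is ordering: the reference to the model is deleted before the cache is emptied, otherwise the allocator has nothing to release.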

✨ Key Features

1. Standalone High-Fidelity Visuals (Flux 1920p)

  • Engine: FLUX.1-schnell
  • Function: A dedicated engine for generating "Luxury" grade image sequences or GIFs independent of the video generation pipeline.
  • Resolution: Configured for 1920x1920 (square aspect) to maximize texture detail.
  • Optimization: Uses sequential CPU offloading so that high-resolution generation fits within 8GB VRAM.
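As a rough sketch of what sequential CPU offloading looks like with the `diffusers` library, the example below loads FLUX.1-schnell and enables offloading before generating one square frame. The function names, the output path, and the sampling values (4 steps, guidance 0.0, sequence length 256, taken from the model's commonly published usage) are assumptions for illustration; the repository's `flux_engine.py` may differ.

```python
def flux_call_kwargs(prompt: str, resolution: int = 1920) -> dict:
    """Build keyword arguments for one square Flux generation call."""
    # Assumption: Flux pipelines in diffusers expect dimensions divisible by 16.
    if resolution % 16 != 0:
        raise ValueError("resolution must be divisible by 16")
    return {
        "prompt": prompt,
        "height": resolution,
        "width": resolution,
        "num_inference_steps": 4,   # schnell is distilled for very few steps
        "guidance_scale": 0.0,      # schnell is typically run without guidance
        "max_sequence_length": 256,
    }


def generate_image(prompt: str, out_path: str = "frame_000.png") -> None:
    """Generate one image with weights streamed between CPU and GPU."""
    # Imported lazily so this module can be loaded without the GPU stack.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
    )
    # Moves submodules to the GPU only while they execute; slow but low VRAM.
    pipe.enable_sequential_cpu_offload()
    image = pipe(**flux_call_kwargs(prompt)).images[0]
    image.save(out_path)
```

Sequential offloading trades speed for memory: each submodule is copied to the GPU just in time, so generation is noticeably slower than keeping the whole pipeline resident.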

2. Cinematic Video & 4K Upscaling

  • Motion Engine: CogVideoX-2b generates consistent, directed motion clips from text prompts (native resolution 720x480).
  • The 4K Pipeline: Because CogVideoX output is low resolution, this project includes a custom Real-ESRGAN (Vulkan) bridge. It automatically takes the raw CogVideoX clips and upscales them 4x to 4K-class output (2880 px wide, up from 720 px), making them ready for broadcast.
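A minimal sketch of the upscaling bridge, assuming the `realesrgan-ncnn-vulkan` command-line binary is on the PATH and that the clip has already been split into PNG frames (for example with ffmpeg). The helper names and the default model choice are hypothetical; `tools/upscale_pipeline.py` may work differently.

```python
import subprocess
from pathlib import Path


def upscaled_size(width: int, height: int, scale: int = 4) -> tuple:
    """Return the (width, height) produced by an integer upscale factor."""
    return width * scale, height * scale


def build_upscale_command(frames_in, frames_out, scale: int = 4,
                          model: str = "realesrgan-x4plus",
                          binary: str = "realesrgan-ncnn-vulkan") -> list:
    """Build the CLI call that upscales every frame in a directory."""
    return [
        binary,
        "-i", str(frames_in),   # input frame directory
        "-o", str(frames_out),  # output frame directory
        "-n", model,            # model name (assumed default)
        "-s", str(scale),       # upscale ratio
        "-f", "png",            # output image format
    ]


def upscale_frames(frames_in, frames_out) -> None:
    """Run the upscaler; raises CalledProcessError if the binary fails."""
    Path(frames_out).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_upscale_command(frames_in, frames_out), check=True)


print(upscaled_size(720, 480))  # (2880, 1920)
```

After upscaling, the frames still need to be reassembled into a video at the original frame rate.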

3. Intelligent Audio Generator

A custom AudioLDM2 implementation (src/audio_engine.py) that builds soundtracks intelligently:

  • Dual-Layer Synthesis: Generates a continuous "Theme" layer (ambient music) and a separate "SFX" layer (foley/sound effects) for every scene.
  • Smart Prompting: Automatically strips visual keywords (e.g., "4k", "camera", "lighting") from prompts so the audio model focuses purely on sound.
  • Auto-Sync: Uses MoviePy to compare the audio and video durations, then loops the audio when the video is longer or trims it when the video is shorter, so both tracks end together.
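The prompt cleaning and loop-or-trim logic above can be illustrated with plain Python. This is a sketch under stated assumptions: the keyword list contains only the three examples named above, the function names are hypothetical, and audio is modeled as a simple list of samples rather than a MoviePy clip.

```python
import re

# Hypothetical keyword list; the text above names only these three examples.
VISUAL_KEYWORDS = ("4k", "camera", "lighting")
_VISUAL_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, VISUAL_KEYWORDS)) + r")\b",
    re.IGNORECASE,
)


def clean_audio_prompt(prompt: str) -> str:
    """Remove visual keywords, then tidy leftover commas and whitespace."""
    stripped = _VISUAL_RE.sub("", prompt)
    parts = [" ".join(part.split()) for part in stripped.split(",")]
    return ", ".join(part for part in parts if part)


def fit_audio_to_video(samples: list, target_len: int) -> list:
    """Loop the audio if it is too short, trim it if it is too long."""
    if target_len < 0:
        raise ValueError("target_len must be non-negative")
    if not samples:
        raise ValueError("cannot loop empty audio")
    if len(samples) >= target_len:
        return samples[:target_len]          # trim
    repeats = -(-target_len // len(samples))  # ceiling division
    return (samples * repeats)[:target_len]  # loop, then cut to exact length


print(clean_audio_prompt("thunder rolling over hills, 4K, dramatic lighting"))
# thunder rolling over hills, dramatic
print(fit_audio_to_video([1, 2, 3], 7))
# [1, 2, 3, 1, 2, 3, 1]
```

In the real pipeline the same decision would be made on clip durations in seconds rather than sample counts.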

💻 Reference Hardware

This pipeline was optimized for the following local configuration:

| Component | Specification | Performance Note |
|-----------|---------------|------------------|
| GPU | NVIDIA RTX 4060 | 8GB VRAM (Optimized with aggressive offloading) |
| RAM | 16GB | Used for model weights during swapping |

🏗️ Project Structure

Generative-Story-Engine/
├── configs/            # YAML Control Centers
│   ├── flux_config.yaml
│   └── story_config.yaml
├── src/                # Core Engines
│   ├── flux_engine.py      # Standalone Flux Generator (1920x1920)
│   ├── video_engine.py     # CogVideoX Wrapper
│   ├── audio_engine.py     # AudioLDM2 Dual-Layer Composer
│   └── memory.py           # Memory Management
├── tools/              # Post-Processing
│   └── upscale_pipeline.py # Real-ESRGAN (Vulkan) Wrapper
├── main.py             # CLI Entry Point
└── requirements.txt    # Dependencies

🚀 Installation

  1. Clone & Environment
git clone https://github.com/danmotoc94/Generative-Story-Engine-.git
cd Generative-Story-Engine-

python -m venv venv

# Windows
venv\Scripts\activate

# Linux/Mac
source venv/bin/activate
  2. Install Dependencies
pip install -r requirements.txt
  3. Usage: run the central hub to access all engines:
python main.py
Option 1: Generate Base Images (Flux 1920p)

Option 2: Animate Scenes (CogVideoX)

Option 3: Upscale to 4K

Option 4: Generate & Merge Audio
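For readers curious how a four-option hub like this can be wired, here is a minimal dispatch-table sketch. The action functions are placeholders that only return a label; nothing here is taken from the actual `main.py`, and all names are hypothetical.

```python
def generate_images():
    return "flux"       # placeholder for the Flux image stage


def animate_scenes():
    return "cogvideo"   # placeholder for the CogVideoX stage


def upscale_clips():
    return "upscale"    # placeholder for the Real-ESRGAN stage


def build_audio():
    return "audio"      # placeholder for the audio generate-and-merge stage


# Menu key -> (label shown to the user, function to call).
MENU = {
    "1": ("Generate Base Images (Flux 1920p)", generate_images),
    "2": ("Animate Scenes (CogVideoX)", animate_scenes),
    "3": ("Upscale to 4K", upscale_clips),
    "4": ("Generate & Merge Audio", build_audio),
}


def dispatch(choice: str):
    """Look up the chosen option and run its action."""
    entry = MENU.get(choice.strip())
    if entry is None:
        raise ValueError(f"unknown option: {choice!r}")
    _label, action = entry
    return action()


if __name__ == "__main__":
    for key, (label, _action) in MENU.items():
        print(f"Option {key}: {label}")
```

A table like this keeps the menu text and the behavior in one place, so adding a fifth engine is a one-line change.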

⚙️ Configuration

Video Settings (configs/story_config.yaml)

model_settings:
  model_id: "THUDM/CogVideoX-2b"
  guidance: 6.0       # Higher = follows the prompt more strictly
  num_frames: 49      # Approx 6 seconds
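The "approx 6 seconds" figure follows from dividing the frame count by the playback rate. The sketch below mirrors the settings above in a plain dictionary and assumes 8 frames per second, the rate commonly used when exporting CogVideoX-2b clips; the frame rate is not stated in this configuration, so treat it as an assumption.

```python
# Mirror of the settings above as a plain dict (no YAML parser needed here).
story_config = {
    "model_settings": {
        "model_id": "THUDM/CogVideoX-2b",
        "guidance": 6.0,
        "num_frames": 49,
    }
}

ASSUMED_FPS = 8  # assumption: typical export rate for CogVideoX-2b clips


def clip_duration_seconds(num_frames: int, fps: float = ASSUMED_FPS) -> float:
    """Return the clip length in seconds for a given frame count and rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


duration = clip_duration_seconds(story_config["model_settings"]["num_frames"])
print(f"{duration:.3f} s")  # 6.125 s
```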

Flux Settings (configs/flux_config.yaml)

video_settings:
  resolution: 1920    # 1920x1920 High-Res Output

rendering:
  model_id: "black-forest-labs/FLUX.1-schnell"
  memory_optimization: "aggressive" # Essential for 8GB cards

Author: Dan Motoc
