
raghulpranxsh/CrossLingualAI


🎬 VideoTranslate AI - Multilingual Video Translation Platform

An AI-powered web application that translates video content across multiple languages using machine learning models. The platform uses Whisper for speech recognition, NLLB-200 for translation, Google Gemini for transcript summarization, and RAG (Retrieval-Augmented Generation) for context-aware processing.



✨ Features

  • 🎥 Video Upload & Processing - Accepts MP4, AVI, MOV, and MKV video files, plus WAV audio
  • 🌐 Multilingual Translation - Translate videos to and from 20+ languages
  • 🧠 AI-Powered Intelligence - Summarize and analyze video transcripts with Google Gemini
  • 📝 Automatic Subtitle Generation - Generate translated subtitles and embed them directly into videos
  • ⚡ Fast Processing - Roughly 2-5 minutes of processing for a 1-minute video
  • 💬 Content Generation - Get a text summary of each video, generated from its transcript
  • 💾 Easy Download - Download your translated videos with embedded subtitles


🛠️ Tech Stack

| Component | Technology |
|---|---|
| Backend Framework | Flask (Python) |
| Speech Recognition | OpenAI Whisper |
| Translation Model | Facebook NLLB-200 |
| Content Summary | Google Gemini API (gemini-2.5-flash) |
| Video Processing | FFmpeg |
| Frontend | HTML5, CSS3, JavaScript, Bootstrap 5 |
| ML Libraries | PyTorch, Transformers, HuggingFace |

🏗️ Architecture

┌─────────────┐
│   Frontend  │  User uploads video
│  (Browser)  │
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Flask API  │  Receives video file
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   Whisper   │  Extracts audio & transcribes
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Gemini API │  Context analysis & summarization
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   NLLB-200  │  Translates content
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   FFmpeg    │  Embeds subtitles & generates video
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   User      │  Downloads translated video & reads summary
└─────────────┘
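
The stage order in the diagram can be sketched as a small orchestration function. This is only an illustration, not the code in app.py: `run_pipeline`, `PipelineResult`, and the four stage callables are hypothetical names, and the real stages would wrap Whisper, Gemini, NLLB-200, and FFmpeg.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    transcript: str
    summary: str
    translation: str
    output_path: str


def run_pipeline(
    video_path: str,
    target_language: str,
    transcribe: Callable[[str], str],
    summarize: Callable[[str], str],
    translate: Callable[[str, str], str],
    embed_subtitles: Callable[[str, str], str],
) -> PipelineResult:
    """Run the backend stages in the order the diagram shows.

    Every stage is passed in as a callable (hypothetical interface), so the
    flow can be exercised with stand-ins before any model is downloaded.
    """
    transcript = transcribe(video_path)                      # Whisper stage
    summary = summarize(transcript)                          # Gemini stage
    translation = translate(transcript, target_language)     # NLLB-200 stage
    output_path = embed_subtitles(video_path, translation)   # FFmpeg stage
    return PipelineResult(transcript, summary, translation, output_path)
```

Passing the stages as arguments keeps the sketch independent of any particular model library; a real implementation may instead call the models directly.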

🚀 Quick Start

Prerequisites

  • Python 3.8 or higher
  • FFmpeg installed on your system
  • 8GB+ RAM (16GB recommended for optimal performance)
  • 5GB+ free disk space for ML models
  • Internet connection (for initial model download)

Installation

Option 1: Automated Setup (Recommended)

macOS / Linux:

chmod +x run.sh
./run.sh

Windows:

run.bat

Option 2: Manual Setup

# Clone the repository
git clone https://github.com/raghulpranxsh/CrossLingualAI.git
cd CrossLingualAI

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# Create necessary directories
mkdir -p uploads outputs

# Run the application
python app.py

First Run

On the first run, the application will automatically download required ML models:

  • Whisper Base Model (~150MB)
  • NLLB-200 Translation Model (~1.2GB)

Note: This initial download may take 10-15 minutes depending on your internet speed. Models are cached locally for subsequent runs.


📖 Usage

  1. Start the Server

    python app.py
  2. Access the Application

    • Open your browser and navigate to: http://localhost:5001
  3. Upload Video

    • Click "Choose File" or drag and drop your video
    • Supported formats: MP4, AVI, MOV, MKV, WAV
    • Maximum file size: 500MB
  4. Configure Translation

    • Select source language (or use Auto-detect)
    • Select target language
    • Click "Process Translation"
  5. Download Result

    • Wait for processing to complete
    • Download your translated video with embedded subtitles
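
The upload limits listed in step 3 could be checked with a helper along these lines. This is a minimal sketch, not the validation in app.py: `validate_upload` and both constants are hypothetical names, and treating 500MB as 500 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path
from typing import Tuple

# Formats and size limit as listed in this README.
ALLOWED_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv", ".wav"}
MAX_UPLOAD_BYTES = 500 * 1024 * 1024  # assumption: 500MB counted in binary units


def validate_upload(filename: str, size_bytes: int) -> Tuple[bool, str]:
    """Return (accepted, reason) for a candidate upload."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        return False, f"unsupported format: {suffix or 'no extension'}"
    if size_bytes > MAX_UPLOAD_BYTES:
        return False, "file exceeds the 500MB limit"
    return True, "ok"
```

The extension check is case-insensitive, so `talk.MP4` is accepted the same way as `talk.mp4`.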

🔌 API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | / | Main web interface |
| GET | /api/health | Health check endpoint |
| POST | /api/upload | Upload and process video |
| GET | /api/download/<filename> | Download processed video |

Example API Usage

# Health check
curl http://localhost:5001/api/health

# Upload video (using curl)
curl -X POST -F "file=@video.mp4" \
     -F "sourceLanguage=auto" \
     -F "targetLanguage=en" \
     http://localhost:5001/api/upload
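
The same upload can be made from Python using only the standard library. The sketch below mirrors the curl call above: the form field names (`file`, `sourceLanguage`, `targetLanguage`) and the URL come from that example, while `build_multipart` and `upload_video` are hypothetical helper names. The response format is not documented here, so the raw bytes are returned unparsed.

```python
import os
import urllib.request
import uuid
from typing import Dict, Tuple


def build_multipart(fields: Dict[str, str], file_field: str,
                    filename: str, file_bytes: bytes) -> Tuple[bytes, str]:
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def upload_video(path: str, source: str = "auto", target: str = "en",
                 base_url: str = "http://localhost:5001") -> bytes:
    """POST a video to /api/upload and return the raw response body."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart(
            {"sourceLanguage": source, "targetLanguage": target},
            "file", os.path.basename(path), fh.read(),
        )
    request = urllib.request.Request(
        f"{base_url}/api/upload", data=body,
        headers={"Content-Type": content_type}, method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With the server running, `upload_video("video.mp4", target="en")` sends the same request as the curl command. Note that this sketch reads the whole file into memory, which may be unsuitable for uploads near the 500MB limit.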

🧠 How It Works

  1. Audio Extraction: The audio track is extracted from the uploaded video file
  2. Speech Recognition: Whisper transcribes the audio to text in the original language
  3. Language Detection: The source language is detected automatically if it was not specified
  4. Summary Generation: The transcript is sent to Google Gemini, which returns a concise summary
  5. Translation: NLLB-200 translates the transcript into the target language
  6. Subtitle Generation: An SRT file is created from the translated text and its timestamps
  7. Video Processing: FFmpeg embeds the subtitles into the original video
  8. Delivery: The user receives the translated video with embedded subtitles and a text summary
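
Step 6 can be illustrated with a short SRT writer. This is a sketch under stated assumptions rather than the project's implementation: it assumes each segment is a dict with `start`, `end` (seconds), and `text` keys, which is the shape of the segments Whisper's `transcribe` returns, and that `text` has already been replaced by its translation. `format_timestamp` and `build_srt` are hypothetical names.

```python
from typing import Dict, Iterable, List


def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt(segments: Iterable[Dict]) -> str:
    """Build SRT text: numbered cues separated by blank lines."""
    blocks: List[str] = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

For example, `format_timestamp(3661.5)` yields `01:01:01,500`. The resulting text can be written to a `.srt` file and handed to FFmpeg in step 7.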

📁 Project Structure

CrossLingualAI/
│
├── app.py                # Flask backend server
├── index.html            # Frontend web interface
├── requirements.txt      # Python dependencies
├── run.sh                # Setup script (macOS/Linux)
├── run.bat               # Setup script (Windows)
├── README.md             # Project documentation
│
├── uploads/              # Temporary upload directory
└── outputs/              # Processed video output directory

⚙️ Configuration

Port Configuration

By default, the server runs on port 5001. To change this, modify app.py:

app.run(debug=True, host='0.0.0.0', port=5001)  # Change port here
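
As an alternative to editing the port by hand, it could be read from the environment. The sketch below is one possible approach and is not part of app.py as described here: the `PORT` variable name and the `resolve_port` helper are assumptions.

```python
import os
from typing import Mapping, Optional

DEFAULT_PORT = 5001  # the default stated in this README


def resolve_port(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the port from a PORT variable (hypothetical), else the default."""
    env = os.environ if env is None else env
    raw = env.get("PORT", "").strip()
    if not raw:
        return DEFAULT_PORT
    port = int(raw)  # raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port
```

The call in app.py would then become `app.run(debug=True, host='0.0.0.0', port=resolve_port())`, and the server could be started with `PORT=8080 python app.py`.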

Model Configuration

Models are automatically downloaded on first run. To use different Whisper models:

whisper_model = whisper.load_model("base")  # Options: tiny, base, small, medium, large

🐛 Troubleshooting

Common Issues

Port Already in Use

# Kill process on port 5001
lsof -ti :5001 | xargs kill -9

FFmpeg Not Found

# macOS
brew install ffmpeg

# Linux (Debian/Ubuntu)
sudo apt-get install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html
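
To confirm that FFmpeg is visible to Python before starting the server, a check like the following may help. It is a minimal sketch using the standard library's `shutil.which`; `find_ffmpeg` and `require_ffmpeg` are hypothetical helper names, not functions from app.py.

```python
import shutil
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def require_ffmpeg() -> str:
    """Return the ffmpeg path, or fail early with a readable message."""
    path = find_ffmpeg()
    if path is None:
        raise RuntimeError(
            "FFmpeg was not found on PATH; install it and restart the terminal"
        )
    return path
```

If `find_ffmpeg()` returns None right after installing FFmpeg, opening a new terminal so that PATH is reloaded is usually worth trying first.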

Out of Memory

  • Close other applications
  • Use a smaller Whisper model (tiny/base instead of large)
  • Process shorter videos
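
Choosing a smaller Whisper model can be automated with a rough lookup. The figures below are the approximate memory requirements published in the Whisper README and should be treated as estimates; they do not include the memory NLLB-200 needs. `pick_whisper_model` is a hypothetical helper.

```python
# Approximate memory needed per model in GB (estimates from the Whisper README),
# ordered from largest to smallest.
APPROX_MEMORY_GB = [
    ("large", 10),
    ("medium", 5),
    ("small", 2),
    ("base", 1),
    ("tiny", 1),
]


def pick_whisper_model(available_gb: float) -> str:
    """Return the largest model whose estimated requirement fits the budget."""
    for name, needed_gb in APPROX_MEMORY_GB:
        if available_gb >= needed_gb:
            return name
    return "tiny"  # smallest option when nothing fits the estimate
```

The result can be passed straight to the loader, for example `whisper.load_model(pick_whisper_model(4))`, which would select the small model under these estimates.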

Models Not Downloading

  • Check internet connection
  • Verify disk space (need ~2GB free)
  • Check HuggingFace access

📊 Performance

  • Processing Time: ~2-5 minutes for a 1-minute video
  • Memory Usage: ~3-4GB during processing
  • Supported Languages: 20+ languages via NLLB-200
  • Input Formats: MP4, AVI, MOV, MKV, WAV
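
NLLB-200 identifies languages with FLORES-200 codes such as `eng_Latn` rather than two-letter codes such as the `en` used in the API example. A mapping along these lines may be needed between the two. The subset below is illustrative only and is not the project's actual language list; `NLLB_CODES` and `to_nllb_code` are hypothetical names.

```python
# Illustrative subset: two-letter codes mapped to FLORES-200 codes used by NLLB-200.
NLLB_CODES = {
    "en": "eng_Latn",
    "fr": "fra_Latn",
    "es": "spa_Latn",
    "de": "deu_Latn",
    "hi": "hin_Deva",
    "ta": "tam_Taml",
    "ja": "jpn_Jpan",
    "zh": "zho_Hans",
    "ar": "arb_Arab",
}


def to_nllb_code(lang: str) -> str:
    """Translate a two-letter code into its NLLB-200 code, or raise ValueError."""
    key = lang.strip().lower()
    if key not in NLLB_CODES:
        raise ValueError(f"no NLLB-200 code configured for {lang!r}")
    return NLLB_CODES[key]
```

Failing loudly on an unknown code avoids silently translating into the wrong language.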

🔒 Security Notes

  • Uploaded files are temporarily stored and automatically deleted after processing
  • No user data is permanently stored
  • All processing happens server-side
  • Maximum file size limit: 500MB

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📝 License

This project is open-source and available under the MIT License.


👨‍💻 Author

Raghul Pranesh K V


Acknowledgments

  • OpenAI for Whisper speech recognition model
  • Facebook AI for NLLB-200 translation model
  • HuggingFace for Transformers and Sentence Transformers
  • Flask community for the excellent web framework
