🎬 VideoTranslate AI - Multilingual Video Translation Platform

A sophisticated AI-powered web application that translates video content across multiple languages using advanced machine learning models. The platform leverages Whisper for speech recognition, NLLB-200 for translation, and RAG (Retrieval-Augmented Generation) for context-aware processing.

✨ Features

🎥 Video Upload & Processing - Supports common video formats (MP4, AVI, MOV, MKV) as well as WAV audio
🌐 Multilingual Translation - Translate videos to and from numerous languages
🧠 AI-Powered Intelligence - Summarize and analyze video transcripts with Google Gemini
📝 Automatic Subtitle Generation - Generate and embed translated subtitles directly into videos
⚡ Fast Processing - Roughly 2-5 minutes of processing for a 1-minute video
💬 Content Generation - Get a concise summary of each video, generated from its transcript
💾 Easy Download - Get your translated videos with embedded subtitles

📷 Screenshots

🛠️ Tech Stack

| Component | Technology |
| --- | --- |
| Backend Framework | Flask (Python) |
| Speech Recognition | OpenAI Whisper |
| Translation Model | Facebook NLLB-200 |
| Content Summary | Google Gemini API (gemini-2.5-flash) |
| Video Processing | FFmpeg |
| Frontend | HTML5, CSS3, JavaScript, Bootstrap 5 |
| ML Libraries | PyTorch, Transformers, HuggingFace |

🏗️ Architecture

┌─────────────┐
│   Frontend  │  User uploads video
│  (Browser)  │
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Flask API  │  Receives video file
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   Whisper   │  Extracts audio & transcribes
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   Gemini API│  Context analysis & summarization
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   NLLB-200  │  Translates content
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   FFmpeg    │  Embeds subtitles & generates video
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   User      │  Downloads translated video & reads summary
└─────────────┘

🚀 Quick Start

Prerequisites

Python 3.8 or higher
FFmpeg installed on your system
8GB+ RAM (16GB recommended for optimal performance)
5GB+ free disk space for ML models
Internet connection (for initial model download)
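
The Python version and disk space requirements above can be checked before installing anything. The following is a minimal sketch using only the standard library; the function names and the 5GB default are illustrative and are not part of the project's code.

```python
import shutil
import sys


def python_is_supported(minimum=(3, 8)) -> bool:
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


def has_free_space(path: str = ".", required_gb: float = 5.0) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3


print("Python OK:", python_is_supported())
print("Disk OK:", has_free_space("."))
```

RAM is not checked here because the standard library has no portable way to query it.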

Installation

Option 1: Automated Setup (Recommended)

macOS / Linux:

chmod +x run.sh
./run.sh

Windows:

run.bat

Option 2: Manual Setup

# Clone the repository
git clone https://github.com/raghulpranxsh/CrossLingualAI.git
cd CrossLingualAI

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# Create necessary directories
mkdir -p uploads outputs

# Run the application
python app.py

First Run

On the first run, the application will automatically download required ML models:

Whisper Base Model (~150MB)
NLLB-200 Translation Model (~1.2GB)

Note: This initial download may take 10-15 minutes depending on your internet speed. Models are cached locally for subsequent runs.

📖 Usage

1. Start the Server
```
python app.py
```
2. Access the Application
- Open your browser and navigate to: http://localhost:5001
3. Upload Video
- Click "Choose File" or drag and drop your video
- Supported formats: MP4, AVI, MOV, MKV, WAV
- Maximum file size: 500MB
4. Configure Translation
- Select source language (or use Auto-detect)
- Select target language
- Click "Process Translation"
5. Download Result
- Wait for processing to complete
- Download your translated video with embedded subtitles
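
The upload limits listed above (file extension and 500MB maximum) can be checked on the client before sending a file. This is a sketch only: `validate_upload` is a hypothetical helper, and treating 500MB as 500 × 1024 × 1024 bytes is an assumption, since the exact limit enforced by the server is not stated here.

```python
from pathlib import Path

# Formats and size limit as listed in this README.
ALLOWED_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv", ".wav"}
MAX_FILE_SIZE = 500 * 1024 * 1024  # assumption: 500MB counted in binary megabytes


def validate_upload(filename: str, size_bytes: int) -> list:
    """Return a list of human-readable problems; an empty list means the file looks acceptable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes > MAX_FILE_SIZE:
        problems.append("file is larger than 500MB")
    return problems


print(validate_upload("lecture.MP4", 120 * 1024 * 1024))
print(validate_upload("notes.txt", 10))
```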

🔌 API Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| `GET` | `/` | Main web interface |
| `GET` | `/api/health` | Health check endpoint |
| `POST` | `/api/upload` | Upload and process video |
| `GET` | `/api/download/<filename>` | Download processed video |

Example API Usage

# Health check
curl http://localhost:5001/api/health

# Upload video (using curl)
curl -X POST -F "file=@video.mp4" \
     -F "sourceLanguage=auto" \
     -F "targetLanguage=en" \
     http://localhost:5001/api/upload
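
The same upload can be issued from Python. The sketch below builds the multipart request with the standard library only, reusing the form field names from the curl example (`file`, `sourceLanguage`, `targetLanguage`). The helper name `build_upload_request` is hypothetical, and the response format is not documented here, so the usage comment only prints the raw body.

```python
import urllib.request
import uuid

BASE_URL = "http://localhost:5001"  # default address from this README


def build_upload_request(file_name, file_bytes, source="auto", target="en"):
    """Build (but do not send) a multipart/form-data POST to /api/upload."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text form fields.
    for name, value in (("sourceLanguage", source), ("targetLanguage", target)):
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    # The file field carries the raw bytes of the video.
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'
    ).encode()
    parts.append(file_header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        f"{BASE_URL}/api/upload",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Usage (requires the server to be running):
# with open("video.mp4", "rb") as handle:
#     request = build_upload_request("video.mp4", handle.read())
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read()[:200])
```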

🧠 How It Works

1. Audio Extraction: The audio track is extracted from the uploaded video file
2. Speech Recognition: Whisper transcribes the audio to text in the original language
3. Language Detection: The source language is detected automatically if it was not specified
4. Summary Generation: The transcript is sent to Google Gemini, which returns a concise summary
5. Translation: NLLB-200 translates the transcribed text into the target language
6. Subtitle Generation: An SRT file is created from the translated text and its timestamps
7. Video Processing: FFmpeg embeds the subtitles into the original video
8. Delivery: The user receives the translated video with embedded subtitles and a textual summary
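
The subtitle steps can be illustrated with a short sketch. It assumes each translated segment is a dict with `start`, `end` (seconds), and `text` keys, which is the shape of the entries in the `segments` list returned by Whisper's `transcribe()`. The function names are hypothetical, and the FFmpeg command uses the `subtitles` filter, which burns the text into the picture; whether this project burns subtitles in or muxes them as a separate track is not stated here.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Render segments as SRT: index, time range, text, separated by blank lines."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}\n")
    return "\n".join(blocks)


def build_subtitle_command(video_path: str, srt_path: str, output_path: str) -> list:
    """Return an FFmpeg argument list that burns the SRT file into the video.

    Paths containing characters such as ':' or quotes need extra escaping
    inside the filter string; that is left out of this sketch.
    """
    return [
        "ffmpeg", "-y",
        "-i", video_path,
        "-vf", f"subtitles={srt_path}",
        "-c:a", "copy",
        output_path,
    ]


example = [
    {"start": 0.0, "end": 2.5, "text": " Bonjour tout le monde "},
    {"start": 2.5, "end": 5.0, "text": "Bienvenue"},
]
print(segments_to_srt(example))
print(build_subtitle_command("input.mp4", "subs.srt", "output.mp4"))
```

The resulting argument list can be passed to `subprocess.run(..., check=True)` once FFmpeg is installed.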

📁 Project Structure

CrossLingualAI/
│
├── app.py                 # Flask backend server
├── index.html            # Frontend web interface
├── requirements.txt      # Python dependencies
├── run.sh               # Setup script (macOS/Linux)
├── run.bat              # Setup script (Windows)
├── README.md            # Project documentation
│
├── uploads/             # Temporary upload directory
└── outputs/             # Processed video output directory

⚙️ Configuration

Port Configuration

By default, the server runs on port 5001. To change this, modify app.py:

app.run(debug=True, host='0.0.0.0', port=5001)  # Change port here
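
As an alternative to editing the source each time, the port could be read from an environment variable. This is a sketch under assumptions: the variable name `PORT` and the helper `resolve_port` are hypothetical and are not something app.py is known to support.

```python
import os


def resolve_port(default: int = 5001) -> int:
    """Return the port from the PORT environment variable, or `default` if unset or invalid."""
    raw = os.environ.get("PORT", "")
    try:
        port = int(raw)
    except ValueError:
        return default
    # Only accept values in the valid TCP port range.
    return port if 1 <= port <= 65535 else default


print(resolve_port())

# Possible use in app.py:
# app.run(debug=True, host='0.0.0.0', port=resolve_port())
```

Note that Flask's documentation advises against running with `debug=True` on a host reachable by others, so `host='0.0.0.0'` combined with debug mode is best kept to local development.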

Model Configuration

Models are automatically downloaded on first run. To use different Whisper models:

whisper_model = whisper.load_model("base")  # Options: tiny, base, small, medium, large

🐛 Troubleshooting

Common Issues

Port Already in Use

# Kill process on port 5001
lsof -ti :5001 | xargs kill -9
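
Before killing anything, it can help to confirm that something is actually listening on the port. The sketch below does this with a plain TCP connection attempt; `port_in_use` is a hypothetical helper, and it also works on Windows, where `lsof` is not available.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds, i.e. something is listening."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


print("Port 5001 in use:", port_in_use(5001))
```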

FFmpeg Not Found

# macOS
brew install ffmpeg

# Linux
sudo apt-get install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html
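
To confirm that FFmpeg is installed and visible on the `PATH`, a short check like the following can be run. It is a sketch with hypothetical function names and uses only the standard library.

```python
import shutil
import subprocess


def find_ffmpeg():
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line():
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is unavailable."""
    path = find_ffmpeg()
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


print(ffmpeg_version_line() or "FFmpeg not found on PATH")
```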

Out of Memory

Close other applications
Use smaller Whisper model (tiny/base instead of large)
Process shorter videos
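
Choosing a smaller Whisper model can be made explicit with a small lookup. The sketch below picks the largest model that fits a memory budget. The per-model figures are approximate values recalled from the Whisper README's model table and should be treated as assumptions; actual usage depends on hardware, and NLLB-200 needs memory on top of this.

```python
# Approximate memory needs in GB per Whisper model (assumed, rough figures).
APPROX_MEMORY_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}
MODEL_ORDER = ["tiny", "base", "small", "medium", "large"]  # smallest to largest


def largest_model_for(budget_gb: float) -> str:
    """Return the largest model whose approximate need fits the budget (falls back to 'tiny')."""
    chosen = MODEL_ORDER[0]
    for name in MODEL_ORDER:
        if APPROX_MEMORY_GB[name] <= budget_gb:
            chosen = name
    return chosen


print(largest_model_for(4))

# Possible use in app.py:
# whisper_model = whisper.load_model(largest_model_for(4))
```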

Models Not Downloading

Check internet connection
Verify disk space (the two models total roughly 1.4GB; 5GB+ free is recommended, see Prerequisites)
Check HuggingFace access

📊 Performance

Processing Time: ~2-5 minutes for a 1-minute video
Memory Usage: ~3-4GB during processing
Supported Languages: 20+ languages, translated with NLLB-200
Supported Formats: MP4, AVI, MOV, MKV (video), WAV (audio)

🔒 Security Notes

Uploaded files are temporarily stored and automatically deleted after processing
No user data is permanently stored
All processing happens server-side
Maximum file size limit: 500MB
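
Temporary-file cleanup of the kind described above can be sketched as follows. This is not the project's actual cleanup code: `remove_stale_files` and the one-hour default are hypothetical, and the sketch simply deletes files in a directory such as `uploads/` once they are older than a given age.

```python
import os
import time
from pathlib import Path


def remove_stale_files(directory, max_age_seconds: float = 3600) -> list:
    """Delete regular files older than `max_age_seconds`; return the removed file names, sorted."""
    now = time.time()
    removed = []
    for path in Path(directory).iterdir():
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


# Example: clean the upload directory if it exists.
if os.path.isdir("uploads"):
    print(remove_stale_files("uploads"))
```

Running a function like this after each request, or on a timer, is one way to keep uploaded files from accumulating.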

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

📝 License

This project is open-source and available under the MIT License.

👨‍💻 Author

Raghul Pranesh K V

🌐 GitHub: @raghulpranxsh
💼 LinkedIn: raghulpraneshkv

Acknowledgments

OpenAI for Whisper speech recognition model
Facebook AI for NLLB-200 translation model
HuggingFace for Transformers and Sentence Transformers
Flask community for the excellent web framework
