Daphilippe/batch_video_translation


🎬 Automated Video Translation Pipeline

This project is a modular and scalable pipeline designed to automate the process of transcribing and translating video content. It leverages Whisper.cpp for high-performance local transcription and integrates LLM Providers via UI automation to ensure high-quality, context-aware translations.

🌟 Key Features

  • Audio Extraction: Automated conversion of video tracks to 16kHz WAV format.
  • Local Transcription: Uses whisper.cpp for fast, offline, and private speech-to-text.
  • SRT Optimization: Advanced logic to merge identical consecutive segments and eliminate "flickering" effects.
  • Translation Engines:
    • Legacy Engine: Rapid translation using Google Translate APIs with line-level caching and custom technical dictionaries.
    • LLM Engine (Local): OpenAI-compatible local endpoint support (e.g., llama.cpp).
    • LLM Engine (UI): High-quality translation via Large Language Models (e.g., Copilot via UI Automation).
    • Hybrid Engine (new): Triple-source arbitration, S1 (source SRT) + L1 (literal Google Translate) + Mt (LLM draft) → HybridRefiner, for professional post-editing with incremental re-run support.
  • Resilient Workspace: Intelligent directory mirroring that allows resuming the process at any stage.
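
The SRT optimization step can be illustrated with a minimal sketch. The `Block` structure, the `merge_identical` name, and the 200 ms gap threshold are assumptions made for illustration; the actual logic in srt_optimizer.py may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Block:
    """One subtitle cue; times are in milliseconds (hypothetical structure)."""
    start_ms: int
    end_ms: int
    text: str


def merge_identical(blocks: list[Block], max_gap_ms: int = 200) -> list[Block]:
    """Merge consecutive blocks whose text is identical and nearly contiguous.

    max_gap_ms is an assumed tolerance, not a value taken from the project.
    """
    merged: list[Block] = []
    for block in blocks:
        prev = merged[-1] if merged else None
        same_text = prev is not None and prev.text.strip() == block.text.strip()
        if same_text and block.start_ms - prev.end_ms <= max_gap_ms:
            # Extend the previous cue instead of emitting a duplicate,
            # which is what removes the on-screen "flicker".
            prev.end_ms = max(prev.end_ms, block.end_ms)
        else:
            merged.append(Block(block.start_ms, block.end_ms, block.text))
    return merged


blocks = [
    Block(0, 1000, "Hello"),
    Block(1000, 2000, "Hello"),
    Block(2100, 3000, "World"),
]
result = merge_identical(blocks)
```

Here the two "Hello" cues collapse into a single cue spanning 0 to 2000 ms, while "World" is kept as its own block.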

📂 Project Structure

project_root/
├── .vscode/
│   └── tasks.json            # VS Code tasks (ruff, pylint, coverage)
├── configs/
│   ├── settings.json          # Global configuration (paths, languages, etc.)
│   ├── system_prompt.txt      # System prompt template for LLM translation
│   └── refinement_protocol.txt # Hybrid refiner arbitration instructions
├── src/
│   ├── main.py                # Pipeline orchestrator (4-step pipeline)
│   ├── modules/
│   │   ├── translator.py      # BaseTranslator: abstract class (skip logic, standardize)
│   │   ├── legacy_translator.py # Google Translate + dictionary + line-cache
│   │   ├── llm_translator.py  # Chunked LLM translation
│   │   ├── extractor.py       # Audio extraction (FFmpeg segmentation)
│   │   ├── transcriber.py     # Whisper.cpp subprocess wrapper
│   │   ├── srt_optimizer.py   # SRT structure cleaning (Step 3)
│   │   ├── strategies/
│   │   │   └── hybrid_refiner.py # Triple-source arbitration (S1+L1+Mt)
│   │   └── providers/
│   │       ├── base_provider.py  # LLMProvider ABC + LLMProviderError
│   │       ├── llama_provider.py # Local llama.cpp HTTP provider
│   │       └── copilot_ui.py     # Browser-based UI automation provider
│   └── utils/
│       ├── srt_handler.py     # SRT parsing, rendering, alignment, hashing
│       └── file_handler.py    # DirectoryMirrorTask base class
├── tests/
│   ├── test_srt_handler.py    # 84 tests (Unicode, CJK, merge, standardize)
│   ├── test_file_handler.py
│   ├── test_llm_provider.py
│   └── test_llm_translator.py
├── pyproject.toml             # Build config + ruff + pylint + coverage settings
├── requirements.txt
└── readme.md

βš™οΈ Configuration (settings.json)

Configure your environment before launching the pipeline:

{
  "base_working_dir": "./workspace",
  "whisper": {
    "bin_path": "C:/path/to/whisper/whisper-cli.exe",
    "model_path": "C:/path/to/whisper/models/ggml-large-v3.bin",
    "lang": "auto"
  },
  "llm_config": {
    "source_lang": "English",
    "target_lang": "French",
    "chunk_size": 20
  },
  "technical_dictionary": {
    "example term": "target term",
    "industry keyword": "mot-clé métier"
  }
}
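
As a rough illustration of how such a file could be consumed, the sketch below merges user settings over defaults. The `load_settings` and `deep_merge` helpers are hypothetical, the two default values are the ones stated in the configuration reference of this README, and placing `chunk_delay` under `llm_config` is an assumption.

```python
import json
from copy import deepcopy

# Defaults stated in this README's configuration reference; that
# chunk_delay lives under "llm_config" is an assumption.
DEFAULTS = {
    "llm_config": {"chunk_size": 30, "chunk_delay": 1.0},
    "technical_dictionary": {},
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied recursively."""
    out = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = deepcopy(value)
    return out


def load_settings(text: str) -> dict:
    """Parse settings JSON, fill in defaults, and check required Whisper paths."""
    settings = deep_merge(DEFAULTS, json.loads(text))
    for key in ("bin_path", "model_path"):
        if key not in settings.get("whisper", {}):
            raise KeyError(f"whisper.{key} is required")
    return settings


example = """
{
  "base_working_dir": "./workspace",
  "whisper": {
    "bin_path": "C:/path/to/whisper/whisper-cli.exe",
    "model_path": "C:/path/to/whisper/models/ggml-large-v3.bin",
    "lang": "auto"
  },
  "llm_config": {"source_lang": "English", "target_lang": "French", "chunk_size": 20}
}
"""
settings = load_settings(example)
```

With this input, `chunk_size` keeps the user's value of 20 while `chunk_delay` falls back to 1.0.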

🚀 Usage

1. Full Execution

Processes everything from the source video to the final translated subtitle:

python src/main.py --input "./source_videos" --output "./results" --mode full --engine llm-ui

Available --engine values:

  • legacy (Google Translate + technical dictionary)
  • llm-local (local OpenAI-compatible endpoint, e.g. llama.cpp)
  • llm-ui (Copilot UI automation)
  • hybrid (triple-source arbitration: Legacy L1 + LLM Mt → Refiner)
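
To make the legacy engine's line-level caching concrete, here is a minimal sketch. `translate_lines` and `apply_dictionary` are hypothetical names, the translator is a stub rather than a real Google Translate call, and substituting dictionary terms before translation is only one possible strategy, not necessarily what legacy_translator.py does.

```python
from __future__ import annotations

import re
from typing import Callable


def apply_dictionary(line: str, dictionary: dict[str, str]) -> str:
    """Replace known source terms with their fixed target terms (assumed strategy)."""
    for source_term, target_term in dictionary.items():
        pattern = re.compile(re.escape(source_term), re.IGNORECASE)
        # A function replacement avoids backslash handling in the target term.
        line = pattern.sub(lambda _m, t=target_term: t, line)
    return line


def translate_lines(
    lines: list[str],
    translate: Callable[[str], str],
    cache: dict[str, str],
    dictionary: dict[str, str],
) -> list[str]:
    """Translate each line once, reusing cached results for repeated lines."""
    results = []
    for line in lines:
        key = line.strip()
        if not key:
            results.append("")
            continue
        if key not in cache:
            cache[key] = translate(apply_dictionary(key, dictionary))
        results.append(cache[key])
    return results


calls: list[str] = []


def fake_translate(text: str) -> str:
    """Stub translator that records each request; stands in for the real API."""
    calls.append(text)
    return f"[fr] {text}"


cache: dict[str, str] = {}
output = translate_lines(
    ["Open the valve", "Open the valve", "Close it"],
    fake_translate,
    cache,
    {"valve": "vanne"},
)
```

The repeated line triggers only one translator call. Persisting `cache` with `json.dump` between runs would give the reuse across executions that the cache file is meant to provide.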

2. Step-by-Step Execution

You can isolate specific stages using the --mode flag:

  • extract: Extract audio from video.
  • transcribe: Generate raw subtitles from audio.
  • optimize: Clean and merge subtitle blocks (produces S1).
  • translate: Translate subtitles only (requires existing SRT files in 3_clean_srt/).
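
For the extract stage, the conversion to 16 kHz WAV can be sketched as an FFmpeg command builder. The `build_extract_command` name is hypothetical, and downmixing to mono with 16-bit PCM is an assumption about what the extractor does; the flags themselves are standard FFmpeg options.

```python
from __future__ import annotations

from pathlib import Path


def build_extract_command(video: Path, wav: Path) -> list[str]:
    """Build an FFmpeg command that writes a 16 kHz WAV from a video file."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it exists
        "-i", str(video),     # input video
        "-vn",                # drop the video stream
        "-ac", "1",           # mono (assumption; not stated in this README)
        "-ar", "16000",       # 16 kHz sample rate
        "-c:a", "pcm_s16le",  # 16-bit PCM WAV (assumption)
        str(wav),
    ]


cmd = build_extract_command(Path("talk.mp4"), Path("talk.wav"))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once FFmpeg is available on the PATH.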

3. Hybrid Engine Details

The hybrid engine executes three sub-steps automatically:

  1. L1 generation: Legacy translator produces a literal reference.
  2. Mt generation: LLM translator produces a stylistic draft.
  3. Arbitration: HybridRefiner aligns L1 and Mt against S1 timestamps and sends windowed prompts to the LLM for final post-editing.

On re-runs, the refiner detects which blocks are already correctly translated and only re-processes problematic windows (missing, empty, or untranslated blocks), saving significant LLM cost.
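
The re-run behaviour described above could look roughly like the following sketch. Both function names and the one-block context margin are assumptions; the real HybridRefiner may use different criteria (for example, hashing).

```python
from __future__ import annotations


def find_problem_indices(source: list[str], translated: list[str]) -> list[int]:
    """Return indices of blocks that are missing, empty, or left untranslated."""
    problems = []
    for i, src in enumerate(source):
        tgt = translated[i] if i < len(translated) else None
        if tgt is None or not tgt.strip() or tgt.strip() == src.strip():
            problems.append(i)
    return problems


def group_into_windows(
    indices: list[int], total: int, context: int = 1
) -> list[tuple[int, int]]:
    """Expand each problem index by `context` blocks and merge adjacent ranges."""
    windows: list[tuple[int, int]] = []
    for i in indices:
        start, end = max(0, i - context), min(total - 1, i + context)
        if windows and start <= windows[-1][1] + 1:
            # Overlapping or touching ranges become one window.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


source = ["Hello", "How are you?", "Fine", "Thanks", "Great", "Bye", "See you", "Later"]
translated = ["Bonjour", "", "Bien", "Merci", "Super", "Au revoir", "See you"]

problems = find_problem_indices(source, translated)
windows = group_into_windows(problems, total=len(source))
```

Only the blocks inside `windows` would be sent back to the LLM. Note that this simple equality test also flags lines that are legitimately identical in both languages (for example "OK"), so a real implementation would need a more careful check.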


πŸ› οΈ Requirements & Setup

Rather than installing everything into your global Python environment, use a clean virtual environment to avoid dependency conflicts:

  1. Create and activate a virtual environment:
       python -m venv venv
       # Windows
       .\venv\Scripts\activate
       # Linux/Mac
       source venv/bin/activate
  2. Install core dependencies:
       pip install -r requirements.txt
  3. Install development tools (linting, coverage):
       pip install ruff pylint pytest-cov
  4. Run tests:
       python -m pytest tests/ -v
  5. Run tests with coverage:
       python -m pytest --cov=src --cov-config=pyproject.toml --cov-report=term-missing tests/ -v
  6. Linting:
       # Ruff (fast linter + formatter)
       python -m ruff check src/ tests/
       python -m ruff format src/ tests/

       # Pylint (deep analysis)
       python -m pylint src/modules/ src/utils/ src/main.py --rcfile=pyproject.toml
  7. VS Code Tasks (Ctrl+Shift+P → "Tasks: Run Task"):

    • Ruff: lint / Ruff: lint + fix / Ruff: format
    • Pylint: full analysis
    • Pytest: run all (default test task)
    • Coverage: run + report / Coverage: open HTML report
    • Quality: full check (ruff + pylint + coverage), runs all three sequentially
  8. External Requirements:

    • FFmpeg: Must be in your system PATH.
    • Whisper.cpp: Compiled executable (e.g., whisper-cli.exe, or main.exe in older builds).
    • Microsoft Edge: Required for the default CopilotUIProvider.

βš™οΈ Full Configuration Reference (settings.json)

Here is the exhaustive list of parameters available in your configuration file:

| Category | Parameter | Description |
| --- | --- | --- |
| Global | base_working_dir | Root folder where the internals/ workspace will be created. |
| Whisper | bin_path | Absolute path to the whisper.cpp executable. |
| Whisper | model_path | Path to the .bin model file (e.g., ggml-medium.bin). |
| Whisper | lang | Source language code (ru, en, fr or auto). |
| LLM Engine | source_lang | Full name of the source language for the prompt (e.g., "Russian"). |
| LLM Engine | target_lang | Full name of the target language (e.g., "French"). |
| LLM Engine | chunk_size | Number of SRT blocks sent in a single prompt (default: 30). |
| LLM Engine | chunk_delay | Seconds between LLM API calls (rate-limit protection, default: 1.0). |
| LLM Engine | prompt_file | Path to the system prompt template text file used by the LLM translator. |
| Legacy | source_lang / target_lang | ISO language codes for Google Translate (e.g., en, fr). |
| Legacy | max_chars_batch | Character limit for Google Translate batches. |
| Legacy | retry_delay | Seconds to wait between translation retries. |
| Legacy | max_retries | Maximum number of retry attempts when rate-limited (HTTP 429). |
| Legacy | cache_file | Path to the JSON cache used for line-level translation reuse. |
| Hybrid | refinement_protocol_file | Path to the arbitration system prompt (default: configs/refinement_protocol.txt). |
| Context | technical_dictionary | Key-value pairs of terms to ensure consistent translation. |
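
The interaction between chunk_size and chunk_delay can be sketched as follows. The `chunked` and `translate_in_chunks` helpers are hypothetical, the `ask` callable stands in for a real provider call, and the `sleep` parameter is injectable only so the example runs without waiting.

```python
from __future__ import annotations

import time
from typing import Callable, Iterator


def chunked(blocks: list[str], chunk_size: int = 30) -> Iterator[list[str]]:
    """Yield consecutive slices of at most chunk_size blocks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(blocks), chunk_size):
        yield blocks[start:start + chunk_size]


def translate_in_chunks(
    blocks: list[str],
    ask: Callable[[list[str]], list[str]],
    chunk_size: int = 30,
    chunk_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Send blocks to `ask` chunk by chunk, pausing between calls."""
    chunks = list(chunked(blocks, chunk_size))
    translated: list[str] = []
    for n, chunk in enumerate(chunks):
        translated.extend(ask(chunk))
        if n < len(chunks) - 1:
            sleep(chunk_delay)  # rate-limit protection between calls
    return translated


blocks = [f"line {i}" for i in range(70)]
delays: list[float] = []
result = translate_in_chunks(
    blocks,
    ask=lambda chunk: [text.upper() for text in chunk],  # stand-in for an LLM call
    sleep=delays.append,  # record delays instead of actually sleeping
)
```

With 70 blocks and the default chunk size of 30, three prompts are sent (30, 30, and 10 blocks) with two pauses between them.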

🤖 Expanding LLM Providers

The architecture is designed to be Provider-Agnostic. While CopilotUIProvider is the default, you can easily switch to other models.

1. Switching to other Web LLMs

The current UI Automation logic targets window titles and UI elements. You can create new providers for ChatGPT, Claude, or Gemini by inheriting from the LLMProvider base class and adapting the pywinauto selectors to target those browsers.

2. Using a Local LLM (LM Studio / Ollama)

For total privacy and no UI interaction, you can implement a local provider. Local LLMs usually provide an OpenAI-compatible API.

Example of a Local Provider (src/modules/providers/local_llm.py):

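import requests
from .base_provider import LLMProvider


class LocalLLMProvider(LLMProvider):
    """Minimal provider for an OpenAI-compatible local endpoint."""

    def __init__(self, api_url="http://localhost:1234/v1/chat/completions", timeout=120):
        self.api_url = api_url
        self.timeout = timeout  # seconds; local models can be slow to answer

    def ask(self, content: str, prompt: str) -> str:
        # content is sent as the system message and prompt as the user message;
        # swap them if your LLMProvider contract expects the reverse.
        response = requests.post(
            self.api_url,
            json={
                "messages": [
                    {"role": "system", "content": content},
                    {"role": "user", "content": prompt},
                ],
                "temperature": 0.1,  # low temperature keeps output stable
            },
            timeout=self.timeout,
        )
        response.raise_for_status()  # surface HTTP errors instead of a KeyError
        return response.json()["choices"][0]["message"]["content"]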

3. Integration

To use your new provider, simply update the factory in main.py:

# Instead of CopilotUIProvider
provider = LocalLLMProvider(api_url="http://localhost:11434/v1/chat/completions")

⚠️ Important Notes

  • UI Focus: When using the LLM engine, the script interacts with your browser. Ensure the target window is visible and avoid manual input during the automated paste/send cycles.
  • UTF-8 Standard: All files are processed using UTF-8 encoding. If you encounter character issues on Windows, the script includes a "replace" safety mechanism to prevent crashes.
  • Data Integrity: The internals/ folder preserves intermediate files. If a translation is interrupted, you can resume without losing transcription progress.

🚀 Future Improvements

The project is designed with a modular architecture, allowing for several high-impact evolutions:

1. Local Voice Synthesis (TTS)

  • Feature: Generate a localized audio track directly from the translated SRT files.
  • Tech: Integration of local TTS engines like Coqui TTS or Piper to produce high-quality, natural-sounding voiceovers without cloud costs.
  • Goal: Create fully dubbed videos automatically.

2. Advanced Validation & Feedback Loop

  • Structural Integrity: Add a post-translation validation layer to ensure the LLM output strictly matches the source timestamps and block counts.
  • Auto-Correction: If a mismatch is detected (e.g., the LLM merged two blocks or skipped a timestamp), the system could automatically re-send only the failed chunk.
  • Consistency Check: Verify that the number of lines in the translated block matches the original to avoid subtitle desynchronization.
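
A validation layer of this kind might start from something like the sketch below. It is not part of the current pipeline; `parse_blocks` and `validate` are hypothetical names, and the parser only handles well-formed SRT text.

```python
from __future__ import annotations

import re

TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def parse_blocks(srt_text: str) -> list[tuple[str, list[str]]]:
    """Split SRT text into (timestamp line, text lines) pairs."""
    blocks = []
    for raw in re.split(r"\n\s*\n", srt_text.strip()):
        lines = raw.strip().splitlines()
        if len(lines) >= 2 and TIMESTAMP.fullmatch(lines[1].strip()):
            blocks.append((lines[1].strip(), lines[2:]))
    return blocks


def validate(source_srt: str, translated_srt: str) -> list[str]:
    """Return human-readable structural mismatches (empty list means OK)."""
    src, tgt = parse_blocks(source_srt), parse_blocks(translated_srt)
    errors = []
    if len(src) != len(tgt):
        errors.append(f"block count mismatch: {len(src)} source vs {len(tgt)} translated")
    pairs = zip(src, tgt)
    for i, ((s_time, s_lines), (t_time, t_lines)) in enumerate(pairs, start=1):
        if s_time != t_time:
            errors.append(f"block {i}: timestamp changed")
        if len(s_lines) != len(t_lines):
            errors.append(f"block {i}: line count changed")
    return errors


source = """1
00:00:01,000 --> 00:00:02,000
Hello

2
00:00:02,500 --> 00:00:04,000
How are you?
"""

# The model merged both cues into one, a typical failure mode.
translated = """1
00:00:01,000 --> 00:00:04,000
Bonjour, comment allez-vous ?
"""

errors = validate(source, translated)
```

A non-empty `errors` list would be the trigger for re-sending only the failed chunk.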

3. UI Automation Robustness

  • Headless Support: Explore playwright or selenium-wire to interact with LLM web interfaces in the background, reducing the need for an active window.
  • Dynamic Selectors: Implement smarter UI element detection in pywinauto to make the providers more resilient to browser updates or UI changes on Bing/Copilot.
  • Multi-Provider Failover: Automatically switch from one LLM provider (e.g., Copilot) to another (e.g., a local Llama 3 instance) if a rate limit or error is detected.
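
Multi-provider failover could be prototyped along these lines. The `LLMProviderError` class below is a local stand-in for the one in base_provider.py, and `ask_with_failover` plus both demo providers are hypothetical.

```python
from __future__ import annotations

from typing import Callable


class LLMProviderError(Exception):
    """Stand-in for the project's LLMProviderError (defined here for the demo)."""


Ask = Callable[[str, str], str]


def ask_with_failover(
    providers: list[tuple[str, Ask]], content: str, prompt: str
) -> tuple[str, str]:
    """Try each provider in order; return (provider name, answer) on first success."""
    failures = []
    for name, ask in providers:
        try:
            return name, ask(content, prompt)
        except LLMProviderError as exc:
            failures.append(f"{name}: {exc}")
    raise LLMProviderError("all providers failed: " + "; ".join(failures))


def rate_limited(content: str, prompt: str) -> str:
    """Simulates a provider that is currently refusing requests."""
    raise LLMProviderError("rate limit reached")


def local_model(content: str, prompt: str) -> str:
    """Simulates a working local fallback."""
    return "Bonjour"


used, answer = ask_with_failover(
    [("copilot", rate_limited), ("local", local_model)],
    content="Hello",
    prompt="Translate to French.",
)
```

In this run the first provider fails and the call falls through to the local one.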

4. Advanced Prompt Engineering

  • Few-Shot Prompting: Include example translation pairs (source -> target) directly in the system prompt to improve the "style" and "tone" of the translation.
  • Dynamic Context Injection: Automatically inject relevant parts of the technical dictionary based on the words detected in the current chunk.
  • Chain-of-Thought: Force the model to explain its choice for difficult technical terms before providing the final SRT block to increase accuracy.
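
Dynamic context injection, for instance, might be sketched as below. The function names and the glossary layout are assumptions, and simple case-insensitive substring matching is used in place of any real term detection.

```python
from __future__ import annotations


def select_terms(chunk_text: str, dictionary: dict[str, str]) -> dict[str, str]:
    """Keep only the dictionary entries whose source term appears in the chunk."""
    lowered = chunk_text.lower()
    return {src: tgt for src, tgt in dictionary.items() if src.lower() in lowered}


def build_prompt(base_prompt: str, chunk_text: str, dictionary: dict[str, str]) -> str:
    """Append a per-chunk glossary to the system prompt when terms are present."""
    terms = select_terms(chunk_text, dictionary)
    if not terms:
        return base_prompt
    glossary = "\n".join(f"- {src} => {tgt}" for src, tgt in sorted(terms.items()))
    return f"{base_prompt}\n\nGlossary for this chunk:\n{glossary}"


dictionary = {"valve": "vanne", "gasket": "joint", "torque wrench": "clé dynamométrique"}
chunk = "Close the valve, then tighten it with a torque wrench."
prompt = build_prompt("Translate the subtitles to French.", chunk, dictionary)
```

Only "valve" and "torque wrench" end up in the prompt, which keeps it short when the dictionary grows large.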

5. User Interface (GUI)

  • Dashboard: A web-based dashboard (using Streamlit or Flask) to monitor transcription progress in real-time, edit the technical dictionary on the fly, and manually correct specific translation blocks.

About

High-precision video subtitle translation pipeline, optimized for structural integrity and semantic fluidity.
