andresreibel/srtGenerator

Video to SRT Generator

A command-line tool that generates SRT subtitles from video files using OpenAI's Whisper API, with automatic language detection and translation into a target language.

Features

  • Smart Subtitle Processing:

    • Automatic language detection and translation
    • Intelligent timestamp adjustment based on text length
    • Content-aware overlap resolution
    • Long subtitle splitting for optimal readability
    • Short subtitle extension for YouTube compatibility
    • Automatic removal of [inaudible] markers
  • Efficient Processing:

    • Audio compression and chunking for large files
    • Parallel processing for faster transcription
    • Retry logic for API reliability
  • User-Friendly:

    • Interactive CLI menu with intuitive prompts
    • Smart default filename suggestions based on input file
    • Clear settings summary and confirmation
    • Detailed console output for monitoring
    • Comprehensive subtitle validation

Prerequisites

  • Python 3.7+
  • ffmpeg (for video to audio conversion)
  • OpenAI API key
  • Required Python packages:
    pip install -r requirements.txt
    

Installation

  1. Clone this repository or download the script
  2. Install the required dependencies:
    pip install -r requirements.txt
    
  3. Install ffmpeg:
    • On macOS: brew install ffmpeg
    • On Ubuntu: sudo apt-get install ffmpeg
    • On Windows: Download from ffmpeg.org
  4. Set up your OpenAI API key:
    • Create a .env file in the same directory as the script
    • Add your OpenAI API key to the .env file:
      OPENAI_API_KEY='your_actual_api_key_here'
      
    • You can get an API key from OpenAI's platform

Usage

Interactive CLI Menu (Recommended)

The easiest way to use the tool is with the interactive CLI menu:

python -m components.srt_generator

This will guide you through the process with intuitive prompts:

  • Input file path (quotes are automatically removed)
  • Target language selection from a menu
  • Output directory (defaults to output_srt_files)
  • Output filename (automatically suggested based on input filename)
  • Max subtitle length
  • Subtitle splitting and timestamp checking options
  • Summary of all settings with confirmation

Command Line Arguments

Basic Usage

python -m components.srt_generator input_file --target-language target_lang

Examples

Transcribe audio to English subtitles:

python -m components.srt_generator video.mp4 --target-language en

Transcribe audio to Spanish subtitles:

python -m components.srt_generator video.mp4 --target-language es

Using Paths with Special Characters

You can use quotes for paths with special characters:

python -m components.srt_generator '/path/to/your/video with #special characters.mp4' --target-language es

Custom Output Directory

python -m components.srt_generator video.mp4 --target-language es --output-dir /path/to/custom/directory

Handling Mixed Language Content

The tool automatically detects the language of the audio content. For mixed language content:

  1. The tool will automatically detect the dominant language
  2. Choose your desired --target-language for the output subtitles
  3. All content will be translated to the target language

This approach ensures consistent translation without language switching in the output.

Example for mixed language audio:

python -m components.srt_generator mixed_audio.mp4 --target-language es

Code Structure

The project is organized as a modular package:

  • components/ - Main package directory
    • srt_generator.py - Main entry point
    • audio_processor.py - Audio conversion and chunking
    • transcriber.py - Audio transcription with Whisper API
    • translator.py - Text translation with GPT-3.5 Turbo
    • subtitle_processor.py - SRT processing and manipulation
    • validator.py - SRT validation and statistics
    • argument_handler.py - Command-line and interactive menu handling
    • utils.py - Shared utilities and constants

This modular approach makes the code easier to maintain, test, and extend.

How It Works

Translation Process

  1. Audio Extraction: Converts video to audio using ffmpeg
  2. Transcription with Auto-detection: Uses OpenAI's Whisper API to automatically detect and transcribe audio
  3. Translation: Translates text to the target language using GPT-3.5 Turbo with strict output formatting
  4. Timestamp Adjustment: Adjusts subtitle durations based on text length
  5. Long Subtitle Splitting: Intelligently splits long subtitles at natural language boundaries
  6. Overlap Resolution: Intelligently fixes overlapping subtitles based on content length and word count
  7. Short Subtitle Extension: Extends subtitles that are too short for comfortable reading
  8. Validation: Checks for overlapping subtitles and other issues
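For reference, the output of this pipeline is an SRT file: a sequence of numbered blocks, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line followed by the text. The sketch below shows one way to serialize subtitles into that format. The `Subtitle` class and the function names are hypothetical and are not taken from this project's code.

```python
from dataclasses import dataclass


@dataclass
class Subtitle:
    # Hypothetical container; times are in milliseconds.
    index: int
    start_ms: int
    end_ms: int
    text: str


def format_timestamp(ms: int) -> str:
    """Render milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(subtitles) -> str:
    """Join subtitles into SRT text, one blank line between blocks."""
    blocks = []
    for sub in subtitles:
        timing = f"{format_timestamp(sub.start_ms)} --> {format_timestamp(sub.end_ms)}"
        blocks.append(f"{sub.index}\n{timing}\n{sub.text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([Subtitle(1, 0, 1500, "Hola"), Subtitle(2, 1500, 4000, "Mundo")]))
```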

Large File Handling

For files larger than 25 MB (OpenAI's limit):

  1. Audio is compressed to a 64 kbps mono MP3
  2. If still too large, audio is split into 10-minute chunks
  3. Each chunk is processed in parallel
  4. Results are merged with proper timestamp adjustments
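Steps 2 and 4 above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names and the `(start, end, text)` segment shape are assumptions, and the 600-second default comes from the 10-minute chunk size described above. Each chunk is transcribed with timestamps relative to its own start, so merging adds the chunk's offset back.

```python
def plan_chunks(total_seconds: float, chunk_seconds: float = 600.0):
    """Return (start, end) pairs that cover the audio in fixed-size chunks."""
    chunks = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        chunks.append((start, end))
        start = end
    return chunks


def merge_segments(chunk_results):
    """Shift each chunk's segments by the chunk offset and concatenate.

    chunk_results: list of (chunk_start_seconds, [(start, end, text), ...]).
    Results may arrive in any order (for example from parallel workers),
    so they are sorted by chunk offset first.
    """
    merged = []
    for offset, segments in sorted(chunk_results, key=lambda item: item[0]):
        for start, end, text in segments:
            merged.append((start + offset, end + offset, text))
    return merged


if __name__ == "__main__":
    print(plan_chunks(1500))
    print(merge_segments([(600.0, [(1.0, 2.0, "b")]), (0.0, [(0.5, 1.5, "a")])]))
```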

Overlap Resolution

The script fixes overlapping subtitles in the following steps:

  1. Detects overlaps between adjacent subtitles
  2. Analyzes content length and word count to make intelligent adjustments
  3. Prioritizes longer/more complex subtitles when resolving conflicts
  4. Maintains minimum duration requirements for readability
  5. Provides detailed console output of all changes made
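One way to picture this is the sketch below, which is an assumption-laden illustration rather than the script's actual algorithm. It uses character count as the only weight, a hypothetical `MIN_DURATION_MS` of 500, and plain dicts with `start`, `end` (milliseconds) and `text` keys. When two adjacent subtitles overlap, the contested interval is divided so that the longer text keeps the larger share.

```python
MIN_DURATION_MS = 500  # assumed minimum on-screen time, not confirmed by the project


def resolve_overlaps(subs):
    """Return a copy of `subs` (sorted by start) with adjacent overlaps removed."""
    fixed = [dict(s) for s in subs]
    for prev, cur in zip(fixed, fixed[1:]):
        if prev["end"] <= cur["start"]:
            continue  # no overlap between this pair
        prev_weight = max(len(prev["text"]), 1)
        cur_weight = max(len(cur["text"]), 1)
        overlap = prev["end"] - cur["start"]
        # The earlier subtitle keeps a share of the overlap proportional to its length.
        boundary = cur["start"] + round(overlap * prev_weight / (prev_weight + cur_weight))
        # Keep both subtitles at or above the minimum duration when that is possible.
        lowest = prev["start"] + MIN_DURATION_MS
        highest = cur["end"] - MIN_DURATION_MS
        if lowest <= highest:
            boundary = min(max(boundary, lowest), highest)
        prev["end"] = boundary
        cur["start"] = boundary
    return fixed


if __name__ == "__main__":
    demo = [
        {"start": 0, "end": 3000, "text": "a" * 30},
        {"start": 2000, "end": 5000, "text": "b" * 10},
    ]
    print(resolve_overlaps(demo))
```

The real script also considers word count and prints each change; both are omitted here to keep the sketch short.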

Long Subtitle Splitting

The script automatically splits long subtitles for improved readability:

  1. Calculates a threshold based on mean subtitle length and standard deviation
  2. Identifies subtitles exceeding this threshold
  3. Splits long subtitles at natural language boundaries (sentences, clauses, or words)
  4. Distributes timing proportionally based on text length
  5. Maintains proper chronological order and subtitle numbering

Splitting breaks long blocks of text into manageable segments while preserving the original meaning and timing.
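The steps above might look roughly like the sketch below. The exact threshold formula is not documented here, so the sketch assumes mean plus one population standard deviation of the character counts. It also splits only at sentence boundaries, whereas the script falls back to clauses and words. All function names are hypothetical.

```python
import re
from statistics import mean, pstdev


def length_threshold(texts, k: float = 1.0) -> float:
    """Assumed rule: mean length plus k standard deviations."""
    lengths = [len(t) for t in texts]
    return mean(lengths) + k * pstdev(lengths)


def split_subtitle(start_ms: int, end_ms: int, text: str):
    """Split at sentence ends and share the time in proportion to text length."""
    parts = [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]
    if len(parts) < 2:
        return [(start_ms, end_ms, text)]
    total_chars = sum(len(p) for p in parts)
    duration = end_ms - start_ms
    pieces, cursor, seen = [], start_ms, 0
    for part in parts:
        seen += len(part)
        # Cumulative rounding keeps the pieces contiguous and ends exactly at end_ms.
        piece_end = start_ms + round(duration * seen / total_chars)
        pieces.append((cursor, piece_end, part))
        cursor = piece_end
    return pieces


if __name__ == "__main__":
    print(split_subtitle(0, 4000, "Hello there. How are you today?"))
```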

Short Subtitle Extension

The script intelligently extends subtitles that are too short for comfortable reading:

  1. Calculates a minimum duration threshold based on mean duration and standard deviation (at least 500ms)
  2. Identifies subtitles shorter than this threshold
  3. Analyzes available space before and after each short subtitle
  4. Intelligently extends duration by:
    • Shifting start time earlier when space is available
    • Extending end time when space is available
    • Distributing time adjustments proportionally when space exists in both directions
    • Removing empty subtitles to create more space when needed
  5. Ensures all subtitles have sufficient duration for viewers to read

This keeps any subtitle from appearing too briefly on screen, which improves readability and YouTube compatibility. YouTube often rejects SRT files with extremely short subtitles (under 500 ms), and this step fixes such cases automatically.
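A minimal sketch of the threshold and the proportional extension follows. The formula is an assumption: the text above only says the threshold is based on the mean and standard deviation with a 500 ms floor, so the sketch uses `max(500, mean - stdev)`. Function names are hypothetical, times are integer milliseconds, and removal of empty subtitles is not shown.

```python
from statistics import mean, pstdev

FLOOR_MS = 500  # the 500 ms floor mentioned above


def min_duration_threshold(durations) -> float:
    """Assumed rule: mean minus one standard deviation, never below the floor."""
    return max(FLOOR_MS, mean(durations) - pstdev(durations))


def extend_short(prev_end: int, start: int, end: int, next_start: int, target: int):
    """Grow [start, end] toward `target` ms using the free gaps on both sides."""
    needed = target - (end - start)
    if needed <= 0:
        return start, end  # already long enough
    gap_before = max(start - prev_end, 0)
    gap_after = max(next_start - end, 0)
    total_gap = gap_before + gap_after
    if total_gap == 0:
        return start, end  # no room to extend
    grow = min(needed, total_gap)
    # Take from each side in proportion to the space available there.
    take_before = min(gap_before, round(grow * gap_before / total_gap))
    take_after = min(gap_after, grow - take_before)
    return start - take_before, end + take_after


if __name__ == "__main__":
    print(extend_short(prev_end=0, start=1000, end=1200, next_start=2000, target=500))
```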

Inaudible Marker Removal

The script automatically removes [inaudible] markers from subtitles:

  1. Completely removes standalone [inaudible] subtitles (preserving empty timestamps)
  2. Removes [inaudible] markers from within mixed content subtitles
  3. Maintains all subtitle timing and numbering
  4. Creates cleaner, more readable subtitles for viewers

This feature ensures that viewers aren't distracted by [inaudible] markers while watching videos, while still preserving the proper timing and structure of the subtitles.
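The marker removal can be approximated with a regular expression, as in this sketch. The pattern, the case-insensitive matching, and the function names are assumptions rather than the script's actual code. Entries whose text becomes empty are kept so that timing and numbering stay unchanged, as described above.

```python
import re

# Assumed pattern: the literal marker plus any whitespace around it.
INAUDIBLE = re.compile(r"\s*\[inaudible\]\s*", re.IGNORECASE)


def strip_inaudible(text: str) -> str:
    """Remove [inaudible] markers and tidy the leftover spacing."""
    cleaned = INAUDIBLE.sub(" ", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


def clean_entries(entries):
    """entries: list of (start_ms, end_ms, text). Timing and order are preserved."""
    return [(start, end, strip_inaudible(text)) for start, end, text in entries]


if __name__ == "__main__":
    print(clean_entries([(0, 1000, "[inaudible]"), (1000, 2500, "Hello [inaudible] world")]))
```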

Supported Languages

The script supports all languages supported by OpenAI's Whisper model. Common language codes:

  • English: en
  • Spanish: es
  • French: fr
  • German: de
  • Japanese: ja
  • Chinese: zh
  • Russian: ru
  • Portuguese: pt
  • Italian: it

Important Notes

  • Language detection is automatic - no need to specify the source language
  • The tool will always translate to the specified target language
  • The maximum audio file size supported by OpenAI's API is 25 MB (handled automatically)
  • The quality of transcription depends on the audio quality and the Whisper model's capabilities
  • Translation is strictly formatted to return only translations or audio cues in brackets
  • Inaudible or noise sections are marked with [inaudible] during processing but removed in the final output

Cost Considerations

  • Whisper API: $0.006 per minute (or $0.36 per hour)
  • GPT-3.5 Turbo (for translation): ~$0.01-0.02 for an hour of speech
  • Total cost per hour: Approximately $0.37-0.38 per hour of processed audio/video
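To estimate the cost of a specific file, the figures above can be combined as in this sketch. The rates are copied from the list above and may not match current OpenAI pricing; the function name is hypothetical.

```python
WHISPER_USD_PER_MINUTE = 0.006          # rate quoted above
TRANSLATION_USD_PER_HOUR = (0.01, 0.02)  # approximate range quoted above


def estimate_cost(minutes: float):
    """Return a (low, high) cost estimate in USD for `minutes` of audio."""
    transcription = minutes * WHISPER_USD_PER_MINUTE
    hours = minutes / 60
    low = transcription + hours * TRANSLATION_USD_PER_HOUR[0]
    high = transcription + hours * TRANSLATION_USD_PER_HOUR[1]
    return round(low, 4), round(high, 4)


if __name__ == "__main__":
    print(estimate_cost(60))
```

For a 60-minute video this reproduces the $0.37-0.38 range given above.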

YouTube Integration

Recommendation for YouTube Uploads

If your primary concern is making your content accessible to a global audience with minimal effort:

  1. Use this tool to generate a clean, consistent English SRT file (especially if your audio has mixed languages)
  2. Upload this English SRT to YouTube
  3. Let viewers use YouTube's auto-translate feature as needed

This approach balances your workload with accessibility, while still addressing concerns about mixed language content in the original SRT file.

How YouTube Translation Works for Viewers

When you upload an English SRT file to YouTube:

  • Viewers can click on the settings (gear icon) in the player
  • Select "Subtitles/CC"
  • Choose "Auto-translate"
  • Select their preferred language from the list

This provides accessibility in all languages YouTube supports without requiring you to create multiple SRT files.

Logging and Debugging

The script provides detailed console output to help troubleshoot any issues:

  • Timestamp Fixes: Displays all adjustments made to fix overlapping subtitles
  • Subtitle Splitting: Shows statistics on long subtitles that were split
  • Console Output: Provides real-time progress updates and statistics on overlaps fixed and [inaudible] markers removed

These logs are particularly useful for identifying and resolving issues with subtitle timing, overlaps, and readability.

License

This project is licensed under the MIT License - see below for details:

MIT License

Copyright (c) 2024

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
