A powerful command-line tool that transforms video files into professional-quality SRT subtitles using OpenAI's Whisper API, with automatic language detection and translation capabilities.
Smart Subtitle Processing:
- Automatic language detection and translation
- Intelligent timestamp adjustment based on text length
- Content-aware overlap resolution
- Long subtitle splitting for optimal readability
- Short subtitle extension for YouTube compatibility
- Automatic removal of [inaudible] markers
Efficient Processing:
- Audio compression and chunking for large files
- Parallel processing for faster transcription
- Retry logic for API reliability
User-Friendly:
- Interactive CLI menu with intuitive prompts
- Smart default filename suggestions based on input file
- Clear settings summary and confirmation
- Detailed console output for monitoring
- Comprehensive subtitle validation
- Python 3.7+
- ffmpeg (for video to audio conversion)
- OpenAI API key
- Required Python packages:
pip install -r requirements.txt
- Clone this repository or download the script
- Install the required dependencies:
pip install -r requirements.txt
- Install ffmpeg:
  - On macOS:
  brew install ffmpeg
  - On Ubuntu:
  sudo apt-get install ffmpeg
  - On Windows: Download from ffmpeg.org
- Set up your OpenAI API key:
  - Create a .env file in the same directory as the script
  - Add your OpenAI API key to the .env file:
  OPENAI_API_KEY='your_actual_api_key_here'
  - You can get an API key from OpenAI's platform
The easiest way to use the tool is with the interactive CLI menu:
python -m components.srt_generator
This will guide you through the process with intuitive prompts:
- Input file path (quotes are automatically removed)
- Target language selection from a menu
- Output directory (defaults to output_srt_files)
- Output filename (automatically suggested based on input filename)
- Max subtitle length
- Subtitle splitting and timestamp checking options
- Summary of all settings with confirmation
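The path and filename prompts above could be handled roughly as follows. This is a sketch only: the helper names `clean_input_path` and `suggest_output_name` are hypothetical, and the `<stem>_<language>.srt` naming pattern is an assumption, not necessarily what the tool produces.

```python
from pathlib import Path


def clean_input_path(raw: str) -> str:
    """Strip surrounding whitespace and one pair of matching quotes."""
    text = raw.strip()
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        text = text[1:-1]
    return text


def suggest_output_name(input_path: str, target_language: str) -> str:
    """Suggest an SRT filename from the input file's stem (assumed pattern)."""
    return f"{Path(input_path).stem}_{target_language}.srt"
```

Stripping quotes up front means a path pasted from a shell or file manager works unchanged.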
Alternatively, pass the arguments directly on the command line:
python -m components.srt_generator input_file --target-language target_lang
Transcribe audio to English subtitles:
python -m components.srt_generator video.mp4 --target-language en
Transcribe audio to Spanish subtitles:
python -m components.srt_generator video.mp4 --target-language es
You can use quotes for paths with special characters:
python -m components.srt_generator '/path/to/your/video with #special characters.mp4' --target-language es
To write the output to a custom directory, add --output-dir:
python -m components.srt_generator video.mp4 --target-language es --output-dir /path/to/custom/directory
The tool automatically detects the language of the audio content. For mixed language content:
- The tool will automatically detect the dominant language
- Choose your desired --target-language for the output subtitles
- All content will be translated to the target language
This approach ensures consistent translation without language switching in the output.
Example for mixed language audio:
python -m components.srt_generator mixed_audio.mp4 --target-language es
The project has been refactored into a modular structure for better maintainability:
- components/ - Main package directory
  - srt_generator.py - Main entry point
  - audio_processor.py - Audio conversion and chunking
  - transcriber.py - Audio transcription with Whisper API
  - translator.py - Text translation with GPT-3.5 Turbo
  - subtitle_processor.py - SRT processing and manipulation
  - validator.py - SRT validation and statistics
  - argument_handler.py - Command-line and interactive menu handling
  - utils.py - Shared utilities and constants
This modular approach makes the code easier to maintain, test, and extend.
- Audio Extraction: Converts video to audio using ffmpeg
- Transcription with Auto-detection: Uses OpenAI's Whisper API to automatically detect and transcribe audio
- Translation: Translates text to the target language using GPT-3.5 Turbo with strict output formatting
- Timestamp Adjustment: Adjusts subtitle durations based on text length
- Long Subtitle Splitting: Intelligently splits long subtitles at natural language boundaries
- Overlap Resolution: Intelligently fixes overlapping subtitles based on content length and word count
- Short Subtitle Extension: Extends subtitles that are too short for comfortable reading
- Validation: Checks for overlapping subtitles and other issues
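As an illustration of the final validation step, the sketch below converts SRT timestamps to milliseconds and reports adjacent cues that overlap. The function names `to_ms` and `find_overlaps` are hypothetical and this is not the code in validator.py; it only shows the kind of check involved.

```python
import re
from typing import List, Tuple

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def to_ms(stamp: str) -> int:
    """Convert an SRT timestamp such as 00:01:02,500 to milliseconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def find_overlaps(cues: List[Tuple[str, str]]) -> List[int]:
    """Return each index i where cue i ends after cue i + 1 starts."""
    overlaps = []
    for i in range(len(cues) - 1):
        if to_ms(cues[i][1]) > to_ms(cues[i + 1][0]):
            overlaps.append(i)
    return overlaps
```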
For files larger than 25 MB (OpenAI's limit):
- Audio is compressed to 64k mono MP3
- If still too large, audio is split into 10-minute chunks
- Each chunk is processed in parallel
- Results are merged with proper timestamp adjustments
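The parallel processing and merge steps can be sketched as below. This is a simplified illustration under stated assumptions: every chunk except the last is exactly 10 minutes long, `transcribe` is a caller-supplied function (hypothetical here) that returns segments with times relative to its own chunk, and the worker count of 4 is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Segment = Tuple[float, float, str]  # (start_seconds, end_seconds, text)
CHUNK_SECONDS = 600.0  # 10-minute chunks, as described above


def merge_chunks(chunks: List[List[Segment]],
                 chunk_seconds: float = CHUNK_SECONDS) -> List[Segment]:
    """Shift each chunk's segments by the chunk's start offset and concatenate."""
    merged = []
    for index, segments in enumerate(chunks):
        offset = index * chunk_seconds
        for start, end, text in segments:
            merged.append((start + offset, end + offset, text))
    return merged


def transcribe_all(paths: List[str],
                   transcribe: Callable[[str], List[Segment]]) -> List[Segment]:
    """Transcribe chunks in parallel; executor.map preserves input order."""
    with ThreadPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(transcribe, paths))
    return merge_chunks(results)
```

Because `executor.map` returns results in the order of `paths`, the chunk index alone determines each offset, regardless of which request finishes first.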
The script fixes overlapping subtitles as follows:
- Detects overlaps between adjacent subtitles
- Analyzes content length and word count to make intelligent adjustments
- Prioritizes longer/more complex subtitles when resolving conflicts
- Maintains minimum duration requirements for readability
- Provides detailed console output of all changes made
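One possible reading of these rules is sketched below. It is an assumption-laden illustration, not the logic in subtitle_processor.py: the `Cue` class, the word-count comparison, and the 500 ms minimum are all chosen for the example.

```python
from dataclasses import dataclass
from typing import List

MIN_DURATION_MS = 500  # assumed minimum display time


@dataclass
class Cue:
    start: int  # milliseconds
    end: int    # milliseconds
    text: str


def resolve_overlaps(cues: List[Cue], min_duration: int = MIN_DURATION_MS) -> List[Cue]:
    """Fix overlaps between adjacent cues, favouring the cue with more words."""
    fixed = [Cue(c.start, c.end, c.text) for c in cues]
    for current, following in zip(fixed, fixed[1:]):
        if current.end <= following.start:
            continue  # no overlap
        if len(current.text.split()) > len(following.text.split()):
            # The longer cue keeps its end time where possible; the next cue
            # starts later but keeps at least min_duration when it can.
            boundary = max(following.start,
                           min(current.end, following.end - min_duration))
        else:
            # The next cue keeps its start time where possible; the current
            # cue is trimmed but keeps at least min_duration when it can.
            boundary = min(current.end,
                           max(following.start, current.start + min_duration))
        current.end = boundary
        following.start = boundary
    return fixed
```

The boundary always falls inside the overlapping interval, so neither cue is ever lengthened by the fix.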
The script automatically splits long subtitles for improved readability:
- Calculates a threshold based on mean subtitle length and standard deviation
- Identifies subtitles exceeding this threshold
- Splits long subtitles at natural language boundaries (sentences, clauses, or words)
- Distributes timing proportionally based on text length
- Maintains proper chronological order and subtitle numbering
Breaking long blocks of text into shorter segments makes subtitles easier to read while preserving the original meaning and timing.
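The splitting steps can be illustrated with the sketch below. Several details are assumptions: the threshold is taken as the mean plus one population standard deviation (the multiplier `k` is hypothetical), the boundary patterns are simple punctuation rules, and the function names are invented for the example.

```python
import re
from statistics import mean, pstdev
from typing import List, Tuple

Cue = Tuple[int, int, str]  # (start_ms, end_ms, text)


def length_threshold(cues: List[Cue], k: float = 1.0) -> float:
    """Mean text length plus k standard deviations (k is an assumed parameter)."""
    lengths = [len(text) for _, _, text in cues]
    return mean(lengths) + k * pstdev(lengths)


def split_text(text: str) -> List[str]:
    """Split at sentence ends, else at clause punctuation, else at the middle word."""
    for pattern in (r"(?<=[.!?])\s+", r"(?<=[,;:])\s+"):
        parts = [p for p in re.split(pattern, text) if p]
        if len(parts) > 1:
            return parts
    words = text.split()
    if len(words) < 2:
        return [text]
    half = len(words) // 2
    return [" ".join(words[:half]), " ".join(words[half:])]


def split_cue(cue: Cue) -> List[Cue]:
    """Split one cue and share its duration in proportion to each part's length."""
    start, end, text = cue
    parts = split_text(text)
    total = sum(len(p) for p in parts)
    result, cursor = [], start
    for i, part in enumerate(parts):
        is_last = i == len(parts) - 1
        # The last part always ends exactly at the original end time.
        part_end = end if is_last else cursor + round((end - start) * len(part) / total)
        result.append((cursor, part_end, part))
        cursor = part_end
    return result
```

A caller would apply `split_cue` only to cues whose text length exceeds `length_threshold(cues)`, then renumber the resulting list.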
The script intelligently extends subtitles that are too short for comfortable reading:
- Calculates a minimum duration threshold based on mean duration and standard deviation (at least 500ms)
- Identifies subtitles shorter than this threshold
- Analyzes available space before and after each short subtitle
- Intelligently extends duration by:
- Shifting start time earlier when space is available
- Extending end time when space is available
- Distributing time adjustments proportionally when space exists in both directions
- Removing empty subtitles to create more space when needed
- Ensures all subtitles have sufficient duration for viewers to read
This feature significantly improves subtitle readability and YouTube compatibility by ensuring no subtitle appears too briefly on screen. YouTube often rejects SRT files with extremely short subtitles (under 500ms), and this feature automatically fixes such issues.
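A minimal sketch of the extension logic follows. The exact threshold formula is not given here, so the sketch assumes "mean minus one standard deviation, floored at 500 ms"; it also lets the last cue grow freely and leaves out the removal of empty subtitles. All names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Tuple

Cue = Tuple[int, int, str]  # (start_ms, end_ms, text)
FLOOR_MS = 500  # the 500 ms floor mentioned above


def min_duration(cues: List[Cue]) -> int:
    """Assumed formula: mean minus one standard deviation, never below FLOOR_MS."""
    durations = [end - start for start, end, _ in cues]
    return max(FLOOR_MS, round(mean(durations) - pstdev(durations)))


def extend_short(cues: List[Cue], threshold: int) -> List[Cue]:
    """Grow cues shorter than threshold into the free gaps on either side."""
    result = list(cues)
    for i, (start, end, text) in enumerate(result):
        needed = threshold - (end - start)
        if needed <= 0:
            continue
        prev_end = result[i - 1][1] if i > 0 else 0
        gap_before = max(0, start - prev_end)
        # Assumption: the last cue may be extended as far as needed.
        gap_after = max(0, result[i + 1][0] - end) if i + 1 < len(result) else needed
        available = gap_before + gap_after
        if available == 0:
            continue  # no room on either side
        grow = min(needed, available)
        # Split the extension in proportion to the space on each side.
        take_before = grow * gap_before // available
        take_after = grow - take_before
        result[i] = (start - take_before, end + take_after, text)
    return result
```

For example, a 200 ms cue with 1000 ms free before it and 800 ms free after it gains 166 ms at the start and 134 ms at the end to reach 500 ms.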
The script automatically removes [inaudible] markers from subtitles:
- Clears the text of standalone [inaudible] subtitles while keeping their timestamps
- Removes [inaudible] markers from within mixed content subtitles
- Maintains all subtitle timing and numbering
- Creates cleaner, more readable subtitles for viewers
This feature ensures that viewers aren't distracted by [inaudible] markers while watching videos, while still preserving the proper timing and structure of the subtitles.
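The marker removal can be sketched with a regular expression, as below. The case-insensitive match and the whitespace cleanup are assumptions added for the example, and `strip_inaudible` is a hypothetical name.

```python
import re
from typing import List, Tuple

Cue = Tuple[int, int, str]  # (start_ms, end_ms, text)
INAUDIBLE = re.compile(r"\[inaudible\]", re.IGNORECASE)


def strip_inaudible(cues: List[Cue]) -> List[Cue]:
    """Remove [inaudible] markers; timing and cue count stay unchanged."""
    cleaned = []
    for start, end, text in cues:
        text = INAUDIBLE.sub("", text)
        # Collapse the double spaces left behind, but keep line breaks.
        text = re.sub(r"[ \t]{2,}", " ", text).strip()
        cleaned.append((start, end, text))
    return cleaned
```

A cue that contained only the marker ends up with empty text but keeps its timestamps, which matches the behavior described above.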
The script supports all languages supported by OpenAI's Whisper model. Common language codes:
- English: en
- Spanish: es
- French: fr
- German: de
- Japanese: ja
- Chinese: zh
- Russian: ru
- Portuguese: pt
- Italian: it
- Language detection is automatic - no need to specify the source language
- The tool will always translate to the specified target language
- The maximum audio file size supported by OpenAI's API is 25 MB (handled automatically)
- The quality of transcription depends on the audio quality and the Whisper model's capabilities
- Translation is strictly formatted to return only translations or audio cues in brackets
- Inaudible or noise sections are marked with [inaudible] during processing but removed in the final output
- Whisper API: $0.006 per minute (or $0.36 per hour)
- GPT-3.5 Turbo (for translation): ~$0.01-0.02 for an hour of speech
- Total: approximately $0.37-0.38 per hour of processed audio/video
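The arithmetic behind these figures can be checked with a short sketch. The rates are the ones quoted above and may not match current pricing; `estimate_cost` is a hypothetical helper, not part of the tool.

```python
WHISPER_USD_PER_MINUTE = 0.006           # rate quoted above
TRANSLATION_USD_PER_HOUR = (0.01, 0.02)  # approximate range quoted above


def estimate_cost(duration_minutes: float):
    """Return (low, high) estimated USD cost for the given audio duration."""
    whisper = duration_minutes * WHISPER_USD_PER_MINUTE
    hours = duration_minutes / 60
    low, high = (whisper + hours * rate for rate in TRANSLATION_USD_PER_HOUR)
    return round(low, 4), round(high, 4)
```

For a 60-minute file this gives 60 × $0.006 = $0.36 for transcription plus $0.01-0.02 for translation.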
If your primary concern is making your content accessible to a global audience with minimal effort:
- Use this tool to generate a clean, consistent English SRT file (especially if your audio has mixed languages)
- Upload this English SRT to YouTube
- Let viewers use YouTube's auto-translate feature as needed
This approach balances your workload with accessibility, while still addressing concerns about mixed language content in the original SRT file.
When you upload an English SRT file to YouTube:
- Viewers can click on the settings (gear icon) in the player
- Select "Subtitles/CC"
- Choose "Auto-translate"
- Select their preferred language from the list
This provides accessibility in all languages YouTube supports without requiring you to create multiple SRT files.
The script provides detailed console output to help troubleshoot any issues:
- Timestamp Fixes: Displays all adjustments made to fix overlapping subtitles
- Subtitle Splitting: Shows statistics on long subtitles that were split
- Console Output: Provides real-time progress updates and statistics on overlaps fixed and [inaudible] markers removed
These logs are particularly useful for identifying and resolving issues with subtitle timing, overlaps, and readability.
This project is licensed under the MIT License - see below for details:
MIT License
Copyright (c) 2024
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.