Video to SRT Generator

A powerful command-line tool that turns video files into professional-quality SRT subtitles, using OpenAI's Whisper API for transcription (with automatic language detection) and GPT-3.5 Turbo for translation.

Features

Smart Subtitle Processing:
- Automatic language detection and translation
- Intelligent timestamp adjustment based on text length
- Content-aware overlap resolution
- Long subtitle splitting for optimal readability
- Short subtitle extension for YouTube compatibility
- Automatic removal of [inaudible] markers
Efficient Processing:
- Audio compression and chunking for large files
- Parallel processing for faster transcription
- Retry logic for API reliability
User-Friendly:
- Interactive CLI menu with intuitive prompts
- Smart default filename suggestions based on input file
- Clear settings summary and confirmation
- Detailed console output for monitoring
- Comprehensive subtitle validation

Prerequisites

- Python 3.7+
- ffmpeg (for video-to-audio conversion)
- An OpenAI API key
- The Python packages listed in requirements.txt:
```
pip install -r requirements.txt
```

Installation

1. Clone this repository or download the script.
2. Install the required dependencies:
```
pip install -r requirements.txt
```
3. Install ffmpeg:
- On macOS: brew install ffmpeg
- On Ubuntu: sudo apt-get install ffmpeg
- On Windows: Download from ffmpeg.org
4. Set up your OpenAI API key:
- Create a .env file in the same directory as the script (the repository includes a .env.example template you can copy)
- Add your OpenAI API key to the .env file:
```
OPENAI_API_KEY='your_actual_api_key_here'
```
- You can get an API key from OpenAI's platform

Usage

Interactive CLI Menu (Recommended)

The easiest way to use the tool is with the interactive CLI menu:

```
python -m components.srt_generator
```

This will guide you through the process with intuitive prompts:

- Input file path (surrounding quotes are removed automatically)
- Target language, selected from a menu
- Output directory (defaults to output_srt_files)
- Output filename (suggested automatically from the input filename)
- Maximum subtitle length
- Options for subtitle splitting and timestamp checking
- A summary of all settings for confirmation
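The quote stripping and filename suggestion described above can be sketched as two small helpers. This is a minimal sketch, not the code in argument_handler.py. The function names `clean_input_path` and `suggest_output_name` and the `<input stem>_<language>.srt` naming pattern are hypothetical.

```python
from pathlib import Path

# Default output directory named in the interactive menu description.
DEFAULT_OUTPUT_DIR = "output_srt_files"


def clean_input_path(raw: str) -> str:
    """Trim whitespace and one pair of matching surrounding quotes (hypothetical helper)."""
    path = raw.strip()
    if len(path) >= 2 and path[0] == path[-1] and path[0] in ("'", '"'):
        path = path[1:-1]
    return path


def suggest_output_name(input_path: str, target_language: str) -> str:
    """Suggest '<input stem>_<target language>.srt' (assumed naming pattern)."""
    return f"{Path(input_path).stem}_{target_language}.srt"


if __name__ == "__main__":
    path = clean_input_path("'/videos/my talk #1.mp4'")
    print(path)
    print(Path(DEFAULT_OUTPUT_DIR) / suggest_output_name(path, "es"))
```

On a POSIX system, the demo prints `/videos/my talk #1.mp4` and then `output_srt_files/my talk #1_es.srt`.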

Command Line Arguments

Basic Usage

```
python -m components.srt_generator input_file --target-language target_lang
```

Examples

Transcribe audio to English subtitles:

```
python -m components.srt_generator video.mp4 --target-language en
```

Transcribe audio to Spanish subtitles:

```
python -m components.srt_generator video.mp4 --target-language es
```

Using Paths with Special Characters

Wrap paths that contain spaces or special characters in quotes:

```
python -m components.srt_generator '/path/to/your/video with #special characters.mp4' --target-language es
```

Custom Output Directory

```
python -m components.srt_generator video.mp4 --target-language es --output-dir /path/to/custom/directory
```
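Here is one possible argparse setup that accepts the invocations shown above. It is a sketch built on assumptions. The positional `input_file` and the `--target-language` and `--output-dir` flags come from the examples. Two behaviours are guesses about the real argument_handler.py: falling back to the interactive menu when no input file is given, and reusing `output_srt_files` as the command-line default.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Generate SRT subtitles from a video file.")
    # Optional positional: when omitted, the caller can show the interactive menu (assumption).
    parser.add_argument("input_file", nargs="?", help="Path to the input video")
    parser.add_argument("--target-language", help="Target language code, e.g. en or es")
    # Assumed default, mirroring the interactive menu's default directory.
    parser.add_argument("--output-dir", default="output_srt_files",
                        help="Directory for the generated SRT file")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # A target language is mandatory once an input file is given on the command line.
    if args.input_file and not args.target_language:
        parser.error("--target-language is required when an input file is given")
    return args


if __name__ == "__main__":
    args = parse_args(["video.mp4", "--target-language", "es"])
    print(args.input_file, args.target_language, args.output_dir)
```

argparse converts `--target-language` and `--output-dir` into the attributes `target_language` and `output_dir`.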

Handling Mixed Language Content

The tool detects the language of the audio automatically. For mixed-language content:

- The tool detects the dominant language
- You choose the output language with --target-language
- All content is translated into the target language

This keeps the output subtitles in a single language instead of switching languages mid-file.

Example for mixed-language audio:

```
python -m components.srt_generator mixed_audio.mp4 --target-language es
```

Code Structure

The project has been refactored into a modular structure for better maintainability:

components/ - Main package directory
- srt_generator.py - Main entry point
- audio_processor.py - Audio conversion and chunking
- transcriber.py - Audio transcription with Whisper API
- translator.py - Text translation with GPT-3.5 Turbo
- subtitle_processor.py - SRT processing and manipulation
- validator.py - SRT validation and statistics
- argument_handler.py - Command-line and interactive menu handling
- utils.py - Shared utilities and constants

This modular approach makes the code easier to maintain, test, and extend.

How It Works

Translation Process

1. Audio Extraction: converts the video to audio using ffmpeg
2. Transcription with Automatic Language Detection: OpenAI's Whisper API detects the spoken language and transcribes the audio
3. Translation: translates the text into the target language using GPT-3.5 Turbo with strict output formatting
4. Timestamp Adjustment: adjusts subtitle durations based on text length
5. Long Subtitle Splitting: splits long subtitles at natural language boundaries
6. Overlap Resolution: fixes overlapping subtitles based on content length and word count
7. Short Subtitle Extension: extends subtitles that are too short for comfortable reading
8. Validation: checks for overlapping subtitles and other issues
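The timestamp-adjustment step can be sketched with a reading-speed rule. The values here are illustrative assumptions, not settings taken from the tool: 15 characters per second, a 1 s minimum and a 7 s maximum. The `Subtitle` dataclass and the helper names are also hypothetical. The sketch also shows the `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line used by SRT.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subtitle:
    index: int
    start_ms: int
    end_ms: int
    text: str


def format_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def target_duration_ms(text: str, chars_per_second: float = 15.0,
                       min_ms: int = 1_000, max_ms: int = 7_000) -> int:
    """Reading-speed duration for a cue, clamped to [min_ms, max_ms] (assumed values)."""
    ideal = int(len(text) / chars_per_second * 1_000)
    return max(min_ms, min(ideal, max_ms))


def adjust_durations(subs: List[Subtitle]) -> List[Subtitle]:
    """Reset each end time from text length; overlaps are left for a later overlap pass."""
    for sub in subs:
        sub.end_ms = sub.start_ms + target_duration_ms(sub.text)
    return subs


def to_srt(subs: List[Subtitle]) -> str:
    """Serialize cues as numbered SRT blocks separated by blank lines."""
    blocks = [
        f"{s.index}\n{format_srt_time(s.start_ms)} --> {format_srt_time(s.end_ms)}\n{s.text}\n"
        for s in subs
    ]
    return "\n".join(blocks)
```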

Large File Handling

For files larger than 25 MB (the upload limit of OpenAI's Whisper API):

- Audio is compressed to 64 kbps mono MP3
- If it is still too large, the audio is split into 10-minute chunks
- Chunks are transcribed in parallel
- Results are merged, with each chunk's timestamps offset by the chunk's start time in the full recording
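The large-file path can be sketched as follows. The ffmpeg flags are standard: `-vn` drops the video stream, `-ac 1` downmixes to mono, `-b:a 64k` sets a 64 kbps audio bitrate, and `-ss`/`-t` select a time range. Everything else is an illustrative assumption, and the real audio_processor.py may differ. That includes the helper names, the four-worker thread pool, the exponential-backoff retry wrapper, and reading 25 MB as 25 MiB. `transcribe` stands in for whatever function calls the Whisper API and returns chunk-relative `(start_s, end_s, text)` segments.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[float, float, str]  # (start_s, end_s, text)

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # 25 MB upload limit (MiB interpretation assumed)
CHUNK_SECONDS = 10 * 60              # 10-minute chunks


def compress_cmd(src: str, dst: str) -> List[str]:
    # -vn: drop video; -ac 1: mono; -b:a 64k: 64 kbps audio bitrate
    return ["ffmpeg", "-y", "-i", src, "-vn", "-ac", "1", "-b:a", "64k", dst]


def chunk_cmd(src: str, dst: str, start_s: int, length_s: int = CHUNK_SECONDS) -> List[str]:
    # -ss/-t select [start_s, start_s + length_s); -c copy avoids re-encoding
    return ["ffmpeg", "-y", "-ss", str(start_s), "-t", str(length_s), "-i", src, "-c", "copy", dst]


def needs_chunking(size_bytes: int) -> bool:
    return size_bytes > MAX_UPLOAD_BYTES


def chunk_starts(total_seconds: float, chunk_seconds: int = CHUNK_SECONDS) -> List[int]:
    """Start offset (in seconds) of every chunk covering the whole recording."""
    return list(range(0, math.ceil(total_seconds), chunk_seconds))


def with_retries(fn: Callable, attempts: int = 3, base_delay: float = 1.0) -> Callable:
    """Retry fn with exponential backoff (base_delay, 2 * base_delay, ...) before giving up."""
    def wrapper(*args):
        for attempt in range(attempts):
            try:
                return fn(*args)
            except Exception:
                if attempt == attempts - 1:
                    raise
                time.sleep(base_delay * 2 ** attempt)
    return wrapper


def transcribe_chunks(paths: Sequence[str], starts: Sequence[int],
                      transcribe: Callable[[str], List[Segment]],
                      workers: int = 4) -> List[Segment]:
    """Transcribe chunks in parallel, then shift each chunk's segments by its start offset."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(with_retries(transcribe), paths))  # map keeps input order
    merged: List[Segment] = []
    for offset, segments in zip(starts, results):
        merged.extend((start + offset, end + offset, text) for start, end, text in segments)
    return merged
```

The command lists can be passed to `subprocess.run(cmd, check=True)`. `pool.map` returns results in input order, so the merged segments stay chronological.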

Overlap Resolution

The script includes a sophisticated algorithm to fix overlapping subtitles:

- Detects overlaps between adjacent subtitles
- Analyzes content length and word count to make intelligent adjustments
- Prioritizes longer/more complex subtitles when resolving conflicts
- Maintains minimum duration requirements for readability
- Provides detailed console output of all changes made
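One way to implement this kind of overlap resolution is to move the shared boundary inside the contested interval in proportion to each cue's word count. This is a sketch under assumptions, not the algorithm in subtitle_processor.py. The word-count weighting, the 500 ms minimum and the `resolve_overlaps` name are illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Subtitle:
    index: int
    start_ms: int
    end_ms: int
    text: str


def resolve_overlaps(subs: List[Subtitle], min_ms: int = 500) -> List[Tuple[int, int, int]]:
    """Split each overlap between neighbours in proportion to their word counts.

    Assumes subs are sorted by start time. Returns (prev index, next index, overlap ms)
    for every fix so the caller can log it.
    """
    fixes = []
    for prev, nxt in zip(subs, subs[1:]):
        overlap = prev.end_ms - nxt.start_ms
        if overlap <= 0:
            continue
        prev_words = max(len(prev.text.split()), 1)
        next_words = max(len(nxt.text.split()), 1)
        # The wordier cue keeps the larger share of the contested time.
        boundary = nxt.start_ms + round(overlap * prev_words / (prev_words + next_words))
        # Keep both cues at least min_ms long when their timing allows it.
        lo, hi = prev.start_ms + min_ms, nxt.end_ms - min_ms
        if lo <= hi:
            boundary = min(max(boundary, lo), hi)
        # Never move the boundary outside either cue.
        boundary = min(max(boundary, prev.start_ms), nxt.end_ms)
        prev.end_ms = nxt.start_ms = boundary
        fixes.append((prev.index, nxt.index, overlap))
    return fixes
```

For example, with a six-word cue ending at 3000 ms and a two-word cue starting at 2000 ms, the 1000 ms overlap is split 750/250. Both cues then meet at 2750 ms.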

Long Subtitle Splitting

The script automatically splits long subtitles for improved readability:

- Calculates a threshold based on mean subtitle length and standard deviation
- Identifies subtitles exceeding this threshold
- Splits long subtitles at natural language boundaries (sentences, clauses, or words)
- Distributes timing proportionally based on text length
- Maintains proper chronological order and subtitle numbering

This feature significantly improves subtitle readability by breaking long blocks of text into manageable segments while preserving the original meaning and timing.
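The splitting steps can be sketched as follows, assuming a threshold of mean + k × standard deviation of character length with k = 1. The README does not give the actual multiplier. This simplified version splits each long cue once, into two parts. It tries sentence ends first, then clause breaks, then plain spaces, and picks the cut closest to the middle. All names are hypothetical.

```python
import re
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class Subtitle:
    index: int
    start_ms: int
    end_ms: int
    text: str


# Preferred split points, tried in order: sentence end, clause break, any space.
BOUNDARIES = (r"(?<=[.!?])\s+", r"(?<=[,;:])\s+", r"\s+")


def length_threshold(subs: List[Subtitle], k: float = 1.0) -> float:
    lengths = [len(s.text) for s in subs]
    return mean(lengths) + k * pstdev(lengths)


def split_in_two(text: str) -> List[str]:
    """Cut at the highest-priority boundary closest to the middle of the text."""
    for pattern in BOUNDARIES:
        cuts = list(re.finditer(pattern, text))
        if cuts:
            middle = len(text) / 2
            best = min(cuts, key=lambda m: abs(m.start() - middle))
            return [text[:best.start()], text[best.end():]]
    return [text]


def split_long_subtitles(subs: List[Subtitle], k: float = 1.0) -> List[Subtitle]:
    if not subs:
        return []
    limit = length_threshold(subs, k)
    out: List[Subtitle] = []
    for sub in subs:
        parts = split_in_two(sub.text) if len(sub.text) > limit else [sub.text]
        total_chars = sum(len(p) for p in parts) or 1
        duration = sub.end_ms - sub.start_ms
        start = sub.start_ms
        for i, part in enumerate(parts):
            # Time is shared in proportion to each part's character count.
            last = i == len(parts) - 1
            end = sub.end_ms if last else start + round(duration * len(part) / total_chars)
            out.append(Subtitle(0, start, end, part))
            start = end
    for number, sub in enumerate(out, start=1):  # renumber sequentially
        sub.index = number
    return out
```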

Short Subtitle Extension

The script intelligently extends subtitles that are too short for comfortable reading:

- Calculates a minimum duration threshold based on mean duration and standard deviation (never below 500 ms)
- Identifies subtitles shorter than this threshold
- Analyzes the available space before and after each short subtitle
- Extends the duration by:
  - Shifting the start time earlier when space is available
  - Extending the end time when space is available
  - Distributing the adjustment proportionally when space exists in both directions
  - Removing empty subtitles to create more space when needed
- Ensures all subtitles stay on screen long enough for viewers to read

This feature significantly improves subtitle readability and YouTube compatibility by ensuring no subtitle appears too briefly on screen. YouTube often rejects SRT files with extremely short subtitles (under 500ms), and this feature automatically fixes such issues.
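The extension logic can be sketched as follows, assuming a threshold of max(500 ms, mean − k × standard deviation) with k = 1. The real multiplier is not stated. This version assumes the cues are sorted and non-overlapping, and that there is free time after the last cue. It omits the removal of empty subtitles described above. The function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class Subtitle:
    index: int
    start_ms: int
    end_ms: int
    text: str


def min_duration_threshold(subs: List[Subtitle], k: float = 1.0, floor_ms: int = 500) -> float:
    durations = [s.end_ms - s.start_ms for s in subs]
    return max(floor_ms, mean(durations) - k * pstdev(durations))


def extend_short_subtitles(subs: List[Subtitle], k: float = 1.0, floor_ms: int = 500) -> List[Subtitle]:
    """Grow too-short cues into the free gaps around them.

    Assumes subs are sorted and non-overlapping, and that there is room after the last cue.
    """
    if not subs:
        return subs
    threshold = min_duration_threshold(subs, k, floor_ms)
    for i, sub in enumerate(subs):
        needed = threshold - (sub.end_ms - sub.start_ms)
        if needed <= 0:
            continue
        prev_end = subs[i - 1].end_ms if i > 0 else 0
        space_before = max(sub.start_ms - prev_end, 0)
        space_after = max(subs[i + 1].start_ms - sub.end_ms, 0) if i + 1 < len(subs) else needed
        available = space_before + space_after
        if available == 0:
            continue  # the README describes removing empty cues to make room; omitted here
        take = min(needed, available)
        # Split the extension between both sides in proportion to the free space on each.
        sub.start_ms -= round(take * space_before / available)
        sub.end_ms += round(take * space_after / available)
    return subs
```

Consider a 100 ms cue with 100 ms free before it and 800 ms free after it, and a 500 ms threshold. The cue gains about 44 ms at the start and 356 ms at the end.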

Inaudible Marker Removal

The script automatically removes [inaudible] markers from subtitles:

- Clears the text of standalone [inaudible] subtitles while keeping their (now empty) timestamps
- Removes [inaudible] markers from subtitles that also contain speech
- Keeps all subtitle timing and numbering intact
- Produces cleaner, more readable subtitles for viewers

This keeps [inaudible] markers from distracting viewers, while preserving the timing and structure of the subtitles.
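Marker removal can be sketched with a case-insensitive regular expression applied line by line. This is a minimal sketch, not the tool's own implementation. The pattern, the punctuation clean-up rule and the function names are assumptions. A cue that contained only the marker is left with empty text and its original timestamps, as described above.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Subtitle:
    index: int
    start_ms: int
    end_ms: int
    text: str


INAUDIBLE = re.compile(r"\s*\[inaudible\]\s*", re.IGNORECASE)


def strip_inaudible(text: str) -> str:
    """Remove [inaudible] markers line by line and tidy the spacing left behind."""
    lines = []
    for line in text.splitlines():
        line = INAUDIBLE.sub(" ", line)
        line = re.sub(r"\s+([,.!?;:])", r"\1", line)  # no space before punctuation
        line = re.sub(r"\s{2,}", " ", line).strip()
        if line:
            lines.append(line)
    return "\n".join(lines)


def remove_inaudible(subs: List[Subtitle]) -> int:
    """Clean every cue in place, keeping timing and numbering; return the marker count."""
    removed = 0
    for sub in subs:
        count = len(INAUDIBLE.findall(sub.text))
        if count:
            sub.text = strip_inaudible(sub.text)  # a standalone marker leaves empty text
            removed += count
    return removed
```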

Supported Languages

The script supports all languages supported by OpenAI's Whisper model. Common language codes:

- English: en
- Spanish: es
- French: fr
- German: de
- Japanese: ja
- Chinese: zh
- Russian: ru
- Portuguese: pt
- Italian: it

Important Notes

- Language detection is automatic; you don't need to specify the source language
- The tool always translates to the specified target language
- OpenAI's API accepts audio files of up to 25 MB; larger files are compressed or chunked automatically
- Transcription quality depends on the audio quality and the Whisper model's capabilities
- Translation output is strictly formatted: only the translated text, or audio cues in brackets
- Inaudible or noisy sections are marked [inaudible] during processing and removed from the final output

Cost Considerations

- Whisper API: $0.006 per minute ($0.36 per hour)
- GPT-3.5 Turbo (translation): roughly $0.01-0.02 per hour of speech
- Total: approximately $0.37-0.38 per hour of processed audio/video
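The estimate above is simple arithmetic that scales linearly with duration. For example, a 90-minute video costs 90 × $0.006 = $0.54 for Whisper, plus roughly $0.015 to $0.03 for translation. The helper below applies the rates quoted in this section, which may change, and its name is hypothetical.

```python
WHISPER_USD_PER_MINUTE = 0.006
# Rough GPT-3.5 Turbo translation cost per hour of speech, from the figures above.
TRANSLATION_USD_PER_HOUR = (0.01, 0.02)


def estimate_cost(minutes: float) -> tuple:
    """Return a (low, high) USD estimate for processing `minutes` of audio."""
    whisper = minutes * WHISPER_USD_PER_MINUTE
    hours = minutes / 60
    low = whisper + hours * TRANSLATION_USD_PER_HOUR[0]
    high = whisper + hours * TRANSLATION_USD_PER_HOUR[1]
    return round(low, 3), round(high, 3)


if __name__ == "__main__":
    print(estimate_cost(60))  # one hour of audio
    print(estimate_cost(90))  # a 90-minute video
```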

YouTube Integration

Recommendation for YouTube Uploads

If your main goal is to make your content accessible to a global audience with minimal effort:

1. Use this tool to generate a clean, consistent English SRT file (especially useful if your audio mixes languages)
2. Upload this English SRT to YouTube
3. Let viewers use YouTube's auto-translate feature as needed

This balances your workload against accessibility and avoids mixed-language subtitles in the uploaded SRT file.

How YouTube Translation Works for Viewers

When you upload an English SRT file to YouTube, viewers can:

1. Click the settings (gear) icon in the player
2. Select "Subtitles/CC"
3. Choose "Auto-translate"
4. Select their preferred language from the list

This provides accessibility in all languages YouTube supports without requiring you to create multiple SRT files.

Logging and Debugging

The script provides detailed console output to help troubleshoot any issues:

- Timestamp Fixes: displays all adjustments made to fix overlapping subtitles
- Subtitle Splitting: shows statistics on long subtitles that were split
- Console Output: provides real-time progress updates, plus statistics on overlaps fixed and [inaudible] markers removed

These logs are particularly useful for identifying and resolving issues with subtitle timing, overlaps, and readability.

License

This project is licensed under the MIT License - see below for details:

MIT License

Copyright (c) 2024

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
