RAmuSelo/srt-tools

srt-tools

Convert and merge .srt subtitle files into clean plain text.

srt-tools strips the cue indices and --> timestamp lines from SubRip (.srt) files and keeps only the spoken text. Use it to turn a single subtitle file into a transcript, or to merge a whole folder of subtitles into one text file.

  • Zero dependencies — pure Python standard library.
  • Cross-platform — Windows, macOS, Linux; Python 3.9+.
  • Scriptable — driven entirely by command-line arguments, no interactive prompts, never writes to a hard-coded location.
  • Encoding-tolerant — tries UTF-8, then Latin-1, then Windows-1252 so real-world subtitle files just work.

Why this exists

Subtitle files are full of cue numbers and --> timestamps that get in the way when all you want is the spoken text. I kept hand-cleaning .srt files into transcripts, so srt-tools does exactly that — and nothing else: no dependencies, no config, no surprises.

Install

From the project directory:

pip install .

Or, for development, an editable install:

pip install -e .

This installs the srt-tools command. You can also run it without installing:

python -m srt_tools.cli --help

Usage

Convert one file to text

# Print the transcript to stdout
srt-tools to-text movie.srt

# Write the transcript to a file
srt-tools to-text movie.srt --out movie.txt

Merge a folder of subtitles into one text file

# All .srt files directly inside ./subs, merged into one transcript
srt-tools merge-folder ./subs --out all.txt

# Include sub-folders (recursive)
srt-tools merge-folder ./subs --out all.txt --recursive
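
For readers curious what the two modes mean for file discovery, here is a rough sketch using `pathlib`. The helper name `list_srt_files` is hypothetical, and both the sorted order and the case-insensitive suffix match are assumptions; the package's own `find_srt_files` may behave differently.

```python
from pathlib import Path


def list_srt_files(folder, recursive=False):
    """List .srt files in a folder (hypothetical helper, not the package API)."""
    root = Path(folder)
    # rglob("*") walks sub-folders; iterdir() stays at the top level.
    candidates = root.rglob("*") if recursive else root.iterdir()
    # Assumption: match the suffix case-insensitively so MOVIE.SRT is included.
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() == ".srt")
```

Sorting the result keeps the merge order stable between runs, which matters when episode files are named in sequence.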

Choosing input encodings

By default each file is decoded as UTF-8, falling back to Latin-1 and then Windows-1252 if decoding fails. Pass one or more --encoding flags to override the list; encodings are tried in the order given:

srt-tools to-text movie.srt --encoding utf-8 --encoding cp1252
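
The fallback order amounts to trying each encoding in turn until one decodes cleanly. A minimal sketch, assuming a hypothetical helper named `decode_with_fallback`; the package's own `read_text_with_fallback` may have a different signature and behavior.

```python
def decode_with_fallback(data: bytes, encodings=("utf-8", "latin-1", "cp1252")) -> str:
    """Decode bytes with the first encoding that succeeds (hypothetical helper)."""
    for name in encodings:
        try:
            return data.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding rejected the bytes; try the next one
    raise ValueError(f"none of {list(encodings)} could decode the input")


# b"caf\xe9" is not valid UTF-8, so the sketch falls through to Latin-1.
print(decode_with_fallback(b"caf\xe9"))
```

One thing to keep in mind when choosing an order: Python's `latin-1` codec maps every byte to a character and never raises, so in a loop like this any encoding listed after it is never tried. If Windows-1252 punctuation such as curly quotes matters, listing `cp1252` before `latin-1` is the safer order.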

Use as a library

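from pathlib import Path

from srt_tools import merge_srt_folder, srt_to_text

# Convert SRT content already in memory; read_text() closes the file for us.
text = srt_to_text(Path("movie.srt").read_text(encoding="utf-8"))

# Merge every .srt file under ./subs, including sub-folders.
merged = merge_srt_folder("./subs", recursive=True)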

Other public helpers: convert_srt_file, find_srt_files, read_text_with_fallback, srt_to_text_lines, write_text.

How it works

A .srt cue looks like:

1
00:00:01,000 --> 00:00:04,000
Hello world

srt-tools drops the index line (1), the timestamp line (the one containing -->), and blank lines, keeping only Hello world.
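
The filtering rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the package's actual implementation: the function name `strip_cues` is hypothetical, and treating any all-digit line as a cue index is a simplification that would also drop a spoken line consisting only of digits.

```python
def strip_cues(srt_content: str) -> list[str]:
    """Return only the spoken-text lines of SRT content (hypothetical helper)."""
    kept = []
    for raw_line in srt_content.splitlines():
        line = raw_line.strip()
        if not line:        # blank separator between cues
            continue
        if line.isdigit():  # cue index such as "1" (assumption: any all-digit line)
            continue
        if "-->" in line:   # timestamp line
            continue
        kept.append(line)
    return kept


sample = (
    "1\n00:00:01,000 --> 00:00:04,000\nHello world\n\n"
    "2\n00:00:05,000 --> 00:00:07,500\nSecond line\n"
)
print("\n".join(strip_cues(sample)))
```

A stricter variant could track position within each cue (index, then timestamp, then text) instead of classifying lines one by one, at the cost of a little more state.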

Development

Run the test suite (standard-library unittest, no network, no dependencies):

python -m unittest discover -s tests

Roadmap

Small, honest next steps:

  • An optional --keep-timestamps mode for workflows that want them.
  • .srt → WebVTT (.vtt) output.
  • Optional stripping of inline HTML tags and speaker labels.

License

MIT — see LICENSE.
