Convert and merge .srt subtitle files into clean plain text.
srt-tools strips the cue indices and --> timestamp lines from SubRip
(.srt) files and keeps only the spoken text. Use it to turn a single subtitle
file into a transcript, or to merge a whole folder of subtitles into one text
file.
- Zero dependencies — pure Python standard library.
- Cross-platform — Windows, macOS, Linux; Python 3.9+.
- Scriptable — driven entirely by command-line arguments, no interactive prompts, never writes to a hard-coded location.
- Encoding-tolerant — tries UTF-8, then Latin-1, then Windows-1252 so real-world subtitle files just work.
Subtitle files are full of cue numbers and --> timestamps that get in the way
when all you want is the spoken text. I kept hand-cleaning .srt files into
transcripts, so srt-tools does exactly that — and nothing else: no
dependencies, no config, no surprises.
From the project directory:
pip install .
Or, for development, an editable install:
pip install -e .
This installs the srt-tools command. You can also run it without installing:
python -m srt_tools.cli --help

# Print the transcript to stdout
srt-tools to-text movie.srt
# Write the transcript to a file
srt-tools to-text movie.srt --out movie.txt

# All .srt files directly inside ./subs, merged into one transcript
srt-tools merge-folder ./subs --out all.txt
# Include sub-folders (recursive)
srt-tools merge-folder ./subs --out all.txt --recursive

By default each file is read as UTF-8, then Latin-1, then Windows-1252. Override
the order with one or more --encoding flags:
srt-tools to-text movie.srt --encoding utf-8 --encoding cp1252

from srt_tools import srt_to_text, merge_srt_folder
# Read the subtitle file inside a with-block so the handle is closed promptly
with open("movie.srt", encoding="utf-8") as f:
    text = srt_to_text(f.read())
merged = merge_srt_folder("./subs", recursive=True)

Other public helpers: convert_srt_file, find_srt_files,
read_text_with_fallback, srt_to_text_lines, write_text.
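
To make the encoding fallback concrete, here is a minimal sketch of how an ordered list of encodings can be tried in turn. It is an illustration only, not the code behind read_text_with_fallback: the names decode_with_fallback and DEFAULT_ENCODINGS are hypothetical, and the default order is simply copied from the description above.

```python
# Hypothetical sketch of an encoding fallback; not srt-tools' actual code.
DEFAULT_ENCODINGS = ("utf-8", "latin-1", "cp1252")  # order assumed from the README


def decode_with_fallback(data: bytes, encodings=DEFAULT_ENCODINGS) -> str:
    """Return the first successful decode of `data`, trying encodings in order."""
    last_error = None
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError as exc:
            last_error = exc  # remember the failure and try the next encoding
    raise ValueError(
        f"could not decode with any of: {', '.join(encodings)}"
    ) from last_error


# b"caf\xe9" is not valid UTF-8, so the sketch falls through to Latin-1.
print(decode_with_fallback(b"caf\xe9"))  # café
```

One detail worth knowing when choosing an order: Python's latin-1 codec maps every possible byte to a character, so in a sketch like this any encoding listed after latin-1 is never reached. Passing an explicit order such as ("utf-8", "cp1252") lets the Windows-1252 decoder have its turn.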
A .srt cue looks like:
1
00:00:01,000 --> 00:00:04,000
Hello world
srt-tools drops the index line (1), the timestamp line (the one containing
-->), and blank lines, keeping only Hello world.
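
The filtering rule above can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the project's implementation: strip_srt_lines is a hypothetical name, and the sketch treats a digits-only line as a cue index only when the next line contains -->, so that a spoken line such as "1984" is not dropped by accident.

```python
# Hypothetical sketch of the cue-stripping rule; not srt-tools' actual code.
def strip_srt_lines(srt: str) -> list[str]:
    """Keep only spoken-text lines from SRT content."""
    # Drop a leading byte-order mark, then normalise whitespace per line.
    lines = [ln.strip() for ln in srt.lstrip("\ufeff").splitlines()]
    kept = []
    for i, line in enumerate(lines):
        if not line:
            continue  # blank separator between cues
        if "-->" in line:
            continue  # timestamp line
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if line.isdigit() and "-->" in next_line:
            continue  # cue index: digits directly followed by a timestamp
        kept.append(line)
    return kept


sample = (
    "1\n"
    "00:00:01,000 --> 00:00:04,000\n"
    "Hello world\n"
    "\n"
    "2\n"
    "00:00:05,000 --> 00:00:06,000\n"
    "1984\n"
)
print(strip_srt_lines(sample))  # ['Hello world', '1984']
```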
Run the test suite (standard-library unittest, no network, no dependencies):
python -m unittest discover -s tests

Small, honest next steps:
- An optional --keep-timestamps mode for workflows that want them.
- .srt → WebVTT (.vtt) output.
- Optional stripping of inline HTML tags and speaker labels.
MIT — see LICENSE.