srt-tools

Convert and merge .srt subtitle files into clean plain text.

srt-tools strips the cue indices and --> timestamp lines from SubRip (.srt) files and keeps only the spoken text. Use it to turn a single subtitle file into a transcript, or to merge a whole folder of subtitles into one text file.

Zero dependencies — pure Python standard library.
Cross-platform — Windows, macOS, Linux; Python 3.9+.
Scriptable — driven entirely by command-line arguments, no interactive prompts, never writes to a hard-coded location.
Encoding-tolerant — tries UTF-8, then Latin-1, then Windows-1252 so real-world subtitle files just work.

Why this exists

Subtitle files are full of cue numbers and --> timestamps that get in the way when all you want is the spoken text. I kept hand-cleaning .srt files into transcripts, so srt-tools does exactly that — and nothing else: no dependencies, no config, no surprises.

Install

From the project directory:

pip install .

Or, for development, an editable install:

pip install -e .

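This installs the srt-tools command. You can also run the module directly; because the package lives under src/, an uninstalled checkout needs src/ on the import path first (for example, by setting PYTHONPATH=src):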

python -m srt_tools.cli --help

Usage

Convert one file to text

# Print the transcript to stdout
srt-tools to-text movie.srt

# Write the transcript to a file
srt-tools to-text movie.srt --out movie.txt

Merge a folder of subtitles into one text file

# All .srt files directly inside ./subs, merged into one transcript
srt-tools merge-folder ./subs --out all.txt

# Include sub-folders (recursive)
srt-tools merge-folder ./subs --out all.txt --recursive
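
The difference between the two commands above comes down to how the folder is scanned. As a rough sketch only (list_srt_files is a hypothetical name for illustration, not the project's find_srt_files, whose signature is not documented here), the two modes could be expressed with pathlib like this:

```python
from pathlib import Path


def list_srt_files(folder, recursive=False):
    """Return .srt files under `folder`, sorted for a stable merge order.

    Hypothetical helper for illustration; not part of srt-tools.
    """
    root = Path(folder)
    # "*.srt" matches only direct children; "**/*.srt" also walks sub-folders.
    pattern = "**/*.srt" if recursive else "*.srt"
    return sorted(p for p in root.glob(pattern) if p.is_file())
```

Sorting the paths is an assumption made here so that repeated merges produce the same output; the actual tool may order files differently.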

Choosing input encodings

By default each file is decoded by trying UTF-8, then Latin-1, then Windows-1252, in that order; the first encoding that decodes without error is used. Override the order with one or more --encoding flags:

srt-tools to-text movie.srt --encoding utf-8 --encoding cp1252
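
To make the fallback order concrete, here is a minimal sketch of try-each-encoding decoding. The function name decode_with_fallback and the DEFAULT_ENCODINGS tuple are assumptions for illustration; the project's own helper is read_text_with_fallback, and its exact behaviour may differ.

```python
from typing import Iterable

# Assumed default order, mirroring the one described above.
DEFAULT_ENCODINGS = ("utf-8", "latin-1", "cp1252")


def decode_with_fallback(data: bytes, encodings: Iterable[str] = DEFAULT_ENCODINGS) -> str:
    """Decode `data` with the first encoding that does not raise.

    Hypothetical helper for illustration; not part of srt-tools.
    """
    tried = list(encodings)
    last_error = None
    for name in tried:
        try:
            return data.decode(name)
        except UnicodeDecodeError as exc:
            last_error = exc  # remember the failure and try the next encoding
    raise ValueError(f"could not decode data with any of: {tried}") from last_error


print(decode_with_fallback("café".encode("utf-8")))
print(decode_with_fallback(b"\x93quoted\x94", ["utf-8", "cp1252"]))
```

One thing worth knowing when choosing an order: Python's latin-1 codec maps every possible byte to a character, so it never fails, and any encoding listed after it is never reached. If Windows-1252 punctuation such as curly quotes matters, listing cp1252 before latin-1 (as in the second call) is the order that lets it take effect.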

Use as a library

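from pathlib import Path

from srt_tools import srt_to_text, merge_srt_folder

# Read the whole file (read_text closes it for us), then strip cue numbers and timestamps
text = srt_to_text(Path("movie.srt").read_text(encoding="utf-8"))

# Merge every .srt file under ./subs, including sub-folders
merged = merge_srt_folder("./subs", recursive=True)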

Other public helpers: convert_srt_file, find_srt_files, read_text_with_fallback, srt_to_text_lines, write_text.

How it works

A .srt cue looks like:

1
00:00:01,000 --> 00:00:04,000
Hello world

srt-tools drops the index line (1), the timestamp line (the one containing -->), and blank lines, keeping only Hello world.
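
The rule above is small enough to sketch in a few lines. This is an illustrative approximation under stated assumptions, not the project's code: strip_srt is a hypothetical name, and it treats any all-digit line as a cue index.

```python
def strip_srt(srt: str) -> list[str]:
    """Return only the spoken-text lines from SubRip-formatted text.

    Hypothetical helper for illustration; not part of srt-tools.
    """
    kept = []
    for raw in srt.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank separator between cues
        if "-->" in line:
            continue  # timestamp line
        if line.isdigit():
            continue  # cue index (assumption: any all-digit line)
        kept.append(line)
    return kept


sample = "1\n00:00:01,000 --> 00:00:04,000\nHello world\n"
print(strip_srt(sample))
```

A limitation of this simplified version: a line of dialogue that consists only of digits (for example "42") would be dropped as if it were a cue index. A stricter variant could treat a digit line as an index only when the next line contains -->.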

Development

Run the test suite (standard-library unittest, no network, no dependencies):

python -m unittest discover -s tests

Roadmap

Small, honest next steps:

An optional --keep-timestamps mode for workflows that want them.
.srt → WebVTT (.vtt) output.
Optional stripping of inline HTML tags and speaker labels.

License

MIT — see LICENSE.
