Add Subtitle OCR tool with review, cleanup, and export workflows by Sudo-Rahman · Pull Request #37 · Sudo-Rahman/MediaFlow

Sudo-Rahman · 2026-06-03T14:30:06Z

Summary

This pull request adds a full Subtitle OCR workflow to MediaFlow. It introduces a new local-first tool for turning supported bitmap subtitle sources into editable text subtitles, with support for PGS subtitle tracks in media containers, standalone SUP files, and standalone VobSub IDX/SUB pairs. VobSub is handled as standalone .idx/.sub input only, not as an in-container subtitle track workflow.

At a high level, the feature lets users import supported bitmap subtitle sources, extract and decode subtitle images, run OCR locally, review the detected cues against visual previews, optionally clean OCR output with an LLM, retry work at different scopes, edit generated cue text, and export finished subtitle versions to ASS, SRT, or WebVTT.

What Changed

New Subtitle OCR tool surface

Adds a dedicated SubtitleOcrView and wires it into the main app navigation, sidebar, route-level tool selection, drag/drop handling, and batch export flow.
Adds the Subtitle OCR component set for the full workflow: import basket, options panel, source sidebar, review workspace, timeline, cue rail, cue cards, version selector, result dialog, retry dialogs, retry option fields, and import-track selection dialog.
Supports mixed input flows for PGS tracks inside media containers, standalone .sup files, and standalone VobSub .idx/.sub pairs.
Adds per-source status tracking for scanning, ready, extracting, decoding, OCR processing, AI cleanup, completed, and error states.

Review and editing workflow

Adds a responsive review workspace that pairs OCR cue text with decoded bitmap previews.
Adds cue selection, version selection, active review target tracking, readonly processing draft views, and UI state helpers for keeping the rail, cue cards, and timeline synchronized.
Adds timeline/filmstrip-oriented navigation for bitmap subtitle review, including viewport reporting, visible cue bucketing, and edge-case handling around cue selection and synchronization.
Adds direct cue text editing for the selected persisted version. updateCueText mutates finalCues on the current version, and handleCueTextCommit persists that mutation immediately, so persisted versions are reviewable/editable records rather than immutable snapshots.
Adds retry support for full OCR retries and AI-cleanup-only retries, including item-scoped cancellation and progress handling.

Backend Subtitle OCR pipeline

Adds a new Rust subtitle_ocr tool module with focused submodules for assets, cancellation, decoding, export, extraction, import, OCR, progress, preview restoration, stabilization, operation state, and text reconstruction.
Registers new Tauri commands for probing PGS bitmap subtitle tracks in containers, resolving standalone VobSub pairs, preparing subtitle tracks, decoding bitmaps, running the OCR pipeline, restoring missing bitmap assets, exporting subtitle versions, and cancelling active operations.
Adds streaming OCR processing that decodes bitmap subtitle cues, writes preview assets, runs OCR, emits live cue events, stabilizes text cues, and returns pipeline output for frontend persistence.
Adds operation state scoped by item and run id so progress and cancellation events are applied only to the relevant active run.
Adds sleep inhibition while long-running Subtitle OCR work is active.
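
As an illustration of the item/run scoping described above, here is a minimal sketch of how stale progress and cancellation events could be filtered. The `ProgressEvent` and `OperationState` types and all of their fields are hypothetical names chosen for this example; the actual MediaFlow types are not shown in this PR description.

```rust
use std::collections::HashMap;

/// Hypothetical progress event; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct ProgressEvent {
    pub item_id: String,
    pub run_id: u64,
    pub percent: u8,
}

/// Tracks the active run per item so events from older runs can be dropped.
#[derive(Default)]
pub struct OperationState {
    active_runs: HashMap<String, u64>,
    progress: HashMap<String, u8>,
}

impl OperationState {
    /// Registers a new run for an item, replacing any previous run.
    pub fn start_run(&mut self, item_id: &str, run_id: u64) {
        self.active_runs.insert(item_id.to_string(), run_id);
        self.progress.insert(item_id.to_string(), 0);
    }

    /// Applies an event only if it belongs to the item's active run.
    /// Returns true when the event was accepted.
    pub fn apply(&mut self, event: &ProgressEvent) -> bool {
        match self.active_runs.get(&event.item_id) {
            Some(&active) if active == event.run_id => {
                self.progress.insert(event.item_id.clone(), event.percent);
                true
            }
            // Unknown item or stale run id: ignore the event.
            _ => false,
        }
    }

    /// Cancels a run only if it is still the active one for that item.
    pub fn cancel(&mut self, item_id: &str, run_id: u64) -> bool {
        if self.active_runs.get(item_id) == Some(&run_id) {
            self.active_runs.remove(item_id);
            true
        } else {
            false
        }
    }

    /// Last accepted progress value for an item, if any.
    pub fn progress_of(&self, item_id: &str) -> Option<u8> {
        self.progress.get(item_id).copied()
    }
}
```

With this shape, an event carrying run id 1 is ignored once run 2 has started for the same item, which is the behavior the description attributes to the real operation state.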

OCR model support

Adds an English PP-OCR recognition model and key file under src-tauri/ocr-models.
Updates the shared OCR engine/model plumbing so Subtitle OCR can resolve OCR models, create OCR engines, and use GPU acceleration consistently with the existing OCR infrastructure.
Changes the Subtitle OCR defaults and model handling so the English OCR model can be used by default where appropriate.

Persistence and versioning

Adds Subtitle OCR domain types, including source snapshots, standalone VobSub pair metadata, bitmap cue metadata, raw OCR boxes, final cues, progress events, processing drafts, retry modes, versions, and per-source persistence data.
Adds a dedicated Subtitle OCR store with defensive cloning helpers, config state, selected item state, processing/cancellation scope, review target normalization, active version management, processing drafts, rendered cues, and log entries.
Adds sidecar-backed Subtitle OCR storage that saves source versions, active version selection, cue data, bitmap metadata, raw OCR output, AI cleanup state, and config snapshots.
Presents versions as persisted, reviewable records. They preserve OCR/retry history and can be selected for review/export, but the currently selected version remains editable: cue text commits update that version's finalCues and are saved back to the sidecar immediately.
Extends existing media/transcription/translation storage behavior so Subtitle OCR data can coexist with existing MediaFlow sidecar data without overwriting unrelated persisted sections.

AI cleanup

Adds an optional AI cleanup pass for Subtitle OCR output.
Builds a constrained JSON-only prompt that asks the model to correct OCR mistakes without translating, inventing content, reordering cues, or changing cue count.
Parses and validates cleanup responses, rejects unknown cue IDs, preserves cue timing/source IDs, removes clear OCR noise when requested by the model, and merges adjacent duplicate cues locally.
Supports cancellation and usage reporting through the existing LLM client flow.
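
The validation rules above can be sketched roughly as follows. This is an assumption-laden illustration, not the PR's implementation: the `Cue` struct, the `apply_cleanup` function, the convention that an empty correction means "noise, drop this cue", and the rule that duplicates merge only when the cues touch or overlap in time are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical cue shape for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Cue {
    pub id: u32,
    pub start_ms: u64,
    pub end_ms: u64,
    pub text: String,
}

/// Applies model-suggested text corrections keyed by cue id.
/// Ids and timing always come from the original cues, never from the model.
pub fn apply_cleanup(
    original: &[Cue],
    corrections: &HashMap<u32, String>,
) -> Result<Vec<Cue>, String> {
    // Reject any correction that points at a cue that was never sent.
    for id in corrections.keys() {
        if !original.iter().any(|cue| cue.id == *id) {
            return Err(format!("unknown cue id {}", id));
        }
    }

    let mut cleaned: Vec<Cue> = Vec::new();
    for cue in original {
        let text = corrections
            .get(&cue.id)
            .cloned()
            .unwrap_or_else(|| cue.text.clone());
        let text = text.trim().to_string();

        // Assumption: an empty corrected text marks the cue as OCR noise.
        if text.is_empty() {
            continue;
        }

        // Merge adjacent duplicates locally by extending the previous cue.
        if let Some(prev) = cleaned.last_mut() {
            if prev.text == text && prev.end_ms >= cue.start_ms {
                prev.end_ms = prev.end_ms.max(cue.end_ms);
                continue;
            }
        }

        cleaned.push(Cue { text, ..cue.clone() });
    }
    Ok(cleaned)
}
```

Keeping the merge step local, rather than asking the model to do it, means the cue count the model returns can still be validated against the cue count that was sent.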

Export support

Adds Subtitle OCR export support for ASS, SRT, and WebVTT.
Adds frontend preview/build helpers for export formats and a Tauri export command that validates output paths, validates cue timing, sorts cues, filters blank text, and writes the selected subtitle format.
Integrates Subtitle OCR versions into the app’s existing versioned batch export dialog so users can export one or more generated or edited versions across multiple sources.
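
For reviewers less familiar with the export formats, the validate, sort, filter, and write sequence might look like the following for SRT. This is a minimal sketch under stated assumptions: `ExportCue`, `srt_timestamp`, and `build_srt` are hypothetical names, the function returns a string instead of writing to a validated path, and ASS and WebVTT are omitted.

```rust
/// Hypothetical export cue for this sketch.
#[derive(Debug, Clone)]
pub struct ExportCue {
    pub start_ms: u64,
    pub end_ms: u64,
    pub text: String,
}

/// Formats milliseconds as an SRT timestamp (HH:MM:SS,mmm).
fn srt_timestamp(ms: u64) -> String {
    let hours = ms / 3_600_000;
    let minutes = ms / 60_000 % 60;
    let seconds = ms / 1_000 % 60;
    let millis = ms % 1_000;
    format!("{:02}:{:02}:{:02},{:03}", hours, minutes, seconds, millis)
}

/// Validates timing, sorts by start time, skips blank cues, and renders SRT.
pub fn build_srt(cues: &[ExportCue]) -> Result<String, String> {
    // Timing validation: every cue must end after it starts.
    for cue in cues {
        if cue.end_ms <= cue.start_ms {
            return Err(format!(
                "cue ends at {} ms but starts at {} ms",
                cue.end_ms, cue.start_ms
            ));
        }
    }

    let mut ordered: Vec<&ExportCue> = cues.iter().collect();
    ordered.sort_by_key(|cue| (cue.start_ms, cue.end_ms));

    let mut out = String::new();
    let mut index = 0;
    for cue in ordered {
        let text = cue.text.trim();
        if text.is_empty() {
            continue; // blank cues are filtered, not numbered
        }
        index += 1;
        out.push_str(&format!(
            "{}\n{} --> {}\n{}\n\n",
            index,
            srt_timestamp(cue.start_ms),
            srt_timestamp(cue.end_ms),
            text
        ));
    }
    Ok(out)
}
```

Numbering after filtering keeps the SRT indices contiguous even when blank cues are dropped.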

Logging, progress, and cancellation

Adds Subtitle OCR-specific progress merging and stale-event filtering so progress remains stable across extracting, decoding, OCR, and AI cleanup phases.
Adds live cue events while OCR is running so the review UI can update as work completes.
Adds item-scoped and run-scoped cancellation support, including targeted cancellation for individual Subtitle OCR sources.
Adds user-visible logging and toast reporting for processing summaries, failures, cancellation, missing preview restoration, and completion states.

Tests and design documentation

Adds design documentation for the Subtitle OCR tool, review redesign, and filmstrip/timeline redesign.
Adds frontend unit coverage for Subtitle OCR types, import helpers, storage, export formatting, AI cleanup parsing/application, store behavior, progress merging, review state, sidebar state, retry dialog state, version selection, cue cards, options panel, import dialog behavior, and the main view state.
Adds Rust unit coverage around Subtitle OCR serialization and export formatting behavior.

User Impact

Users can now convert supported bitmap subtitle formats into editable text subtitles inside MediaFlow instead of leaving the app or relying on a server-side subtitle OCR workflow. The feature is designed around reviewability: users can see the original bitmap cues, inspect OCR text, edit generated text, retry bad results, optionally run AI cleanup, and export clean subtitle files in common formats.

This also makes Subtitle OCR fit the existing MediaFlow model: local processing first, durable per-media sidecar data, persisted reviewable versions, cancellable long-running operations, progress events, and batch export integration.

Implementation Notes

The frontend owns orchestration, source/version state, UI review state, cue text editing, persistence merging, AI cleanup calls, and Tauri command invocation.
The Rust backend owns media probing for supported PGS container tracks, standalone VobSub pair resolution, bitmap subtitle extraction/decoding, local OCR execution, preview asset restoration, export writing, progress events, cancellation state, and path validation.
Subtitle OCR persistence is stored alongside existing per-media MediaFlow sidecar data while preserving unrelated tool data.
OCR and preview processing are run with item/run identifiers so stale events from previous runs do not mutate the current review state.
Version records are persistent and reviewable, but not immutable. Editing cue text updates the active version's finalCues, and the commit handler persists the changed version immediately.
Export validation happens on both sides: the frontend filters invalid/blank cues before invoking export, and the backend validates format, path, timing, sorting, and blank cue handling before writing files.

Diff Size

82 commits ahead of main.
90 files changed.
Approximately 19,688 insertions and 61 deletions.

Validation

Automated test coverage was added across the new Subtitle OCR frontend state/services/types and Rust serialization/export helpers. I did not run the full validation suite during this PR description update.

Recommended checks before marking this ready for review:

pnpm check
pnpm test
cargo test --manifest-path src-tauri/Cargo.toml

Notes for Reviewers

The highest-value review areas are the Tauri command contracts, supported subtitle-source boundaries, item/run-scoped cancellation behavior, sidecar merge compatibility, live progress/event filtering, direct cue edit persistence, export format correctness, and review UI state synchronization. The PR is intentionally opened as a draft so those areas can be checked before requesting final review.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 71713002ce

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-04T10:57:21Z

+  'm2ts',
+  'vob',


Stop accepting container extensions rejected by backend

When a user selects or drops a .m2ts or .vob, this list classifies it as a Subtitle OCR container, but the backend probe_subtitle_ocr_tracks path immediately calls validate_media_path; the shared allowed-extension list in src-tauri/src/shared/validation.rs does not include either extension, so these imports fail with Unsupported file type before ffprobe can inspect them. Either add these extensions to the backend validator or stop advertising them as importable here.
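
One way to catch this kind of drift is a consistency check between the extensions the frontend advertises and the backend allowlist. The sketch below is hypothetical: the contents of `ALLOWED_MEDIA_EXTENSIONS` are invented for illustration (the real list in src-tauri/src/shared/validation.rs is not shown in this thread), and `validate_media_extension` and `unsupported_advertised` are example names, not existing functions.

```rust
use std::path::Path;

/// Hypothetical backend allowlist; the real contents are not shown in this PR.
const ALLOWED_MEDIA_EXTENSIONS: &[&str] = &["mkv", "mp4", "m2ts"];

fn is_allowed(ext: &str) -> bool {
    ALLOWED_MEDIA_EXTENSIONS
        .iter()
        .any(|allowed| allowed.eq_ignore_ascii_case(ext))
}

/// Rejects paths whose extension is missing or not in the allowlist.
pub fn validate_media_extension(path: &Path) -> Result<(), String> {
    match path.extension().and_then(|ext| ext.to_str()) {
        Some(ext) if is_allowed(ext) => Ok(()),
        Some(ext) => Err(format!("Unsupported file type: .{}", ext)),
        None => Err("Unsupported file type: no extension".to_string()),
    }
}

/// Returns the advertised extensions that the backend would reject,
/// so a unit test can fail when the two lists diverge.
pub fn unsupported_advertised<'a>(advertised: &[&'a str]) -> Vec<&'a str> {
    advertised
        .iter()
        .copied()
        .filter(|ext| !is_allowed(ext))
        .collect()
}
```

A test asserting that `unsupported_advertised` returns an empty list for the frontend's container extensions would flag a mismatch like the `.vob` case before users hit it.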

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f90d1db683

chatgpt-codex-connector · 2026-06-04T16:06:02Z

+            }
+            Ok(())
+        },
+        || restore_complete.get(),


Handle completed restores without returning an error

When the requested missing preview bitmaps are all found before the subtitle stream reaches EOF, this stop predicate flips to true; however decode_bitmap_subtitle_source_with_handler_and_stop treats any true stop condition as "Subtitle OCR bitmap decode stopped" and returns Err (as its own stop test asserts). That means restoring a small subset of previews from a longer source reports failure even after writing the assets, so the frontend skips the success path and persistence, causing the same previews to be considered missing again on the next load.
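
A possible shape for the fix is to let the stop predicate say why it is stopping, so that "everything requested was restored" is not reported the same way as a cancellation. The sketch below is an assumption, not the project's code: `StopSignal`, `decode_with_stop`, and `restore_missing` are hypothetical, and cue ids stand in for decoded bitmaps.

```rust
use std::cell::Cell;
use std::collections::HashSet;

/// Why the decode loop should stop, if at all.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StopSignal {
    Continue,
    Complete,
    Cancelled,
}

/// Feeds each cue id to `handler` until the source ends or `stop` fires.
/// An early `Complete` is a success; only `Cancelled` is an error.
pub fn decode_with_stop<H, S>(cue_ids: &[u32], mut handler: H, stop: S) -> Result<usize, String>
where
    H: FnMut(u32) -> Result<(), String>,
    S: Fn() -> StopSignal,
{
    let mut decoded = 0;
    for &id in cue_ids {
        match stop() {
            StopSignal::Continue => {}
            StopSignal::Complete => return Ok(decoded),
            StopSignal::Cancelled => return Err("decode cancelled".to_string()),
        }
        handler(id)?;
        decoded += 1;
    }
    Ok(decoded)
}

/// Restores only the missing previews and stops as soon as all are found.
pub fn restore_missing(all: &[u32], missing: &HashSet<u32>) -> Result<Vec<u32>, String> {
    let remaining = Cell::new(missing.len());
    let mut restored = Vec::new();
    decode_with_stop(
        all,
        |id| {
            if missing.contains(&id) {
                restored.push(id);
                remaining.set(remaining.get().saturating_sub(1));
            }
            Ok(())
        },
        || {
            if remaining.get() == 0 {
                StopSignal::Complete
            } else {
                StopSignal::Continue
            }
        },
    )?;
    Ok(restored)
}
```

Here restoring two previews out of a five-cue source returns `Ok` after the second match, so the caller can proceed to its success path and persistence.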

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2b80536821

chatgpt-codex-connector · 2026-06-04T16:58:41Z

+pub(super) fn codec_label(codec: &str) -> Option<&'static str> {
+    match codec.to_ascii_lowercase().as_str() {
+        "hdmv_pgs_subtitle" | "pgs" => Some("PGS"),
+        _ => None,
+    }


Include VobSub streams when probing containers

For containers with embedded VobSub subtitles, ffprobe reports the subtitle codec as dvd_subtitle, but this matcher only accepts PGS names, so parse_tracks_from_probe_json drops those streams and the UI reports “No bitmap subtitle tracks found” even though the Subtitle OCR flow advertises PGS or VobSub tracks. Add the VobSub codec mapping (or otherwise surface the existing unsupported-container message) so MKV/VOB files with VobSub tracks are not silently treated as having no compatible bitmap subtitles.
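
Because the PR description states that VobSub is handled as standalone .idx/.sub input only, the second option (surfacing an explicit unsupported message) may be the better fit. A minimal sketch of that approach follows; `TrackSupport`, `classify_codec`, `probe_summary`, and the message strings are hypothetical, while `hdmv_pgs_subtitle` and `dvd_subtitle` are the codec names ffprobe reports for PGS and VobSub streams.

```rust
/// How a probed subtitle codec relates to the container workflow.
#[derive(Debug, PartialEq)]
pub enum TrackSupport {
    Supported(&'static str),
    RecognizedButUnsupported(&'static str),
    NotBitmap,
}

/// Classifies an ffprobe codec name instead of silently dropping VobSub.
pub fn classify_codec(codec: &str) -> TrackSupport {
    match codec.to_ascii_lowercase().as_str() {
        "hdmv_pgs_subtitle" | "pgs" => TrackSupport::Supported("PGS"),
        "dvd_subtitle" => TrackSupport::RecognizedButUnsupported("VobSub"),
        _ => TrackSupport::NotBitmap,
    }
}

/// Builds a user-facing summary that distinguishes "nothing found"
/// from "found, but only in a form this workflow does not handle".
pub fn probe_summary(codecs: &[&str]) -> String {
    let mut supported = 0;
    let mut unsupported: Vec<&'static str> = Vec::new();
    for codec in codecs {
        match classify_codec(codec) {
            TrackSupport::Supported(_) => supported += 1,
            TrackSupport::RecognizedButUnsupported(label) => {
                if !unsupported.contains(&label) {
                    unsupported.push(label);
                }
            }
            TrackSupport::NotBitmap => {}
        }
    }

    if supported > 0 {
        format!("{} bitmap subtitle track(s) found", supported)
    } else if !unsupported.is_empty() {
        format!(
            "{} tracks inside containers are not supported; import the standalone files instead",
            unsupported.join(", ")
        )
    } else {
        "No bitmap subtitle tracks found".to_string()
    }
}
```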

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a3b9b750b4

chatgpt-codex-connector · 2026-06-04T17:15:13Z

+fn ensure_vobsub_pair_matches(idx_path: &Path, sub_path: &Path) -> Result<(), String> {
+    if idx_path.with_extension("sub") == sub_path {
+        Ok(())


Accept VobSub sidecar names case-insensitively

When a user drops a complete pair such as Movie.IDX and Movie.SUB, the import code accepts both parts because extensions are lowercased before classification, but processing then rejects the pair here: idx_path.with_extension("sub") produces Movie.sub, which is not path-equal to the provided Movie.SUB. This makes valid uppercase VobSub pairs fail validation before decoding; compare the stem/extensions case-insensitively or resolve the actual sidecar path instead of requiring an exact lowercase .sub path.
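
A case-insensitive variant of the check could look like the sketch below. It is an illustration only: `ensure_vobsub_pair_matches_ci` and `has_extension` are hypothetical names, and the choice to compare directories and file stems exactly while relaxing only the extension comparison is an assumption made here (stems could also be compared case-insensitively if the project prefers).

```rust
use std::path::Path;

/// True when the path's extension equals `expected`, ignoring ASCII case.
fn has_extension(path: &Path, expected: &str) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| ext.eq_ignore_ascii_case(expected))
}

/// Accepts pairs such as Movie.IDX + Movie.SUB as well as Movie.idx + Movie.sub.
pub fn ensure_vobsub_pair_matches_ci(idx_path: &Path, sub_path: &Path) -> Result<(), String> {
    let same_dir = idx_path.parent() == sub_path.parent();
    let same_stem = match (idx_path.file_stem(), sub_path.file_stem()) {
        (Some(idx_stem), Some(sub_stem)) => idx_stem == sub_stem,
        _ => false,
    };

    if same_dir && same_stem && has_extension(idx_path, "idx") && has_extension(sub_path, "sub") {
        Ok(())
    } else {
        Err(format!(
            "VobSub pair mismatch: {} and {}",
            idx_path.display(),
            sub_path.display()
        ))
    }
}
```

Comparing the stem and extension separately avoids relying on `with_extension("sub")`, which always produces a lowercase extension and therefore cannot match an uppercase sidecar by path equality.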

Sudo-Rahman added 30 commits May 28, 2026 19:29

Add Subtitle OCR design spec

cc2005c

feat: add subtitle OCR domain types

f9422a9

fix: tighten subtitle OCR source types

b80970f

feat: add subtitle OCR import helpers

221fd84

test: cover subtitle OCR container imports

de37c52

fix: harden subtitle OCR import helpers

cd4def2

fix: avoid subtitle OCR import id collisions

3c53f04

feat: add subtitle OCR store

981ca1c

fix: protect subtitle OCR store invariants

195f592

feat: add subtitle OCR persistence

46cd327

fix: preserve subtitle OCR sidecar data

a497cdb

feat: add subtitle OCR text and export core

4afd27d

fix: harden subtitle OCR export core

e306e57

feat: add subtitle OCR backend commands

b0b40cb

fix: harden subtitle OCR backend timing

206aab5

fix: stream subtitle OCR backend work

ae20b37

fix: make subtitle OCR operation state atomic

b3eafe7

feat: add subtitle OCR export flow

44a3f04

feat: add subtitle OCR import and sidebar UI

b5e84ba

fix: align subtitle OCR import dialog markup

f95008b

feat: add subtitle OCR options panel

7edeff0

feat: add subtitle OCR review workspace

35487fa

fix: keep subtitle OCR version selector visible

9e64c0c

fix: harden subtitle OCR review controls

cb591ab

feat: add subtitle OCR view

3ffd257

fix: harden subtitle OCR view shell

15fd993

feat: integrate subtitle OCR tool shell

7a73734

feat: add subtitle OCR AI cleanup

5f6b123

fix: validate subtitle OCR cleanup cue numbers

34b9818

fix: validate subtitle OCR cleanup source mapping

fb556eb

Sudo-Rahman added 18 commits May 30, 2026 18:15

Fix Subtitle OCR cancel pending state

1ee07ff

Fix Subtitle OCR preview restoration

9b2fbac

fix: count exportable subtitle ocr cues

fc0d685

Fix subtitle OCR import dialog closing

28a2b80

Document subtitle OCR filmstrip timeline redesign

bef962e

Refine subtitle OCR rail sync and timeline interactions

3be1652

Refine subtitle OCR timeline sync

e6b2490

Polish subtitle OCR timeline interactions

7744ce7

Refine subtitle OCR diagnostics and line grouping

5b0e064

Use English OCR model by default

9e134fa

Remove Subtitle OCR thumbnail assets

880dc27

Add Subtitle OCR logging and toasts

901da8a

Improve subtitle OCR live review

56bad10

Add server OCR model options

76327af

Remove server detection OCR option

cd312c0

feat: cancel subtitle ocr sources individually

d131445

ui improvement

9baf72e

Remove server recognition OCR model

7171300

Sudo-Rahman marked this pull request as ready for review June 4, 2026 10:52

chatgpt-codex-connector Bot reviewed Jun 4, 2026

Sudo-Rahman added 4 commits June 4, 2026 14:12

Add Subtitle OCR PGS ASS placement

986dea4

feat: optimize subtitle OCR bitmap processing

3deb3e5

fix subtitle OCR timeline viewport fitting

b5c5494

increase subtitle OCR filmstrip preview size

f90d1db

chatgpt-codex-connector Bot reviewed Jun 4, 2026

extract subtitle OCR compact preview scroller

2b80536

chatgpt-codex-connector Bot reviewed Jun 4, 2026

fix subtitle OCR import and preview restore

a3b9b75

chatgpt-codex-connector Bot reviewed Jun 4, 2026

fix subtitle OCR VobSub edge cases

ce24903

Sudo-Rahman wants to merge 89 commits into main from subtitle-ocr-tool
