Add Subtitle OCR tool with review, cleanup, and export workflows #37

Open
Sudo-Rahman wants to merge 89 commits into main from subtitle-ocr-tool

Conversation

@Sudo-Rahman (Owner) commented Jun 3, 2026

Summary

This pull request adds a full Subtitle OCR workflow to MediaFlow. It introduces a new local-first tool for turning supported bitmap subtitle sources into editable text subtitles, with support for PGS subtitle tracks in media containers, standalone SUP files, and standalone VobSub IDX/SUB pairs. VobSub is handled as standalone .idx/.sub input only, not as an in-container subtitle track workflow.

At a high level, the feature lets users import supported bitmap subtitle sources, extract and decode subtitle images, run OCR locally, review the detected cues against visual previews, optionally clean OCR output with an LLM, retry work at different scopes, edit generated cue text, and export finished subtitle versions to ASS, SRT, or WebVTT.

What Changed

New Subtitle OCR tool surface

  • Adds a dedicated SubtitleOcrView and wires it into the main app navigation, sidebar, route-level tool selection, drag/drop handling, and batch export flow.
  • Adds the Subtitle OCR component set for the full workflow: import basket, options panel, source sidebar, review workspace, timeline, cue rail, cue cards, version selector, result dialog, retry dialogs, retry option fields, and import-track selection dialog.
  • Supports mixed input flows for PGS tracks inside media containers, standalone .sup files, and standalone VobSub .idx/.sub pairs.
  • Adds per-source status tracking for scanning, ready, extracting, decoding, OCR processing, AI cleanup, completed, and error states.
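A minimal sketch of the per-source status set listed above, assuming a plain Rust enum. The names SourceStatus, is_busy, and can_start_run are hypothetical and not taken from the PR; the sketch only shows how "work in flight" (relevant to cancellation and sleep inhibition) and "safe to start a run or retry" can both be derived from one status value.

```rust
/// Hypothetical mirror of the per-source statuses listed in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceStatus {
    Scanning,
    Ready,
    Extracting,
    Decoding,
    OcrProcessing,
    AiCleanup,
    Completed,
    Error,
}

impl SourceStatus {
    /// True while a long-running step is in flight (scanning is assumed to
    /// count), i.e. when cancellation or sleep inhibition is relevant.
    pub fn is_busy(self) -> bool {
        matches!(
            self,
            SourceStatus::Scanning
                | SourceStatus::Extracting
                | SourceStatus::Decoding
                | SourceStatus::OcrProcessing
                | SourceStatus::AiCleanup
        )
    }

    /// A new run or retry may start whenever nothing is in flight.
    pub fn can_start_run(self) -> bool {
        !self.is_busy()
    }
}
```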

Review and editing workflow

  • Adds a responsive review workspace that pairs OCR cue text with decoded bitmap previews.
  • Adds cue selection, version selection, active review target tracking, read-only processing draft views, and UI state helpers for keeping the rail, cue cards, and timeline synchronized.
  • Adds timeline/filmstrip-oriented navigation for bitmap subtitle review, including viewport reporting, visible cue bucketing, and edge-case handling around cue selection and synchronization.
  • Adds direct cue text editing for the selected persisted version. updateCueText mutates finalCues on the current version, and handleCueTextCommit persists that mutation immediately, so persisted versions are reviewable/editable records rather than immutable snapshots.
  • Adds retry support for full OCR retries and AI-cleanup-only retries, including item-scoped cancellation and progress handling.
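The edit semantics described for updateCueText and handleCueTextCommit can be sketched in Rust as follows. This is an illustration only: the real logic lives in the TypeScript frontend, and Cue, Version, update_cue_text, and commit_cue_text are hypothetical names. The point is that a commit mutates the selected version's cues in place and then saves that same version, rather than creating a new snapshot.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Cue {
    pub id: String,
    pub start_ms: u64,
    pub end_ms: u64,
    pub text: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Version {
    pub id: String,
    pub final_cues: Vec<Cue>,
}

/// Mutates the cue text in place on the given version (no new snapshot).
/// Returns false if the cue id is not part of this version.
pub fn update_cue_text(version: &mut Version, cue_id: &str, text: &str) -> bool {
    match version.final_cues.iter_mut().find(|cue| cue.id == cue_id) {
        Some(cue) => {
            cue.text = text.to_string();
            true
        }
        None => false,
    }
}

/// Applies the edit and, only if the cue was found, hands the whole version
/// to `save` (standing in for the immediate sidecar write).
pub fn commit_cue_text<F: FnMut(&Version)>(
    version: &mut Version,
    cue_id: &str,
    text: &str,
    mut save: F,
) -> bool {
    let changed = update_cue_text(version, cue_id, text);
    if changed {
        save(&*version);
    }
    changed
}
```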

Backend Subtitle OCR pipeline

  • Adds a new Rust subtitle_ocr tool module with focused submodules for assets, cancellation, decoding, export, extraction, import, OCR, progress, preview restoration, stabilization, operation state, and text reconstruction.
  • Registers new Tauri commands for probing PGS bitmap subtitle tracks in containers, resolving standalone VobSub pairs, preparing subtitle tracks, decoding bitmaps, running the OCR pipeline, restoring missing bitmap assets, exporting subtitle versions, and cancelling active operations.
  • Adds streaming OCR processing that decodes bitmap subtitle cues, writes preview assets, runs OCR, emits live cue events, stabilizes text cues, and returns pipeline output for frontend persistence.
  • Adds operation state scoped by item and run id so progress and cancellation events are applied only to the relevant active run.
  • Adds sleep inhibition while long-running Subtitle OCR work is active.
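A minimal sketch of item- and run-scoped operation state, assuming one active run per item and one AtomicBool cancellation flag per run. OperationState and its methods are hypothetical names, not the PR's actual types. Starting a new run cancels the previous run for that item, stale events are recognized by run id, and cancelling one item leaves other items untouched.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical registry keyed by item id; each entry remembers the current
/// run id and a shared cancellation flag for that run.
#[derive(Default)]
pub struct OperationState {
    runs: HashMap<String, (u64, Arc<AtomicBool>)>,
}

impl OperationState {
    /// Starts (or restarts) a run for an item. Any previous run for the same
    /// item has its flag set so its worker stops promptly.
    pub fn begin(&mut self, item_id: &str, run_id: u64) -> Arc<AtomicBool> {
        let flag = Arc::new(AtomicBool::new(false));
        if let Some((_, old)) = self.runs.insert(item_id.to_string(), (run_id, flag.clone())) {
            old.store(true, Ordering::SeqCst);
        }
        flag
    }

    /// Events whose run id is not the item's current run are stale.
    pub fn is_current(&self, item_id: &str, run_id: u64) -> bool {
        self.runs.get(item_id).map_or(false, |(current, _)| *current == run_id)
    }

    /// Cancels only the named item's active run.
    pub fn cancel(&self, item_id: &str) -> bool {
        match self.runs.get(item_id) {
            Some((_, flag)) => {
                flag.store(true, Ordering::SeqCst);
                true
            }
            None => false,
        }
    }
}
```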

OCR model support

  • Adds an English PP-OCR recognition model and key file under src-tauri/ocr-models.
  • Updates the shared OCR engine/model plumbing so Subtitle OCR can resolve OCR models, create OCR engines, and use GPU acceleration consistently with the existing OCR infrastructure.
  • Changes the Subtitle OCR defaults and model handling so the English OCR model can be used by default where appropriate.

Persistence and versioning

  • Adds Subtitle OCR domain types, including source snapshots, standalone VobSub pair metadata, bitmap cue metadata, raw OCR boxes, final cues, progress events, processing drafts, retry modes, versions, and per-source persistence data.
  • Adds a dedicated Subtitle OCR store with defensive cloning helpers, config state, selected item state, processing/cancellation scope, review target normalization, active version management, processing drafts, rendered cues, and log entries.
  • Adds sidecar-backed Subtitle OCR storage that saves source versions, active version selection, cue data, bitmap metadata, raw OCR output, AI cleanup state, and config snapshots.
  • Presents versions as persisted, reviewable records. They preserve OCR/retry history and can be selected for review/export, but the currently selected version remains editable: cue text commits update that version's finalCues and are saved back to the sidecar immediately.
  • Extends existing media/transcription/translation storage behavior so Subtitle OCR data can coexist with existing MediaFlow sidecar data without overwriting unrelated persisted sections.
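Sidecar coexistence can be sketched as a merge that replaces only the Subtitle OCR section. This assumes, hypothetically, that the sidecar is a map of per-tool sections stored as opaque serialized strings and that the section key is subtitleOcr; the real sidecar schema may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical sidecar model: one top-level section per tool, each stored as
/// an opaque serialized string that this tool never needs to parse.
pub type Sidecar = BTreeMap<String, String>;

/// Hypothetical section key for Subtitle OCR data.
pub const SUBTITLE_OCR_SECTION: &str = "subtitleOcr";

/// Returns a copy of the on-disk sidecar with only the Subtitle OCR section
/// replaced; transcription, translation, and unknown sections pass through.
pub fn merge_subtitle_ocr_section(on_disk: &Sidecar, subtitle_ocr: &str) -> Sidecar {
    let mut merged = on_disk.clone();
    merged.insert(SUBTITLE_OCR_SECTION.to_string(), subtitle_ocr.to_string());
    merged
}
```

In this sketch the caller is assumed to re-read the sidecar right before merging, so sections written by other tools since the last load are carried over rather than overwritten.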

AI cleanup

  • Adds an optional AI cleanup pass for Subtitle OCR output.
  • Builds a constrained JSON-only prompt that asks the model to correct OCR mistakes without translating, inventing content, reordering cues, or changing cue count.
  • Parses and validates cleanup responses, rejects unknown cue IDs, preserves cue timing/source IDs, removes clear OCR noise when requested by the model, and merges adjacent duplicate cues locally.
  • Supports cancellation and usage reporting through the existing LLM client flow.
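The validation rules above can be sketched as a pure function applied after the model's JSON response has been parsed (parsing is omitted). All names are hypothetical, and two behaviors are assumptions: a correction with text None means "remove this cue as OCR noise", and merging adjacent duplicates keeps the first cue's id and start time while extending its end time.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
pub struct Cue {
    pub id: String,
    pub start_ms: u64,
    pub end_ms: u64,
    pub text: String,
}

/// One parsed correction from the model. `text: None` is a hypothetical
/// encoding for "remove this cue as OCR noise".
#[derive(Debug, Clone)]
pub struct Correction {
    pub id: String,
    pub text: Option<String>,
}

pub fn apply_cleanup(cues: &[Cue], corrections: &[Correction]) -> Result<Vec<Cue>, String> {
    // Reject the whole response if the model references a cue we never sent.
    let known: HashSet<&str> = cues.iter().map(|cue| cue.id.as_str()).collect();
    if let Some(bad) = corrections.iter().find(|c| !known.contains(c.id.as_str())) {
        return Err(format!("unknown cue id: {}", bad.id));
    }

    let mut out: Vec<Cue> = Vec::with_capacity(cues.len());
    for cue in cues {
        // Cue order comes from the input, not the model; the last correction wins.
        let text = match corrections.iter().rev().find(|c| c.id == cue.id) {
            Some(Correction { text: None, .. }) => continue,
            Some(Correction { text: Some(fixed), .. }) => fixed.clone(),
            None => cue.text.clone(),
        };
        // Id and timing always come from the original cue.
        let cleaned = Cue { text, ..cue.clone() };
        let duplicate = matches!(out.last(), Some(prev) if prev.text == cleaned.text);
        if duplicate {
            // Merge adjacent duplicates by extending the previous cue's end time.
            let prev = out.last_mut().expect("checked by `duplicate`");
            prev.end_ms = prev.end_ms.max(cleaned.end_ms);
        } else {
            out.push(cleaned);
        }
    }
    Ok(out)
}
```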

Export support

  • Adds Subtitle OCR export support for ASS, SRT, and WebVTT.
  • Adds frontend preview/build helpers for export formats and a Tauri export command that validates output paths, validates cue timing, sorts cues, filters blank text, and writes the selected subtitle format.
  • Integrates Subtitle OCR versions into the app’s existing versioned batch export dialog so users can export one or more generated or edited versions across multiple sources.
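A minimal sketch of the export-side checks (drop blank text, reject zero or negative durations, sort by start time) together with SRT and WebVTT timestamp formatting. Format, Cue, and render are hypothetical names, and ASS output is omitted because it needs its own header and style sections.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    Srt,
    WebVtt,
}

#[derive(Debug, Clone)]
pub struct Cue {
    pub start_ms: u64,
    pub end_ms: u64,
    pub text: String,
}

/// HH:MM:SS,mmm for SRT and HH:MM:SS.mmm for WebVTT.
fn timestamp(ms: u64, format: Format) -> String {
    let sep = match format {
        Format::Srt => ',',
        Format::WebVtt => '.',
    };
    format!(
        "{:02}:{:02}:{:02}{}{:03}",
        ms / 3_600_000,
        ms / 60_000 % 60,
        ms / 1_000 % 60,
        sep,
        ms % 1_000
    )
}

pub fn render(cues: &[Cue], format: Format) -> Result<String, String> {
    // Blank cues are dropped rather than written as empty blocks.
    let mut kept: Vec<&Cue> = cues.iter().filter(|c| !c.text.trim().is_empty()).collect();
    // Zero or negative durations are rejected before anything is written.
    if let Some(bad) = kept.iter().find(|c| c.end_ms <= c.start_ms) {
        return Err(format!("invalid cue timing: {} -> {}", bad.start_ms, bad.end_ms));
    }
    kept.sort_by_key(|c| (c.start_ms, c.end_ms));

    let mut out = String::new();
    if format == Format::WebVtt {
        out.push_str("WEBVTT\n\n");
    }
    for (index, cue) in kept.iter().enumerate() {
        if format == Format::Srt {
            // SRT blocks are numbered from 1.
            out.push_str(&format!("{}\n", index + 1));
        }
        out.push_str(&format!(
            "{} --> {}\n{}\n\n",
            timestamp(cue.start_ms, format),
            timestamp(cue.end_ms, format),
            cue.text.trim()
        ));
    }
    Ok(out)
}
```

The two formats differ mainly in the millisecond separator (comma for SRT, dot for WebVTT), SRT's numbered blocks, and WebVTT's mandatory WEBVTT header line.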

Logging, progress, and cancellation

  • Adds Subtitle OCR-specific progress merging and stale-event filtering so progress remains stable across extracting, decoding, OCR, and AI cleanup phases.
  • Adds live cue events while OCR is running so the review UI can update as work completes.
  • Adds item-scoped and run-scoped cancellation support, including targeted cancellation for individual Subtitle OCR sources.
  • Adds user-visible logging and toast reporting for processing summaries, failures, cancellation, missing preview restoration, and completion states.

Tests and design documentation

  • Adds design documentation for the Subtitle OCR tool, review redesign, and filmstrip/timeline redesign.
  • Adds frontend unit coverage for Subtitle OCR types, import helpers, storage, export formatting, AI cleanup parsing/application, store behavior, progress merging, review state, sidebar state, retry dialog state, version selection, cue cards, options panel, import dialog behavior, and the main view state.
  • Adds Rust unit coverage around Subtitle OCR serialization and export formatting behavior.

User Impact

Users can now convert supported bitmap subtitle formats into editable text subtitles inside MediaFlow instead of leaving the app or relying on a server-side subtitle OCR workflow. The feature is designed around reviewability: users can see the original bitmap cues, inspect OCR text, edit generated text, retry bad results, optionally run AI cleanup, and export clean subtitle files in common formats.

This also makes Subtitle OCR fit the existing MediaFlow model: local processing first, durable per-media sidecar data, persisted reviewable versions, cancellable long-running operations, progress events, and batch export integration.

Implementation Notes

  • The frontend owns orchestration, source/version state, UI review state, cue text editing, persistence merging, AI cleanup calls, and Tauri command invocation.
  • The Rust backend owns media probing for supported PGS container tracks, standalone VobSub pair resolution, bitmap subtitle extraction/decoding, local OCR execution, preview asset restoration, export writing, progress events, cancellation state, and path validation.
  • Subtitle OCR persistence is stored alongside existing per-media MediaFlow sidecar data while preserving unrelated tool data.
  • OCR and preview processing are run with item/run identifiers so stale events from previous runs do not mutate the current review state.
  • Version records are persistent and reviewable, but not immutable. Editing cue text updates the active version's finalCues, and the commit handler persists the changed version immediately.
  • Export validation happens on both sides: the frontend filters invalid/blank cues before invoking export, and the backend validates format, path, timing, sorting, and blank cue handling before writing files.

Diff Size

  • 82 commits ahead of main.
  • 90 files changed.
  • Approximately 19,688 insertions and 61 deletions.

Validation

Automated test coverage was added across the new Subtitle OCR frontend state/services/types and Rust serialization/export helpers. I did not run the full validation suite during this PR description update.

Recommended checks before marking this ready for review:

  • pnpm check
  • pnpm test
  • cargo test --manifest-path src-tauri/Cargo.toml

Notes for Reviewers

The highest-value review areas are the Tauri command contracts, supported subtitle-source boundaries, item/run-scoped cancellation behavior, sidecar merge compatibility, live progress/event filtering, direct cue edit persistence, export format correctness, and review UI state synchronization. The PR is intentionally opened as a draft so those areas can be checked before requesting final review.

@Sudo-Rahman marked this pull request as ready for review on June 4, 2026, 10:52

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 71713002ce

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/lib/services/subtitle-ocr-import.ts Outdated
Comment on lines +26 to +27
'm2ts',
'vob',

P2: Stop accepting container extensions rejected by backend

When a user selects or drops a .m2ts or .vob, this list classifies it as a Subtitle OCR container, but the backend probe_subtitle_ocr_tracks path immediately calls validate_media_path; the shared allowed-extension list in src-tauri/src/shared/validation.rs does not include either extension, so these imports fail with Unsupported file type before ffprobe can inspect them. Either add these extensions to the backend validator or stop advertising them as importable here.
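One way to keep the import UI and the backend validator from drifting is a parity check that runs every advertised extension through the same validation logic. In the sketch below, validate_container_path is a hypothetical stand-in for validate_media_path (assumed to compare extensions case-insensitively against a lowercase allow-list), and rejected_by_backend is a hypothetical helper; neither list of extensions is the project's real list.

```rust
use std::path::Path;

/// Hypothetical stand-in for the backend validator: case-insensitive
/// extension check against a lowercase allow-list.
pub fn validate_container_path(path: &Path, allowed: &[&str]) -> Result<(), String> {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext {
        Some(ext) if allowed.contains(&ext.as_str()) => Ok(()),
        _ => Err(format!("Unsupported file type: {}", path.display())),
    }
}

/// Returns every extension the import UI advertises that the backend would reject.
pub fn rejected_by_backend<'a>(advertised: &[&'a str], allowed: &[&str]) -> Vec<&'a str> {
    advertised
        .iter()
        .copied()
        .filter(|ext| validate_container_path(Path::new(&format!("probe.{ext}")), allowed).is_err())
        .collect()
}
```

A unit test asserting that rejected_by_backend(frontend_list, backend_list) is empty would flag a mismatch like the .m2ts/.vob case before it reaches users.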

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f90d1db683

}
Ok(())
},
|| restore_complete.get(),

P2: Handle completed restores without returning an error

When the requested missing preview bitmaps are all found before the subtitle stream reaches EOF, this stop predicate flips to true; however decode_bitmap_subtitle_source_with_handler_and_stop treats any true stop condition as "Subtitle OCR bitmap decode stopped" and returns Err (as its own stop test asserts). That means restoring a small subset of previews from a longer source reports failure even after writing the assets, so the frontend skips the success path and persistence, causing the same previews to be considered missing again on the next load.
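A sketch of the distinction the fix needs: a stop caused by cancellation is an error, but a stop because all requested work is done is a success. The loop iterates plain packet indices instead of real PGS/VobSub packets, and decode_with_stop, DecodeOutcome, and restore_missing are hypothetical names; only the error string and the restore_complete flag mirror the review.

```rust
use std::cell::{Cell, RefCell};

#[derive(Debug, PartialEq)]
pub enum DecodeOutcome {
    /// The whole stream was consumed.
    ReachedEof,
    /// The caller's `done` predicate reported that nothing more was needed.
    FinishedEarly,
}

/// Hypothetical decode loop over packet indices. Cancellation is an error;
/// an early "done" is a success.
pub fn decode_with_stop<I, F, C, D>(packets: I, mut on_packet: F, cancelled: C, done: D) -> Result<DecodeOutcome, String>
where
    I: IntoIterator<Item = u32>,
    F: FnMut(u32),
    C: Fn() -> bool,
    D: Fn() -> bool,
{
    for packet in packets {
        if cancelled() {
            return Err("Subtitle OCR bitmap decode stopped".to_string());
        }
        if done() {
            return Ok(DecodeOutcome::FinishedEarly);
        }
        on_packet(packet);
    }
    Ok(DecodeOutcome::ReachedEof)
}

/// Restores only the `missing` preview indices; both outcomes count as success.
pub fn restore_missing(total_packets: u32, missing: &[u32]) -> Result<Vec<u32>, String> {
    let restored = RefCell::new(Vec::new());
    let restore_complete = Cell::new(missing.is_empty());
    decode_with_stop(
        0..total_packets,
        |index| {
            if missing.contains(&index) {
                // Stands in for writing the preview asset to disk.
                restored.borrow_mut().push(index);
                restore_complete.set(restored.borrow().len() == missing.len());
            }
        },
        || false,
        || restore_complete.get(),
    )?;
    Ok(restored.into_inner())
}
```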

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2b80536821

Comment on lines +35 to +39
pub(super) fn codec_label(codec: &str) -> Option<&'static str> {
    match codec.to_ascii_lowercase().as_str() {
        "hdmv_pgs_subtitle" | "pgs" => Some("PGS"),
        _ => None,
    }

P2: Include VobSub streams when probing containers

For containers with embedded VobSub subtitles, ffprobe reports the subtitle codec as dvd_subtitle, but this matcher only accepts PGS names, so parse_tracks_from_probe_json drops those streams and the UI reports “No bitmap subtitle tracks found” even though the Subtitle OCR flow advertises PGS or VobSub tracks. Add the VobSub codec mapping (or otherwise surface the existing unsupported-container message) so MKV/VOB files with VobSub tracks are not silently treated as having no compatible bitmap subtitles.
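The PR description scopes VobSub to standalone .idx/.sub pairs, so the sketch below follows the review's second option (surface an explicit unsupported message) rather than adding in-container VobSub decoding. It relies on ffprobe reporting VobSub streams as dvd_subtitle; TrackSupport and classify_codec are hypothetical names.

```rust
/// Hypothetical classification of a probed subtitle stream's codec name.
#[derive(Debug, PartialEq)]
pub enum TrackSupport {
    /// Bitmap track the in-container pipeline can process.
    Supported(&'static str),
    /// Bitmap track that is recognized but only supported in another form.
    Unsupported { label: &'static str, reason: &'static str },
}

pub fn classify_codec(codec: &str) -> Option<TrackSupport> {
    match codec.to_ascii_lowercase().as_str() {
        "hdmv_pgs_subtitle" | "pgs" => Some(TrackSupport::Supported("PGS")),
        // ffprobe reports embedded VobSub streams as dvd_subtitle.
        "dvd_subtitle" => Some(TrackSupport::Unsupported {
            label: "VobSub",
            reason: "In-container VobSub is not supported; import a standalone .idx/.sub pair instead.",
        }),
        // Text or unknown codecs are not Subtitle OCR candidates at all.
        _ => None,
    }
}
```

With this split, the probe step can tell the user that a VobSub track was found but needs a standalone pair, instead of reporting that no bitmap subtitle tracks exist.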

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a3b9b750b4

Comment on lines +625 to +627
fn ensure_vobsub_pair_matches(idx_path: &Path, sub_path: &Path) -> Result<(), String> {
    if idx_path.with_extension("sub") == sub_path {
        Ok(())

P2: Accept VobSub sidecar names case-insensitively

When a user drops a complete pair such as Movie.IDX and Movie.SUB, the import code accepts both parts because extensions are lowercased before classification, but processing then rejects the pair here: idx_path.with_extension("sub") produces Movie.sub, which is not path-equal to the provided Movie.SUB. This makes valid uppercase VobSub pairs fail validation before decoding; compare the stem/extensions case-insensitively or resolve the actual sidecar path instead of requiring an exact lowercase .sub path.
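A minimal sketch of a case-insensitive pair check, assuming both files must share the same directory and the same (case-sensitive) stem. Only the function name comes from the excerpt above; the body and the has_extension helper are hypothetical, not the PR's actual fix.

```rust
use std::path::Path;

/// True if the path's extension equals `want`, ignoring ASCII case.
fn has_extension(path: &Path, want: &str) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| ext.eq_ignore_ascii_case(want))
}

/// Hypothetical replacement body: same directory, same stem, and
/// `.idx`/`.sub` extensions in any ASCII case.
pub fn ensure_vobsub_pair_matches(idx_path: &Path, sub_path: &Path) -> Result<(), String> {
    let same_dir = idx_path.parent() == sub_path.parent();
    let same_stem = idx_path.file_stem() == sub_path.file_stem();
    if has_extension(idx_path, "idx") && has_extension(sub_path, "sub") && same_dir && same_stem {
        Ok(())
    } else {
        Err(format!(
            "VobSub pair mismatch: {} and {}",
            idx_path.display(),
            sub_path.display()
        ))
    }
}
```

Comparing the stem case-sensitively is a conservative choice; resolving the actual sidecar path on disk, as the review suggests, would also accept pairs such as Movie.idx and movie.sub on case-insensitive file systems.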
