Add Subtitle OCR tool with review, cleanup, and export workflows #37
Sudo-Rahman wants to merge 89 commits into
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 71713002ce
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    'm2ts',
    'vob',
Stop accepting container extensions rejected by backend
When a user selects or drops a .m2ts or .vob, this list classifies it as a Subtitle OCR container, but the backend probe_subtitle_ocr_tracks path immediately calls validate_media_path; the shared allowed-extension list in src-tauri/src/shared/validation.rs does not include either extension, so these imports fail with Unsupported file type before ffprobe can inspect them. Either add these extensions to the backend validator or stop advertising them as importable here.
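One way to close the gap on the backend side is to let the shared validator accept the same extensions the frontend advertises. The following is a minimal sketch, not the code in src-tauri/src/shared/validation.rs: the function name `validate_media_extension`, the constant `ALLOWED_MEDIA_EXTENSIONS`, and the exact contents of the list are assumptions made for illustration.

```rust
use std::path::Path;

/// Hypothetical allow-list. The real list lives in
/// src-tauri/src/shared/validation.rs and may contain different entries.
const ALLOWED_MEDIA_EXTENSIONS: &[&str] = &["mkv", "mp4", "m2ts", "vob", "sup", "idx", "sub"];

/// Returns Ok(()) when the path's extension, compared case-insensitively,
/// is on the allow-list, and an "Unsupported file type" error otherwise.
pub fn validate_media_extension(path: &Path) -> Result<(), String> {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase())
        .ok_or_else(|| "Unsupported file type: missing extension".to_string())?;

    if ALLOWED_MEDIA_EXTENSIONS.contains(&ext.as_str()) {
        Ok(())
    } else {
        Err(format!("Unsupported file type: .{ext}"))
    }
}
```

Keeping a single list that both the frontend classification and the backend validator derive from would avoid this kind of drift, but that is a design suggestion rather than something the PR is known to do.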
💡 Codex Review
Reviewed commit: f90d1db683
        }
        Ok(())
    },
    || restore_complete.get(),
Handle completed restores without returning an error
When the requested missing preview bitmaps are all found before the subtitle stream reaches EOF, this stop predicate flips to true; however decode_bitmap_subtitle_source_with_handler_and_stop treats any true stop condition as "Subtitle OCR bitmap decode stopped" and returns Err (as its own stop test asserts). That means restoring a small subset of previews from a longer source reports failure even after writing the assets, so the frontend skips the success path and persistence, causing the same previews to be considered missing again on the next load.
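A possible shape for the fix is to separate "the user cancelled" from "the caller already has everything it needs", so that only the first is an error. The sketch below is an assumption-laden illustration, not the signature of decode_bitmap_subtitle_source_with_handler_and_stop: the `DecodeEnd` enum, the `decode_until` function, and the two separate predicates are all hypothetical.

```rust
/// Why a decode loop ended. Hypothetical type, not taken from the PR.
#[derive(Debug, PartialEq, Eq)]
pub enum DecodeEnd {
    /// The source was read to the end.
    Eof,
    /// The caller's completion predicate was satisfied before the end.
    Completed,
}

/// Walks `items`, calling `handler` on each one. Both predicates are checked
/// before each item: cancellation is an error, early completion is a success.
pub fn decode_until<T>(
    items: impl IntoIterator<Item = T>,
    mut handler: impl FnMut(T) -> Result<(), String>,
    is_cancelled: impl Fn() -> bool,
    is_complete: impl Fn() -> bool,
) -> Result<DecodeEnd, String> {
    for item in items {
        if is_cancelled() {
            return Err("Subtitle OCR bitmap decode stopped".to_string());
        }
        if is_complete() {
            return Ok(DecodeEnd::Completed);
        }
        handler(item)?;
    }
    Ok(DecodeEnd::Eof)
}
```

With this split, a restore that finds all requested previews early returns `Ok(DecodeEnd::Completed)`, so the frontend can take its success path and persist the restored assets.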
💡 Codex Review
Reviewed commit: 2b80536821
    pub(super) fn codec_label(codec: &str) -> Option<&'static str> {
        match codec.to_ascii_lowercase().as_str() {
            "hdmv_pgs_subtitle" | "pgs" => Some("PGS"),
            _ => None,
        }
Include VobSub streams when probing containers
For containers with embedded VobSub subtitles, ffprobe reports the subtitle codec as dvd_subtitle, but this matcher only accepts PGS names, so parse_tracks_from_probe_json drops those streams and the UI reports “No bitmap subtitle tracks found” even though the Subtitle OCR flow advertises PGS or VobSub tracks. Add the VobSub codec mapping (or otherwise surface the existing unsupported-container message) so MKV/VOB files with VobSub tracks are not silently treated as having no compatible bitmap subtitles.
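The mapping the review asks for could look roughly like the sketch below. It assumes the matcher should recognize `dvd_subtitle`, the codec name ffprobe reports for VobSub streams. The "VobSub" label text is an assumption, and since the PR summary scopes VobSub to standalone .idx/.sub input, the recognized label might instead feed an explicit "unsupported in containers" message rather than a selectable track.

```rust
/// Maps an ffprobe `codec_name` to a display label for bitmap subtitle codecs.
/// Returns None for codecs that are not bitmap subtitles known to this sketch.
pub fn codec_label(codec: &str) -> Option<&'static str> {
    match codec.to_ascii_lowercase().as_str() {
        "hdmv_pgs_subtitle" | "pgs" => Some("PGS"),
        // ffprobe reports embedded VobSub streams as `dvd_subtitle`.
        // The label string here is an assumption, not taken from the PR.
        "dvd_subtitle" => Some("VobSub"),
        _ => None,
    }
}
```

Either way, a stream with this codec would no longer be dropped silently, which is the failure mode the review describes.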
💡 Codex Review
Reviewed commit: a3b9b750b4
    fn ensure_vobsub_pair_matches(idx_path: &Path, sub_path: &Path) -> Result<(), String> {
        if idx_path.with_extension("sub") == sub_path {
            Ok(())
Accept VobSub sidecar names case-insensitively
When a user drops a complete pair such as Movie.IDX and Movie.SUB, the import code accepts both parts because extensions are lowercased before classification, but processing then rejects the pair here: idx_path.with_extension("sub") produces Movie.sub, which is not path-equal to the provided Movie.SUB. This makes valid uppercase VobSub pairs fail validation before decoding; compare the stem/extensions case-insensitively or resolve the actual sidecar path instead of requiring an exact lowercase .sub path.
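A case-insensitive check could be sketched as follows. This is one possible approach under stated assumptions: the helper `has_extension` and the error message are hypothetical, and the sketch compares only the extensions case-insensitively while requiring the parent directory and file stem to match exactly. Whether the stem should also be compared case-insensitively depends on the target filesystems and is left open here.

```rust
use std::path::Path;

/// Returns true when `path` has the extension `expected`, ignoring ASCII case.
fn has_extension(path: &Path, expected: &str) -> bool {
    path.extension()
        .and_then(|e| e.to_str())
        .is_some_and(|e| e.eq_ignore_ascii_case(expected))
}

/// Accepts an .idx/.sub pair when both files share a directory and stem,
/// regardless of the letter case of their extensions.
pub fn ensure_vobsub_pair_matches(idx_path: &Path, sub_path: &Path) -> Result<(), String> {
    let same_dir = idx_path.parent() == sub_path.parent();
    let same_stem =
        idx_path.file_stem().is_some() && idx_path.file_stem() == sub_path.file_stem();

    if same_dir && same_stem && has_extension(idx_path, "idx") && has_extension(sub_path, "sub") {
        Ok(())
    } else {
        // Hypothetical error text, chosen for the sketch.
        Err("VobSub .idx and .sub files do not belong to the same pair".to_string())
    }
}
```

With this version, Movie.IDX and Movie.SUB in the same directory pass validation, while Movie.idx paired with Other.sub is still rejected.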
Summary
This pull request adds a full Subtitle OCR workflow to MediaFlow. It introduces a new local-first tool for turning supported bitmap subtitle sources into editable text subtitles, with support for PGS subtitle tracks in media containers, standalone SUP files, and standalone VobSub IDX/SUB pairs. VobSub is handled as standalone .idx/.sub input only, not as an in-container subtitle track workflow.
At a high level, the feature lets users import supported bitmap subtitle sources, extract and decode subtitle images, run OCR locally, review the detected cues against visual previews, optionally clean OCR output with an LLM, retry work at different scopes, edit generated cue text, and export finished subtitle versions to ASS, SRT, or WebVTT.
What Changed
New Subtitle OCR tool surface
- Adds SubtitleOcrView and wires it into the main app navigation, sidebar, route-level tool selection, drag/drop handling, and batch export flow.
- … .sup files, and standalone VobSub .idx/.sub pairs.
Review and editing workflow
- updateCueText mutates finalCues on the current version, and handleCueTextCommit persists that mutation immediately, so persisted versions are reviewable/editable records rather than immutable snapshots.
Backend Subtitle OCR pipeline
- subtitle_ocr tool module with focused submodules for assets, cancellation, decoding, export, extraction, import, OCR, progress, preview restoration, stabilization, operation state, and text reconstruction.
OCR model support
- … src-tauri/ocr-models.
Persistence and versioning
- … finalCues and are saved back to the sidecar immediately.
AI cleanup
Export support
Logging, progress, and cancellation
Tests and design documentation
User Impact
Users can now convert supported bitmap subtitle formats into editable text subtitles inside MediaFlow instead of leaving the app or relying on a server-side subtitle OCR workflow. The feature is designed around reviewability: users can see the original bitmap cues, inspect OCR text, edit generated text, retry bad results, optionally run AI cleanup, and export clean subtitle files in common formats.
This also makes Subtitle OCR fit the existing MediaFlow model: local processing first, durable per-media sidecar data, persisted reviewable versions, cancellable long-running operations, progress events, and batch export integration.
Implementation Notes
- … finalCues, and the commit handler persists the changed version immediately.
Diff Size
- … main.
Validation
Automated test coverage was added across the new Subtitle OCR frontend state/services/types and Rust serialization/export helpers. I did not run the full validation suite during this PR description update.
Recommended checks before marking this ready for review:
- pnpm check
- pnpm test
- cargo test --manifest-path src-tauri/Cargo.toml
Notes for Reviewers
The highest-value review areas are the Tauri command contracts, supported subtitle-source boundaries, item/run-scoped cancellation behavior, sidecar merge compatibility, live progress/event filtering, direct cue edit persistence, export format correctness, and review UI state synchronization. The PR is intentionally opened as a draft so those areas can be checked before requesting final review.