(retriever) ASR refactor #1658

Draft
charlesbluca wants to merge 2 commits into NVIDIA:main from charlesbluca:asr-refactor
charlesbluca (Collaborator) commented Mar 19, 2026

Description

Refactors ASR so all “chunk rows → transcript rows” logic goes through asr_chunks_to_text, with ASRActor as a thin wrapper.

What changed

  • asr_chunks_to_text(batch_df, model=..., client=..., asr_params=...)
    Single batch entry point for ASR. ASRActor only constructs model/client from ASRParams and delegates here.

  • Injectable model / client
    The inprocess path and the GPU pool can pass a ParakeetCTC1B1ASR instance (or a remote client), so the same code path runs inside and outside Ray map_batches.

  • Long audio (local Parakeet)
    ParakeetCTC1B1ASR splits inputs that exceed the model's length budget and concatenates the resulting transcripts.

  • Media probing
    media_interface: more robust ffprobe handling when duration or bit_rate is missing (e.g. VBR / bad probes).

  • Inprocess
    _load_doc_to_df / _iter_doc_chunks unify loading for pdf / html / image / audio / txt in the ingest loop.

  • API
    Removed apply_asr_to_df; use asr_chunks_to_text directly (tests updated).
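
A minimal sketch of the delegation pattern described in the list above, not the code in this PR. The real asr_chunks_to_text takes a DataFrame (batch_df) plus model, client, and asr_params; here plain dicts stand in for DataFrame rows, and FakeModel, its transcribe method, the "audio" / "text" keys, and ASRActorSketch are hypothetical names used only for illustration.

```python
class FakeModel:
    """Hypothetical stand-in for a local ASR model such as ParakeetCTC1B1ASR."""

    def transcribe(self, audio: bytes) -> str:
        return f"<{len(audio)} bytes transcribed>"


def asr_chunks_to_text(rows, model=None, client=None, asr_params=None):
    """Sketch: map chunk rows to transcript rows using an injected backend."""
    if model is None and client is None:
        raise ValueError("pass either a local model or a remote client")
    # Assumption: model and client expose the same transcribe() method here.
    backend = model if model is not None else client
    return [{**row, "text": backend.transcribe(row["audio"])} for row in rows]


class ASRActorSketch:
    """Thin wrapper: builds the backend once, then delegates every batch."""

    def __init__(self, model_factory):
        self._model = model_factory()

    def __call__(self, rows):
        return asr_chunks_to_text(rows, model=self._model)


rows = [{"path": "a.wav", "audio": b"\x00" * 4}]
direct = asr_chunks_to_text(rows, model=FakeModel())  # caller already holds a model
via_actor = ASRActorSketch(FakeModel)(rows)           # actor builds its own
```

Because both call sites end up in the same function, a caller that already holds a model (for example a GPU pool) can reuse it instead of constructing a second one, and tests can pass a stub.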

Why

  • One implementation for Ray batch, inprocess, and the audio CLI, which is easier to fix and extend.
  • An injected model supports the GPU pool and avoids duplicate model setup where the caller already holds the model.
  • Preserves the behavior on main for remote segment_audio while keeping the refactored structure.

Testing

  • pytest nemo_retriever/tests/test_asr_actor.py

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables, have you ensured those are mimicked in the Helm values.yaml file?

- Rebase onto main (includes remote segment_audio / 5c5557a)
- asr_chunks_to_text + model/client injection; _build_output_rows, _infer_remote
- Parakeet long-audio split; media_interface ffprobe robustness
- inprocess: _load_doc_to_df / _iter_doc_chunks; asr_chunks_to_text with model
- stage: asr_chunks_to_text; tests; no apply_asr_to_df
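
The long-audio split mentioned above could look roughly like the sketch below. The PR description does not say how ParakeetCTC1B1ASR picks split points, whether windows overlap, or what the length budget is, so this assumes fixed-size, non-overlapping windows; split_by_budget, transcribe_long, max_samples, and the fake transcriber are all hypothetical.

```python
from typing import Callable, List, Sequence


def split_by_budget(samples: Sequence[float], max_samples: int) -> List[Sequence[float]]:
    """Cut samples into consecutive windows of at most max_samples each."""
    if max_samples <= 0:
        raise ValueError("max_samples must be positive")
    return [samples[i:i + max_samples] for i in range(0, len(samples), max_samples)]


def transcribe_long(
    samples: Sequence[float],
    transcribe_one: Callable[[Sequence[float]], str],
    max_samples: int,
) -> str:
    """Transcribe each window and join the non-empty pieces with spaces."""
    pieces = [transcribe_one(chunk) for chunk in split_by_budget(samples, max_samples)]
    return " ".join(p.strip() for p in pieces if p.strip())


# Fake transcriber that just reports the window length it was given.
fake = lambda chunk: f"w{len(chunk)}"
result = transcribe_long([0.0] * 10, fake, max_samples=4)  # windows of 4, 4, 2
```

Fixed windows can cut a word in half at a boundary; a production splitter might prefer silence-based or overlapping windows, which this sketch does not attempt.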

Made-with: Cursor