Add a persistent Parakeet helper for low-latency host integrations by seyeong-han · Pull Request #18861 · pytorch/executorch

seyeong-han · 2026-04-14T00:41:18Z

Summary

Factor the Parakeet transcription core out of parakeet_runner into a shared ParakeetTranscriber class
Add a new parakeet_helper binary plus a stdin/stdout helper protocol for long-lived host integrations
Build the helper in the existing Parakeet CMake presets and document the helper workflow in the README

Why a helper?

The Voxtral Realtime macOS app (executorch-examples/voxtral_realtime/macos) didn't need any changes to the executorch repo because voxtral_realtime_runner was already designed as a streaming, long-running process — the app just launches it and feeds audio.

parakeet_runner is different: it's a one-shot batch CLI tool that loads the model, transcribes one WAV file, prints the result, and exits. There's no way to send it a second request without restarting the process and paying the ~1.4 s model-load cost again.

The ExecuWhisper macOS app (meta-pytorch/executorch-examples#232) runs repeated record-then-transcribe requests via system dictation, so a fresh process per recording is too slow. parakeet_helper fills that gap — it's the Parakeet equivalent of what the Voxtral Realtime runner already does natively: stay alive, keep the model warm, and accept multiple requests over stdin/stdout.
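To make the trade-off concrete, here is a minimal sketch of the two process models. It is not the PR's code: FakeTranscriber, one_shot_loads, and persistent_loads are hypothetical names, the constructor only stands in for the ~1.4 s model load, and the one-path-per-line input is a simplification rather than the helper's actual protocol.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a transcriber: constructing it represents the
// expensive model load; transcribe() represents one inference request.
class FakeTranscriber {
 public:
  explicit FakeTranscriber(int& load_count) { ++load_count; }
  std::string transcribe(const std::string& wav_path) const {
    return "transcript:" + wav_path;
  }
};

// One-shot pattern: every request constructs (loads) a new transcriber,
// as happens when a batch CLI is relaunched for each recording.
int one_shot_loads(const std::vector<std::string>& wav_paths,
                   std::ostream& out) {
  int load_count = 0;
  for (const std::string& path : wav_paths) {
    FakeTranscriber transcriber(load_count);
    out << transcriber.transcribe(path) << '\n';
  }
  return load_count;
}

// Persistent pattern: load once, then answer one request per input line
// until the input stream ends.
int persistent_loads(std::istream& in, std::ostream& out) {
  int load_count = 0;
  FakeTranscriber transcriber(load_count);
  std::string path;
  while (std::getline(in, path)) {
    if (path.empty()) continue;  // Ignore blank lines.
    out << transcriber.transcribe(path) << std::endl;  // Flush each reply.
  }
  return load_count;
}
```

With three requests, the one-shot pattern pays the load three times while the persistent loop pays it once and produces the same transcripts. Flushing after every reply matters in the persistent case because the host is waiting on the pipe for each answer.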

Test plan

cmake --preset llm-metal-stats -DEXECUTORCH_BUILD_MLX=OFF
cmake --build --preset llm-metal-stats-install
cd examples/models/parakeet && cmake --build --preset parakeet-metal
Result: both parakeet_runner and parakeet_helper link successfully.

Made-with: Cursor

Factor the Parakeet transcription logic out of the one-shot runner so host apps can keep the model warm across requests. Build the new helper alongside the runner and document the helper workflow for app integrations.

Made-with: Cursor

pytorch-bot · 2026-04-14T00:41:22Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18861

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 12 Pending, 4 Unrelated Failures, 1 Unclassified Failure

As of commit 2aea05e with merge base 66884b4:

NEW FAILURE - The following job has failed:

MLX / backend-tester (models) / test-mlx-backend-models (gh)
RuntimeError: Command bash /Users/runner/work/_temp/exec_script failed with exit code 1

UNCLASSIFIED FAILURE - DrCI could not classify the following job because the workflow did not run on the merge base. The failure may be pre-existing on trunk or introduced by this PR:

Build Aarch64 Linux Wheels / pytorch/executorch / build-wheel-py3_10-cpu-aarch64 (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)

FLAKY - The following job failed but was likely due to flakiness present on trunk:

Build Aarch64 Linux Wheels / pytorch/executorch / upload / upload-wheel-py3_10-cpu-aarch64 (gh) (detected as infra flaky with no log or failing log classifier)

BROKEN TRUNK - The following jobs failed, but the same failures were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / test-llama-runner-qnn-linux (fp32, qnn_16a16w, qnn) / linux-job (gh) (trunk failure)
pull / unittest / macos / macos-job (gh) (trunk failure)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
pull / unittest-editable / macos / macos-job (gh) (trunk failure)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-04-14T00:47:02Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

stdout and stderr are C standard library macros. On macOS/Clang they resolve to extern FILE* variables so using them as parameter names compiles by coincidence. On Windows/MSVC they expand to function calls like (__acrt_iob_func(1)), breaking the syntax. Rename to captured_stdout / captured_stderr. The JSON keys remain 'stdout' and 'stderr' (unchanged wire format).
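A minimal sketch of the renamed signature, assuming a result message that carries the transcript plus captured output. make_result_json and json_escape are hypothetical helpers written for this illustration, not functions from the PR, and the exact set of fields is an assumption.

```cpp
#include <cassert>
#include <string>

// Minimal JSON string escaping for the sketch (quotes, backslashes, and
// a few common control characters only).
std::string json_escape(const std::string& s) {
  std::string out;
  for (char c : s) {
    switch (c) {
      case '"': out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n"; break;
      case '\r': out += "\\r"; break;
      case '\t': out += "\\t"; break;
      default: out += c;
    }
  }
  return out;
}

// The parameter names avoid the stdout/stderr macros. The JSON keys are
// plain string literals, which the preprocessor never expands, so the wire
// format is unaffected by the rename.
std::string make_result_json(const std::string& text,
                             const std::string& captured_stdout,
                             const std::string& captured_stderr) {
  return "{\"type\":\"result\",\"text\":\"" + json_escape(text) +
         "\",\"stdout\":\"" + json_escape(captured_stdout) +
         "\",\"stderr\":\"" + json_escape(captured_stderr) + "\"}";
}
```

Because macro expansion does not reach inside string literals, only identifiers need renaming; a parameter named stdout would be rewritten by the preprocessor on any platform where the macro expands to something other than a plain identifier.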

Long-lived companion process for the LFM2.5-350M MLX formatter, mirroring the parakeet helper introduced in pytorch#18861. Wraps an executorch::extension::llm::TextLLMRunner with the same JSON-line stdin/stdout protocol the macOS ExecuWhisper app already uses for the parakeet ASR helper, so the formatter model can stay loaded and KV-warm across requests.

Wire contract (kProtocolVersion=1):

Requests:
  {"type":"format", "version":1, "request_id":..., "prompt":..., "max_new_tokens":..., "temperature":...}
  {"type":"shutdown", "version":1}

Responses:
  {"type":"ready", "version":1}
  {"type":"status", "version":1, "request_id":..., "phase":..., "message":...}
  {"type":"result", "version":1, "request_id":..., "text":..., "stdout":..., "stderr":..., "tokens_per_second":<opt double>}
  {"type":"error", "version":1, "request_id":<opt>, "message":..., "details":<opt>}

The Swift counterpart lives at ExecuWhisper/Services/FormatterHelperProtocol.swift in meta-llama/internal-llama-cookbook (end-to-end-use-cases/ExecuWhisper).

Build via the existing make target:

  cd ~/executorch
  make lfm_2_5_formatter-mlx

which produces:

  cmake-out/examples/models/llama/lfm25_formatter_helper
  cmake-out/examples/models/llama/mlx.metallib

The new lfm_2_5_formatter-mlx Make target depends on the existing lfm_2_5-mlx target; the llama-mlx CMake build preset's targets list now includes lfm25_formatter_helper alongside llama_main.
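The request/response flow in the wire contract above can be sketched as a small loop. This is an illustration under stated assumptions, not the helper's implementation: run_helper, emit, find_string_field, and the Formatter callback (standing in for the TextLLMRunner call) are hypothetical, request_id is assumed to be a string, status messages and optional fields are omitted, and field lookup is a naive string search where a real helper would use a JSON parser.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

constexpr int kProtocolVersion = 1;

// Naive lookup of a string-valued field such as "type":"format". It tolerates
// spaces after the colon but does not handle escapes or nesting.
std::string find_string_field(const std::string& line, const std::string& key) {
  const std::string needle = "\"" + key + "\"";
  std::size_t pos = line.find(needle);
  if (pos == std::string::npos) return "";
  pos = line.find(':', pos + needle.size());
  if (pos == std::string::npos) return "";
  const std::size_t open = line.find('"', pos + 1);
  if (open == std::string::npos) return "";
  const std::size_t close = line.find('"', open + 1);
  if (close == std::string::npos) return "";
  return line.substr(open + 1, close - open - 1);
}

// Hypothetical stand-in for the model call: prompt in, formatted text out.
using Formatter = std::function<std::string(const std::string&)>;

// Writes one JSON object per line and flushes so the host sees it at once.
// Output escaping is omitted for brevity.
void emit(std::ostream& out, const std::string& type,
          const std::string& extra = "") {
  out << "{\"type\":\"" << type << "\",\"version\":" << kProtocolVersion
      << extra << "}" << std::endl;
}

void run_helper(std::istream& in, std::ostream& out, const Formatter& format) {
  emit(out, "ready");  // Signals that the (already loaded) model is usable.
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty()) continue;
    const std::string type = find_string_field(line, "type");
    if (type == "shutdown") break;  // Later lines are never read.
    if (type == "format") {
      const std::string id = find_string_field(line, "request_id");
      const std::string text = format(find_string_field(line, "prompt"));
      emit(out, "result",
           ",\"request_id\":\"" + id + "\",\"text\":\"" + text + "\"");
    } else {
      emit(out, "error", ",\"message\":\"unsupported request type\"");
    }
  }
}
```

Taking the streams as parameters keeps the loop testable with string streams; a real binary would presumably pass std::cin and std::cout after loading the model.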

Add a native macOS dictation app that runs fully on-device using ExecuTorch: NVIDIA Parakeet-TDT for ASR (Metal backend) plus a fine-tuned LiquidAI LFM2.5-350M for cleaning up disfluencies, casing, and punctuation (MLX delegate).

Layout follows the voxtral_realtime/macos/ convention:

  execuwhisper/
    macos/
      ExecuWhisper/         Swift app source
      ExecuWhisperTests/    XCTest target
      docs/                 Demo script, support runbook, release QA checklist
      scripts/              Build / DMG / sign / verify / probe scripts
      project.yml           xcodegen spec (no DEVELOPMENT_TEAM hard-coded; supply via env var)
      README.md             Public README with prebuilt + from-source paths
      THIRD_PARTY_NOTICES   Upstream component attribution
      CHANGELOG.md          v0.1.0 initial open-source release notes
      .gitignore            xcodeproj/, build/, DMG, etc.

Models live in two Hugging Face repos:

  younghan-meta/Parakeet-TDT-ExecuTorch-Metal (ASR runtime)
  younghan-meta/LFM2.5-350M-ExecuWhisper-Formatter (formatter runtime + fp32)

Helper binaries depend on three upstream ExecuTorch PRs in review:

  pytorch/executorch#18861 - parakeet_helper (ASR runtime)
  pytorch/executorch#19195 - LFM2.5 MLX export pipeline
  pytorch/executorch#19562 - lfm25_formatter_helper (formatter runtime)

Until those land, build via the README from-source path or use the prebuilt arm64 helpers attached to the GitHub Release on this PR.

Eval: AMI release-gate run for the formatter shows forbidden 0.030 (gate 0.10) and coverage 0.874 (gate 0.85). Full eval reports in the formatter HF repo under eval/.

No telemetry. The only network call is the first-launch model download from huggingface.co.

github-actions · 2026-06-16T01:37:35Z

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

Resolve conflict in examples/models/parakeet/main.cpp: keep this PR's ParakeetTranscriber-based main(). The inline TransducerRunner logic that landed on main is superseded by the ParakeetTranscriber class, which encapsulates load/preprocess/decode/timestamps and is shared by both parakeet_runner and the new parakeet_helper. types.h, CMakeLists.txt (extension_asr_runner) and the doc/export updates are taken from main; the Token type is now the shared asr::Token alias, which is layout-compatible with the transcriber's usage.

linux-foundation-easycla · 2026-06-18T16:51:01Z

The committers listed above are authorized under a signed CLA.

✅ login: seyeong-han / name: Young Han (2aea05e, 66a2db8, b54a81c)

seyeong-han · 2026-06-18T16:59:44Z

@pytorchbot merge

pytorch-bot · 2026-06-18T16:59:48Z

Mergebot is not configured for this repository. Please use the merge button provided by GitHub.

seyeong-han requested review from kirklandsign, larryliu0820 and lucylq as code owners April 14, 2026 00:41

meta-cla Bot added the "CLA Signed" label (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.) Apr 14, 2026

seyeong-han mentioned this pull request Apr 14, 2026

Add ExecuWhisper macOS app for low-latency on-device Parakeet dictation meta-pytorch/executorch-examples#232

Closed

2 tasks

seyeong-han requested a review from mergennachin April 14, 2026 18:10

seyeong-han mentioned this pull request May 13, 2026

Add a persistent LFM2.5 formatter helper for macOS integrations #19562

Draft

seyeong-han mentioned this pull request May 15, 2026

Add ExecuWhisper macOS dictation example (Parakeet + LFM2.5 formatter) meta-pytorch/executorch-examples#237

Open

github-actions Bot added the "Stale" label (PRs inactive for over 60 days) Jun 16, 2026

mergennachin approved these changes Jun 18, 2026

seyeong-han temporarily deployed to cadence June 18, 2026 16:51 — with GitHub Actions Inactive

seyeong-han merged commit 4557df5 into main Jun 18, 2026
215 of 223 checks passed

seyeong-han deleted the parakeet-helper-macos-alignment branch June 18, 2026 18:15

seyeong-han merged 3 commits into main from parakeet-helper-macos-alignment

2 participants
