Add a persistent Parakeet helper for low-latency host integrations #18861

Merged: seyeong-han merged 3 commits into main from parakeet-helper-macos-alignment on Jun 18, 2026

seyeong-han (Contributor) commented Apr 14, 2026

Summary

  • Factor the Parakeet transcription core out of parakeet_runner into a shared ParakeetTranscriber class
  • Add a new parakeet_helper binary plus a stdin/stdout helper protocol for long-lived host integrations
  • Build the helper in the existing Parakeet CMake presets and document the helper workflow in the README

Why a helper?

The Voxtral Realtime macOS app (executorch-examples/voxtral_realtime/macos) didn't need any changes to the executorch repo because voxtral_realtime_runner was already designed as a streaming, long-running process — the app just launches it and feeds audio.

parakeet_runner is different: it's a one-shot batch CLI tool that loads the model, transcribes one WAV file, prints the result, and exits. There's no way to send it a second request without restarting the process and paying the ~1.4 s model-load cost again.

The ExecuWhisper macOS app (meta-pytorch/executorch-examples#232) runs repeated record-then-transcribe requests via system dictation, so a fresh process per recording is too slow. parakeet_helper fills that gap — it's the Parakeet equivalent of what the Voxtral Realtime runner already does natively: stay alive, keep the model warm, and accept multiple requests over stdin/stdout.
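
To make the "stay alive, keep the model warm" pattern concrete, here is a minimal host-side sketch in Python. It is not the ExecuWhisper implementation (that app is written in Swift), and it does not launch the real parakeet_helper. `FAKE_HELPER`, `PersistentHelper`, the `"transcribe"` request type, and the reply fields are all hypothetical stand-ins, since the Parakeet helper's actual message schema is not shown in this PR description. The point is only that the process is started once and every later request reuses it.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for a warm helper: it prints a "ready" line once,
# then answers one JSON line per request until it sees "shutdown".
# The real parakeet_helper schema is NOT shown in this thread.
FAKE_HELPER = r"""
import json, sys
print(json.dumps({"type": "ready"}), flush=True)  # pretend the model is loaded
for line in sys.stdin:
    req = json.loads(line)
    if req.get("type") == "shutdown":
        break
    reply = {"type": "result", "request_id": req.get("request_id"), "text": "stub"}
    print(json.dumps(reply), flush=True)
"""


class PersistentHelper:
    """Launch a child process once and reuse it for many requests."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered text pipes
        )
        # Block until the child reports that startup (e.g. model load) is done.
        ready = json.loads(self.proc.stdout.readline())
        if ready.get("type") != "ready":
            raise RuntimeError(f"unexpected startup message: {ready}")

    def request(self, message):
        """Send one JSON line and read one JSON line back."""
        self.proc.stdin.write(json.dumps(message) + "\n")
        self.proc.stdin.flush()
        return json.loads(self.proc.stdout.readline())

    def close(self):
        """Ask the child to exit, then reap it."""
        self.proc.stdin.write(json.dumps({"type": "shutdown"}) + "\n")
        self.proc.stdin.flush()
        self.proc.stdin.close()
        code = self.proc.wait(timeout=10)
        self.proc.stdout.close()
        return code


def demo():
    """Run two requests against one child process."""
    helper = PersistentHelper([sys.executable, "-c", FAKE_HELPER])
    try:
        # Both requests share one process, so startup cost is paid once.
        return [
            helper.request({"type": "transcribe", "request_id": str(i)})
            for i in range(2)
        ]
    finally:
        helper.close()
```

With a one-shot runner, each of those two requests would instead spawn a new process and pay the model-load cost again.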

Test plan

  • cmake --preset llm-metal-stats -DEXECUTORCH_BUILD_MLX=OFF
  • cmake --build --preset llm-metal-stats-install
  • cd examples/models/parakeet && cmake --build --preset parakeet-metal (both parakeet_runner and parakeet_helper link successfully)

Made-with: Cursor

Factor the Parakeet transcription logic out of the one-shot runner so host apps can keep the model warm across requests. Build the new helper alongside the runner and document the helper workflow for app integrations.

Made-with: Cursor

pytorch-bot Bot commented Apr 14, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18861

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 12 Pending, 4 Unrelated Failures, 1 Unclassified Failure

As of commit 2aea05e with merge base 66884b4 (image):

NEW FAILURE - The following job has failed:

UNCLASSIFIED FAILURE - DrCI could not classify the following job because the workflow did not run on the merge base. The failure may be pre-existing on trunk or introduced by this PR:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed, but the failures were already present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla Bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Apr 14, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

stdout and stderr are C standard library macros. On macOS/Clang they
resolve to extern FILE* variables so using them as parameter names
compiles by coincidence. On Windows/MSVC they expand to function calls
like (__acrt_iob_func(1)), breaking the syntax.

Rename to captured_stdout / captured_stderr. The JSON keys remain
'stdout' and 'stderr' (unchanged wire format).
seyeong-han added a commit to seyeong-han/executorch that referenced this pull request May 13, 2026
Long-lived companion process for the LFM2.5-350M MLX formatter, mirroring
the parakeet helper introduced in pytorch#18861. Wraps an
executorch::extension::llm::TextLLMRunner with the same JSON-line
stdin/stdout protocol the macOS ExecuWhisper app already uses for the
parakeet ASR helper, so the formatter model can stay loaded and KV-warm
across requests.

Wire contract (kProtocolVersion=1):
  Requests:
    {"type":"format",   "version":1, "request_id":..., "prompt":...,
     "max_new_tokens":..., "temperature":...}
    {"type":"shutdown", "version":1}
  Responses:
    {"type":"ready",   "version":1}
    {"type":"status",  "version":1, "request_id":..., "phase":..., "message":...}
    {"type":"result",  "version":1, "request_id":..., "text":..., "stdout":..., "stderr":...,
     "tokens_per_second":<opt double>}
    {"type":"error",   "version":1, "request_id":<opt>, "message":..., "details":<opt>}

The Swift counterpart lives at
  ExecuWhisper/Services/FormatterHelperProtocol.swift
in meta-llama/internal-llama-cookbook (end-to-end-use-cases/ExecuWhisper).

Build via the existing make target:
  cd ~/executorch
  make lfm_2_5_formatter-mlx
which produces:
  cmake-out/examples/models/llama/lfm25_formatter_helper
  cmake-out/examples/models/llama/mlx.metallib

The new lfm_2_5_formatter-mlx Make target depends on the existing
lfm_2_5-mlx target; the llama-mlx CMake build preset's targets list
now includes lfm25_formatter_helper alongside llama_main.
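
As an illustration of the wire contract quoted in the commit message above, the sketch below encodes requests and validates responses as newline-delimited JSON. It is a reading of that message, not the helper's or the Swift app's code. The function names are hypothetical, and the value types (a string `request_id`, an integer `max_new_tokens`, a float `temperature`) are assumptions, because the commit message elides them with `...`.

```python
import json

PROTOCOL_VERSION = 1  # mirrors kProtocolVersion=1 from the commit message
RESPONSE_TYPES = {"ready", "status", "result", "error"}


def encode_format_request(request_id, prompt, max_new_tokens, temperature):
    """Serialize a "format" request as one newline-terminated JSON line."""
    message = {
        "type": "format",
        "version": PROTOCOL_VERSION,
        "request_id": request_id,          # assumed to be a string
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,  # assumed to be an integer
        "temperature": temperature,        # assumed to be a float
    }
    # json.dumps escapes embedded newlines, so the message stays on one line.
    return json.dumps(message) + "\n"


def encode_shutdown():
    """Serialize the "shutdown" request."""
    return json.dumps({"type": "shutdown", "version": PROTOCOL_VERSION}) + "\n"


def decode_response(line):
    """Parse one response line and reject unknown versions or types."""
    message = json.loads(line)
    if message.get("version") != PROTOCOL_VERSION:
        raise ValueError(f"unsupported protocol version: {message.get('version')!r}")
    if message.get("type") not in RESPONSE_TYPES:
        raise ValueError(f"unknown response type: {message.get('type')!r}")
    return message
```

Checking `version` on every message lets a host fail fast if the app and the helper binary drift apart, which is presumably why each message in the contract carries it.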
seyeong-han added a commit to seyeong-han/executorch-examples that referenced this pull request May 15, 2026
Add a native macOS dictation app that runs fully on-device using ExecuTorch:
NVIDIA Parakeet-TDT for ASR (Metal backend) plus a fine-tuned LiquidAI
LFM2.5-350M for cleaning up disfluencies, casing, and punctuation (MLX
delegate).

Layout follows the voxtral_realtime/macos/ convention:
  execuwhisper/
    macos/
      ExecuWhisper/        Swift app source
      ExecuWhisperTests/   XCTest target
      docs/                Demo script, support runbook, release QA checklist
      scripts/             Build / DMG / sign / verify / probe scripts
      project.yml          xcodegen spec (no DEVELOPMENT_TEAM hard-coded;
                           supply via env var)
      README.md            Public README with prebuilt + from-source paths
      THIRD_PARTY_NOTICES  Upstream component attribution
      CHANGELOG.md         v0.1.0 initial open-source release notes
      .gitignore           xcodeproj/, build/, DMG, etc.

Models live in two Hugging Face repos:
  younghan-meta/Parakeet-TDT-ExecuTorch-Metal      (ASR runtime)
  younghan-meta/LFM2.5-350M-ExecuWhisper-Formatter (formatter runtime + fp32)

Helper binaries depend on three upstream ExecuTorch PRs in review:
  pytorch/executorch#18861 - parakeet_helper (ASR runtime)
  pytorch/executorch#19195 - LFM2.5 MLX export pipeline
  pytorch/executorch#19562 - lfm25_formatter_helper (formatter runtime)

Until those land, build via the README from-source path or use the
prebuilt arm64 helpers attached to the GitHub Release on this PR.

Eval: AMI release-gate run for the formatter shows forbidden 0.030 (gate
0.10) and coverage 0.874 (gate 0.85). Full eval reports in the formatter
HF repo under eval/.

No telemetry. The only network call is the first-launch model download
from huggingface.co.
@github-actions

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

github-actions Bot added the Stale label (PRs inactive for over 60 days) Jun 16, 2026
Resolve conflict in examples/models/parakeet/main.cpp: keep this PR's
ParakeetTranscriber-based main(). The inline TransducerRunner logic that
landed on main is superseded by the ParakeetTranscriber class, which
encapsulates load/preprocess/decode/timestamps and is shared by both
parakeet_runner and the new parakeet_helper.

types.h, CMakeLists.txt (extension_asr_runner) and the doc/export updates
are taken from main; the Token type is now the shared asr::Token alias,
which is layout-compatible with the transcriber's usage.

linux-foundation-easycla Bot commented Jun 18, 2026

CLA Signed
The committers listed above are authorized under a signed CLA.

@seyeong-han

@pytorchbot merge

pytorch-bot Bot commented Jun 18, 2026

Mergebot is not configured for this repository. Please use the merge button provided by GitHub.

seyeong-han merged commit 4557df5 into main on Jun 18, 2026
215 of 223 checks passed
seyeong-han deleted the parakeet-helper-macos-alignment branch on June 18, 2026 18:15