Skip HuggingFace round-trip in WhisperKit.download when files are local #468

Open
PaTiToMaSteR wants to merge 2 commits into argmaxinc:main from PaTiToMaSteR:skip-download-network-roundtrip-on-cache-hit

@PaTiToMaSteR

Skip HuggingFace round-trip in WhisperKit.download when the model is already on disk

Problem

WhisperKit.download(variant:...) makes an unconditional network round-trip
to HuggingFace's getFilenames API, even when every required .mlmodelc
bundle is already in the local snapshot cache. On a cellular fallback or
busy Wi-Fi this round-trip dominates the cold-start cost users see when
opening a screen that uses STT: measured at ~5 s on an iPhone 14 Pro Max
running iOS 18 (whisperkit-coreml base, fully cached locally).

The downstream cost is amplified for apps that load WhisperKit on demand
(e.g. only when the user opens a screen that needs transcription) rather
than at app launch. The "downloading" message users see is mostly the
client asking HuggingFace which files the repo contains while the model
is already sitting on disk, ready to be used.

Fix

Add a locallyCachedFolder(variant:repoID:downloadBase:) static helper
that mirrors HubApi's default snapshot path
(<downloadBase>/models/<repo>/<variant>) and checks whether the three
CoreML bundles WhisperKit always loads
(AudioEncoder.mlmodelc, MelSpectrogram.mlmodelc, TextDecoder.mlmodelc)
are already present.
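
A minimal sketch of what such a helper could look like, assuming the snapshot layout quoted above and the two folder names this description mentions. It is illustrative only and not the code in this PR; the function body and the `requiredBundles` constant are hypothetical.

```swift
import Foundation

// Hypothetical sketch, not the PR's actual implementation.
// The three bundle names come from the description above.
let requiredBundles = [
    "AudioEncoder.mlmodelc",
    "MelSpectrogram.mlmodelc",
    "TextDecoder.mlmodelc",
]

/// Returns the variant folder if every required bundle exists as a directory,
/// otherwise nil so the caller can fall back to the network path.
func locallyCachedFolder(variant: String, repoID: String, downloadBase: URL) -> URL? {
    // Assumed layout: <downloadBase>/models/<repo>/<variant>
    let repoFolder = downloadBase
        .appendingPathComponent("models")
        .appendingPathComponent(repoID)
    // Conventional folder name first, then the bare variant name.
    let candidates = ["openai_whisper-\(variant)", variant]
    let fileManager = FileManager.default

    for name in candidates {
        let folder = repoFolder.appendingPathComponent(name)
        let allPresent = requiredBundles.allSatisfy { bundle in
            var isDirectory: ObjCBool = false
            let path = folder.appendingPathComponent(bundle).path
            return fileManager.fileExists(atPath: path, isDirectory: &isDirectory)
                && isDirectory.boolValue
        }
        if allPresent { return folder }
    }
    return nil
}
```

Note that this sketch only checks that each bundle directory exists; it does not inspect the files inside a bundle.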

When all three are present at the conventional openai_whisper-<variant>
path (or the bare <variant> path, kept for backward compatibility),
download returns that URL early and emits a single 100 %-complete
progress callback so callers don't see a UI stall. On cache miss the
existing slow path runs unchanged.
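
The early-return behaviour described here can be sketched roughly as follows. `resolveModelFolder`, `cachedFolder`, and `slowPath` are hypothetical stand-ins (for the cache lookup result and the existing download flow), not names from WhisperKit; only Foundation's `Progress` is a real API.

```swift
import Foundation

// Hypothetical control-flow sketch, not the PR's actual code.
func resolveModelFolder(
    cachedFolder: URL?,
    progressCallback: ((Progress) -> Void)? = nil,
    slowPath: () async throws -> URL
) async throws -> URL {
    if let folder = cachedFolder {
        // Report one already-finished unit of work so the UI can jump to 100 %.
        let progress = Progress(totalUnitCount: 1)
        progress.completedUnitCount = 1
        progressCallback?(progress)
        return folder
    }
    // Cache miss: defer to the unchanged network path.
    return try await slowPath()
}
```

On a cache hit the callback fires exactly once with `fractionCompleted == 1.0`, and the slow path is never invoked.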

The fix uses public-only Swift API and replicates HubApi's path
convention rather than reaching into Hub internals.

Measurements

iPhone 14 Pro Max (A16, 6 GB RAM) · iOS 18 · WhisperKit base model · same
locally cached .mlmodelc files for both runs. App is a Capacitor + WKWebView
shell that calls WhisperKit.init(WhisperKitConfig(modelFolder: ...)) after
WhisperKit.download(variant: "base") resolves. Reported number is the wall
clock from harness "load STT" → STT-ready event:

| Stage | Before this PR | After this PR | Δ |
| --- | --- | --- | --- |
| WhisperKit.download (cached files on disk) | 5 109 ms | 1 ms | −5 108 ms |
| WhisperKit(WhisperKitConfig(...)) init | 8 538 ms | 8 583 ms | +45 ms (noise; CoreML compile dominates) |
| Total cold STT load | 13 648 ms | 8 584 ms | −5 064 ms (37 % faster) |

The remaining 8.5 s is Apple's CoreML / Metal model specialization step,
which serializes internally and isn't addressable from this layer — but
removing the avoidable network round-trip is a clean ~37 % win that every
on-demand WhisperKit user benefits from.
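
For anyone wanting to reproduce the comparison, a rough sketch of a timing helper and the percentage arithmetic is below. `timed` and `percentReduction` are hypothetical helpers, not part of WhisperKit or of the harness used for the numbers above, and the sketch uses a monotonic clock rather than the harness's own event timestamps.

```swift
import Foundation

/// Runs `work` and returns its result plus the elapsed time in milliseconds,
/// read from a monotonic clock.
func timed<T>(_ work: () async throws -> T) async rethrows -> (result: T, milliseconds: Double) {
    let start = DispatchTime.now().uptimeNanoseconds
    let result = try await work()
    let end = DispatchTime.now().uptimeNanoseconds
    return (result, Double(end - start) / 1_000_000)
}

/// Percentage of the "before" time that was removed.
func percentReduction(before: Double, after: Double) -> Double {
    (before - after) / before * 100
}

// Totals reported above: 13 648 ms before, 8 584 ms after.
let reduction = percentReduction(before: 13_648, after: 8_584)
print(String(format: "%.1f %%", reduction)) // prints "37.1 %"
```

Wrapping the download call and the init call in separate `timed` invocations gives the per-stage split shown in the measurements.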

Compatibility

  • Pure additive change: download signature, return type, and slow-path
    behaviour are unchanged. Callers that don't have a local cache see the
    exact same behaviour as today.
  • Variant folder lookup tries both openai_whisper-<variant> (the
    standard whisperkit-coreml layout) and bare <variant> (for users
    pointing repo at a custom HuggingFace repo with a different naming
    convention).
  • Required-files check is conservative: if any of the three bundles is
    missing or partial, falls through to getFilenames + snapshot. So
    a corrupted half-download still gets repaired the way it does today.
  • No change is needed for download(downloadBase:) callers that pass a
    non-default base: the same path lookup logic runs against that base.

Test plan

  • Manually verified on iPhone 14 Pro Max: cached load drops from
    5 109 ms → 1 ms in the download step; full STT load 13.6 s → 8.6 s
  • Cache-miss path unchanged: deleted local model folder, confirmed
    slow path still downloads + completes successfully
  • Mismatched repo (custom repo: arg pointing at a non-openai_whisper
    layout): falls through to slow path correctly via the second candidate
  • CI test on the WhisperKit suite (would appreciate a maintainer adding
    a fixture-based test if you'd like one — happy to write it as a
    follow-up commit)
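
As a starting point for the fixture-based test mentioned in the last item, one possible shape is sketched below. Everything here is hypothetical: the fixture uses empty directories as stand-ins for real .mlmodelc bundles, and `hasAllBundles` is a simplified presence check rather than the helper added in this PR.

```swift
import Foundation

// Hypothetical fixture sketch; empty directories stand in for real bundles.
let fixtureBundles = ["AudioEncoder.mlmodelc", "MelSpectrogram.mlmodelc", "TextDecoder.mlmodelc"]

/// Creates one empty directory per bundle name inside `folder`.
func makeFixture(at folder: URL, bundles: [String]) throws {
    for name in bundles {
        try FileManager.default.createDirectory(
            at: folder.appendingPathComponent(name),
            withIntermediateDirectories: true
        )
    }
}

/// Simplified presence check: true only if every bundle directory exists.
func hasAllBundles(in folder: URL) -> Bool {
    fixtureBundles.allSatisfy { name in
        var isDirectory: ObjCBool = false
        let path = folder.appendingPathComponent(name).path
        return FileManager.default.fileExists(atPath: path, isDirectory: &isDirectory)
            && isDirectory.boolValue
    }
}

/// Builds a complete fixture, checks it, removes one bundle, and checks again.
func runFixtureChecks() throws -> (complete: Bool, afterDeletion: Bool) {
    let root = FileManager.default.temporaryDirectory
        .appendingPathComponent(UUID().uuidString)
    defer { try? FileManager.default.removeItem(at: root) }

    let folder = root.appendingPathComponent("openai_whisper-base")
    try makeFixture(at: folder, bundles: fixtureBundles)
    let complete = hasAllBundles(in: folder)

    // Simulate a half-finished download by removing one bundle.
    try FileManager.default.removeItem(at: folder.appendingPathComponent("TextDecoder.mlmodelc"))
    let afterDeletion = hasAllBundles(in: folder)
    return (complete, afterDeletion)
}
```

A real test would call the PR's helper instead of `hasAllBundles` and assert that the complete fixture is a cache hit while the damaged one falls through.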

Notes

The Vibrez team came across this while diagnosing a 13.6 s cold-start
wait when users open the in-app /brain console. The full investigation
(including how we ruled out Metal PSO compilation as the bottleneck via
a multi-launch measurement matrix) is captured in our planning notes if
that's useful context for the review.

a2they and others added 2 commits March 31, 2026 23:21
Adds locallyCachedFolder() helper to detect when the variant folder
already contains the three CoreML bundles WhisperKit needs, and
returns it directly instead of round-tripping through hubApi.getFilenames
+ hubApi.snapshot. Saves ~5 s on cellular / busy Wi-Fi for every cached
load.

Cache-miss path is unchanged — falls through to the existing download
flow. See PR description for measurements.

2 participants