Skip HuggingFace round-trip in WhisperKit.download when files are local by PaTiToMaSteR · Pull Request #468 · argmaxinc/argmax-oss-swift

PaTiToMaSteR · 2026-05-01T16:44:09Z

Skip HuggingFace round-trip in `WhisperKit.download` when the model is already on disk

Problem

WhisperKit.download(variant:...) makes an unconditional network round-trip
to HuggingFace's getFilenames API, even when every required .mlmodelc
bundle is already in the local snapshot cache. On a cellular fallback or
busy Wi-Fi this round-trip dominates the cold-start cost users see when
opening a screen that uses STT — measured at ~5 s on iPhone 14 Pro Max
running iOS 18 (whisperkit-coreml base, fully cached locally).

The downstream cost is amplified for apps that load WhisperKit on demand
(e.g. only when the user opens a screen that needs transcription) rather
than at app launch: the "downloading" message users see is mostly the
client asking HuggingFace which files the repo contains, while the model
is already sitting on disk waiting to be used.

Fix

Add a locallyCachedFolder(variant:repoID:downloadBase:) static helper
that mirrors HubApi's default snapshot path
(<downloadBase>/models/<repo>/<variant>) and checks whether the three
CoreML bundles WhisperKit always loads
(AudioEncoder.mlmodelc, MelSpectrogram.mlmodelc, TextDecoder.mlmodelc)
are already present.

When all three are present at the conventional openai_whisper-<variant>
path (or the bare <variant> path, kept for backward compatibility),
download returns that URL early and emits a single 100 %-complete
progress callback so callers don't see a UI stall. On cache miss the
existing slow path runs unchanged.

The fix uses public-only Swift API and replicates HubApi's path
convention rather than reaching into Hub internals.
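
The lookup and early return described above can be sketched roughly as follows. This is a minimal illustration, not the PR diff: the function bodies, the `resolveModelFolder` wrapper, and its `slowPath` parameter are hypothetical, and the sketch assumes each `.mlmodelc` bundle is a directory on disk.

```swift
import Foundation

/// Hypothetical stand-in for the helper described above. Returns the variant
/// folder if all three bundles are present, otherwise nil.
func locallyCachedFolder(variant: String, repoID: String, downloadBase: URL) -> URL? {
    let requiredBundles = ["AudioEncoder.mlmodelc", "MelSpectrogram.mlmodelc", "TextDecoder.mlmodelc"]

    // Snapshot layout from the description: <downloadBase>/models/<repo>/<variant>
    var repoRoot = downloadBase.appendingPathComponent("models")
    for part in repoID.split(separator: "/") {
        repoRoot.appendPathComponent(String(part))
    }

    let fileManager = FileManager.default
    // Conventional folder name first, bare variant second.
    for folderName in ["openai_whisper-\(variant)", variant] {
        let folder = repoRoot.appendingPathComponent(folderName)
        let allPresent = requiredBundles.allSatisfy { bundle in
            var isDirectory: ObjCBool = false
            let path = folder.appendingPathComponent(bundle).path
            // Assumption: a usable bundle exists and is a directory.
            return fileManager.fileExists(atPath: path, isDirectory: &isDirectory) && isDirectory.boolValue
        }
        if allPresent { return folder }
    }
    return nil
}

/// Hypothetical wrapper showing the control flow only: fast path on a cache
/// hit, otherwise defer to whatever the existing download flow does.
func resolveModelFolder(
    variant: String,
    repoID: String,
    downloadBase: URL,
    progressCallback: ((Progress) -> Void)? = nil,
    slowPath: () throws -> URL
) rethrows -> URL {
    if let cached = locallyCachedFolder(variant: variant, repoID: repoID, downloadBase: downloadBase) {
        // Emit a single, already-complete progress update so callers' UI does not stall.
        let progress = Progress(totalUnitCount: 1)
        progress.completedUnitCount = 1
        progressCallback?(progress)
        return cached
    }
    return try slowPath()
}
```

Passing the slow path in as a closure keeps the sketch independent of Hub internals; in the actual change the fallback is the existing `getFilenames` + `snapshot` flow.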

Measurements

iPhone 14 Pro Max (A16, 6 GB RAM) · iOS 18 · WhisperKit base model · same
locally cached .mlmodelc files for both runs. App is a Capacitor + WKWebView
shell that calls WhisperKit.init(WhisperKitConfig(modelFolder: ...)) after
WhisperKit.download(variant: "base") resolves. Reported number is the wall
clock from harness "load STT" → STT-ready event:

| Stage | Before this PR | After this PR | Δ |
| --- | --- | --- | --- |
| `WhisperKit.download` (cached files on disk) | 5 109 ms | 1 ms | −5 108 ms |
| `WhisperKit(WhisperKitConfig(...))` init | 8 538 ms | 8 583 ms | +45 ms (noise; CoreML compile dominates) |
| Total cold STT load | 13 648 ms | 8 584 ms | −5 064 ms (37 % faster) |

The remaining 8.5 s is Apple's CoreML / Metal model specialization step,
which serializes internally and isn't addressable from this layer — but
removing the avoidable network round-trip is a clean ~37 % win that every
on-demand WhisperKit user benefits from.
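
For reference, the headline percentage follows directly from the reported totals. The snippet below only redoes that arithmetic with the figures from the table; the variable names are illustrative.

```swift
import Foundation

// Reported totals for the cold STT load, in milliseconds.
let totalBefore = 13_648
let totalAfter = 8_584

// Per-stage figures before the change, in milliseconds.
let downloadBefore = 5_109
let initBefore = 8_538

// Time saved and relative improvement against the "before" total.
let saved = totalBefore - totalAfter
let percent = Double(saved) / Double(totalBefore) * 100

// The stage figures sum to within 1 ms of the reported total
// (presumably rounding in the harness).
let stageSumBefore = downloadBefore + initBefore

print("saved: \(saved) ms")                                   // saved: 5064 ms
print("improvement: " + String(format: "%.1f", percent) + " %") // improvement: 37.1 %
print("stage sum before: \(stageSumBefore) ms")               // stage sum before: 13647 ms
```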

Compatibility

- Pure additive change: download signature, return type, and slow-path
  behaviour are unchanged. Callers that don't have a local cache see the
  exact same behaviour as today.
- Variant folder lookup tries both openai_whisper-<variant> (the
  standard whisperkit-coreml layout) and bare <variant> (for users
  pointing repo at a custom HuggingFace repo with a different naming
  convention).
- Required-files check is conservative: if any of the three bundles is
  missing or partial, it falls through to getFilenames + snapshot. So
  a corrupted half-download still gets repaired the way it does today.
- Patch is a no-op for download(downloadBase:) callers that pass a
  non-default base: the same path lookup logic applies.

Test plan

- Manually verified on iPhone 14 Pro Max: cached load drops from
  5 109 ms → 1 ms in the download step; full STT load 13.6 s → 8.6 s
- Cache-miss path unchanged: deleted local model folder, confirmed
  slow path still downloads + completes successfully
- Mismatched repo (custom repo: arg pointing at a non-openai_whisper
  layout): falls through to slow path correctly via the second candidate
- CI test on the WhisperKit suite (would appreciate a maintainer adding
  a fixture-based test if you'd like one; happy to write it as a
  follow-up commit)
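
One possible shape for such a fixture-based test is sketched below. It is an assumption about how the fixture could be built, not part of this PR: `makeFixture` and `missingBundles` are hypothetical helpers, and empty directories stand in for compiled `.mlmodelc` bundles.

```swift
import Foundation

/// Hypothetical fixture builder: creates a temporary variant folder containing
/// an empty directory for each named bundle.
func makeFixture(with bundles: [String]) throws -> URL {
    let root = FileManager.default.temporaryDirectory
        .appendingPathComponent("whisperkit-fixture-\(UUID().uuidString)")
    try FileManager.default.createDirectory(at: root, withIntermediateDirectories: true)
    for bundle in bundles {
        try FileManager.default.createDirectory(
            at: root.appendingPathComponent(bundle),
            withIntermediateDirectories: true
        )
    }
    return root
}

/// Hypothetical check: lists the required bundles that are absent from `folder`.
/// An empty result corresponds to a cache hit; anything else should take the slow path.
func missingBundles(in folder: URL, required: [String]) -> [String] {
    required.filter { bundle in
        !FileManager.default.fileExists(atPath: folder.appendingPathComponent(bundle).path)
    }
}

let required = ["AudioEncoder.mlmodelc", "MelSpectrogram.mlmodelc", "TextDecoder.mlmodelc"]

// Complete cache: nothing missing.
let completeFolder = try makeFixture(with: required)
print(missingBundles(in: completeFolder, required: required))

// Half-download: the last bundle is absent, so the check reports it.
let partialFolder = try makeFixture(with: Array(required.dropLast()))
print(missingBundles(in: partialFolder, required: required))
```

In a real test target the two `print` calls would become assertions, and the fixture folders would be removed in teardown.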

Notes

The Vibrez team came across this while diagnosing a 13.6 s cold-start
wait when users open the in-app /brain console. The full investigation
(including how we ruled out Metal PSO compilation as the bottleneck via
a multi-launch measurement matrix) is captured in our planning notes if
that's useful context for the review.

Adds locallyCachedFolder() helper to detect when the variant folder already contains the three CoreML bundles WhisperKit needs, and returns it directly instead of round-tripping through hubApi.getFilenames + hubApi.snapshot. Saves ~5 s on cellular / busy Wi-Fi for every cached load. Cache-miss path is unchanged — falls through to the existing download flow. See PR description for measurements.

a2they and others added 2 commits March 31, 2026 23:21

Release v0.18.0

e2adabb

PaTiToMaSteR wants to merge 2 commits into argmaxinc:main from PaTiToMaSteR:skip-download-network-roundtrip-on-cache-hit
