Skip HuggingFace round-trip in WhisperKit.download when files are local #468
Open
PaTiToMaSteR wants to merge 2 commits into
Adds locallyCachedFolder() helper to detect when the variant folder already contains the three CoreML bundles WhisperKit needs, and returns it directly instead of round-tripping through hubApi.getFilenames + hubApi.snapshot. Saves ~5 s on cellular / busy Wi-Fi for every cached load. Cache-miss path is unchanged — falls through to the existing download flow. See PR description for measurements.
Skip HuggingFace round-trip in WhisperKit.download when the model is already on disk
Problem
WhisperKit.download(variant:...) makes an unconditional network round-trip
to HuggingFace's getFilenames API, even when every required .mlmodelc
bundle is already in the local snapshot cache. On a cellular fallback or
busy Wi-Fi this round-trip dominates the cold-start cost users see when
opening a screen that uses STT: measured at ~5 s on iPhone 14 Pro Max
running iOS 18 (whisperkit-coreml base, fully cached locally).

The downstream cost is amplified for apps that load WhisperKit on demand
(e.g. only when the user opens a screen that needs transcription) rather
than at app launch. The "downloading" message users see is mostly the
client asking HuggingFace which files to fetch while the model is
already sitting on disk waiting to be used.
Fix
Add a locallyCachedFolder(variant:repoID:downloadBase:) static helper
that mirrors HubApi's default snapshot path
(<downloadBase>/models/<repo>/<variant>) and checks whether the three
CoreML bundles WhisperKit always loads (AudioEncoder.mlmodelc,
MelSpectrogram.mlmodelc, TextDecoder.mlmodelc) are already present.
When all three are present at the conventional openai_whisper-<variant>
path (or the bare <variant> path, kept for backward compatibility),
download returns that URL early and emits a single 100 %-complete
progress callback so callers don't see a UI stall. On cache miss the
existing slow path runs unchanged.
The fix uses public-only Swift API and replicates HubApi's path
convention rather than reaching into Hub internals.
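A minimal sketch of what such a cache probe could look like, assuming only the path layout and bundle names stated above. This is an illustration, not the PR's actual code: the function body, the candidate ordering, and the directory check are assumptions.

```swift
import Foundation

// Bundle names as listed in the description above.
let requiredBundles = [
    "AudioEncoder.mlmodelc",
    "MelSpectrogram.mlmodelc",
    "TextDecoder.mlmodelc",
]

// Hypothetical reimplementation of the helper: returns the variant folder
// under <downloadBase>/models/<repo>/ if all three bundles exist, else nil.
func locallyCachedFolder(variant: String, repoID: String, downloadBase: URL) -> URL? {
    let fm = FileManager.default
    let repoRoot = downloadBase
        .appendingPathComponent("models")
        .appendingPathComponent(repoID)

    // Conventional whisperkit-coreml folder name first, then the bare variant.
    let candidates = ["openai_whisper-\(variant)", variant]

    for name in candidates {
        let folder = repoRoot.appendingPathComponent(name)
        let allPresent = requiredBundles.allSatisfy { bundle in
            // Compiled CoreML models (.mlmodelc) are directories, so require one.
            var isDirectory: ObjCBool = false
            let path = folder.appendingPathComponent(bundle).path
            return fm.fileExists(atPath: path, isDirectory: &isDirectory)
                && isDirectory.boolValue
        }
        if allPresent { return folder }
    }
    return nil
}
```

Note that this sketch only checks that the three bundle directories exist; it does not validate their contents.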
Measurements
iPhone 14 Pro Max (A16, 6 GB RAM) · iOS 18 · WhisperKit base model · same
locally cached .mlmodelc files for both runs. App is a Capacitor +
WKWebView shell that calls
WhisperKit.init(WhisperKitConfig(modelFolder: ...)) after
WhisperKit.download(variant: "base") resolves. Reported number is the
wall-clock time from harness "load STT" → STT-ready event:
Steps timed: WhisperKit.download (cached files on disk) and
WhisperKit(WhisperKitConfig(...)) init.

The remaining 8.5 s is Apple's CoreML / Metal model specialization step,
which serializes internally and isn't addressable from this layer, but
removing the avoidable network round-trip is a clean ~37 % win that every
on-demand WhisperKit user benefits from.
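As a quick check of the ~37 % figure, using the full STT load times reported in the test plan (13.6 s before, 8.6 s after):

```latex
\[
\frac{13.6\,\text{s} - 8.6\,\text{s}}{13.6\,\text{s}}
  = \frac{5.0}{13.6}
  \approx 0.368
  \approx 37\,\%
\]
```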
Compatibility
download signature, return type, and slow-path behaviour are unchanged. Callers that don't have a local cache see the exact same behaviour as today.
Both openai_whisper-<variant> (the standard whisperkit-coreml layout) and bare <variant> (for users pointing repo at a custom HuggingFace repo with a different naming convention) are checked.
If any of the three bundles is missing or partial, download falls through to getFilenames + snapshot, so a corrupted half-download still gets repaired the way it does today.
download(downloadBase:) callers that pass a non-default base get the same path lookup logic.
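The early-return and fall-through behaviour can be sketched roughly as follows. Everything here is hypothetical: `resolveModelFolder`, its parameters, and the `slowPath` closure are stand-ins for illustration, with `cached` representing the result of a local probe and `slowPath` representing the existing getFilenames + snapshot flow.

```swift
import Foundation

// Hypothetical control flow, not WhisperKit's actual download implementation.
// `cached` is nil on a cache miss or partial cache.
func resolveModelFolder(
    cached: URL?,
    progressCallback: ((Progress) -> Void)? = nil,
    slowPath: () async throws -> URL
) async throws -> URL {
    if let folder = cached {
        // Report a single, fully complete progress so progress UIs settle at once.
        let done = Progress(totalUnitCount: 1)
        done.completedUnitCount = 1
        progressCallback?(done)
        return folder
    }
    // Nothing usable on disk: defer to the unchanged network download path.
    return try await slowPath()
}
```

Keeping the miss case as a plain fall-through means callers without a local cache go through exactly the same code as before.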
Test plan
Cached load: 5 109 ms → 1 ms in the download step; full STT load 13.6 s → 8.6 s
Cache miss: slow path still downloads + completes successfully
Custom repo (repo: arg pointing at a non-openai_whisper layout): falls through to slow path correctly via the second candidate
Unit test: not included (can add a fixture-based test if you'd like one; happy to write it as a follow-up commit)
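One possible shape for a fixture-based test, sketched under assumptions: `makeFixture` and `hasAllBundles` are hypothetical names, and `hasAllBundles` is only a stand-in for the check under test (a real test would call the PR's helper). The fixture uses empty directories, so no real models or network access are needed.

```swift
import Foundation

// Hypothetical fixture builder: creates a temp folder containing one empty
// directory per requested bundle name.
func makeFixture(bundles: [String]) throws -> URL {
    let fm = FileManager.default
    let root = fm.temporaryDirectory
        .appendingPathComponent("whisperkit-fixture-\(UUID().uuidString)")
    try fm.createDirectory(at: root, withIntermediateDirectories: true)
    for name in bundles {
        try fm.createDirectory(
            at: root.appendingPathComponent(name),
            withIntermediateDirectories: true)
    }
    return root
}

// Stand-in for the cache check being tested.
func hasAllBundles(at folder: URL, required: [String]) -> Bool {
    required.allSatisfy {
        FileManager.default.fileExists(atPath: folder.appendingPathComponent($0).path)
    }
}

let required = ["AudioEncoder.mlmodelc", "MelSpectrogram.mlmodelc", "TextDecoder.mlmodelc"]

// A complete cache, and a partial one missing TextDecoder.mlmodelc.
let full = try makeFixture(bundles: required)
let partial = try makeFixture(bundles: Array(required.dropLast()))

print(hasAllBundles(at: full, required: required))     // true
print(hasAllBundles(at: partial, required: required))  // false
```

The partial fixture covers the case where a half-finished download must not be treated as a cache hit.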
Notes
The Vibrez team came across this while diagnosing a 13.6 s cold-start
wait when users open the in-app /brain console. The full investigation
(including how we ruled out Metal PSO compilation as the bottleneck via
a multi-launch measurement matrix) is captured in our planning notes if
that's useful context for the review.