WhisperKit: harden TextDecoder against nil logits and prefill drift #483
Open
yemreak wants to merge 2 commits into
Two related fixes around the TextDecoder main loop:
1. Replace three `decoderOutput.logits!` force-unwraps and one
   `decoderInputs.initialPrompt.last!` force-unwrap with `guard let`
   statements that throw `WhisperError.decodingFailed`. The crashes
   are easy to hit when CoreML returns a degraded output
   (out-of-memory, model swap mid-flight, etc.), and they are
   impossible to recover from at the call site.
2. Fix early termination during prefill of a non-empty
   `promptTokens`. The previous logic considered the first decoded
   token to be at `prefilledIndex`, but when a prompt is fed,
   decoding really starts at `max(prefilledIndex, initialPromptIndex)`.
   Add an explicit `isInPrefillPhase` flag and:
   - gate `isFirstTokenLogProbTooLow` so it only fires after the
     prompt has been consumed and only when no `promptTokens` were
     supplied;
   - skip the `sampleResult.completed` segment-completion check
     during prefill, since the model is being force-fed prompt
     tokens and may legitimately predict EOT mid-prompt.
Without (2), the decoder can stop early mid-prompt, producing empty
or truncated transcripts for any caller that uses `promptTokens`
to steer Whisper (e.g., feeding a glossary).
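
As a rough illustration of fix (1), the sketch below shows the general shape of swapping a force-unwrap for a throwing `guard let`. It is not the code from this PR: `SketchError`, `DecoderOutput`, `DecoderInputs`, `unwrapLogits`, and `lastPromptToken` are hypothetical stand-ins, and the real WhisperKit types (including `WhisperError`) are assumed to differ.

```swift
// Hypothetical stand-in for WhisperError; only the shape of the case is mirrored.
enum SketchError: Error, Equatable {
    case decodingFailed(String)
}

// Hypothetical, simplified decoder output. A nil `logits` models a degraded
// CoreML result (out-of-memory, model swap mid-flight, ...).
struct DecoderOutput {
    var logits: [Float]?
}

// Hypothetical, simplified decoder inputs.
struct DecoderInputs {
    var initialPrompt: [Int]
}

// Before: `output.logits!` traps on nil and takes the process down.
// After: the failure becomes an error the caller can catch.
func unwrapLogits(_ output: DecoderOutput) throws -> [Float] {
    guard let logits = output.logits else {
        throw SketchError.decodingFailed("Decoder returned no logits")
    }
    return logits
}

// Same pattern for `initialPrompt.last!`, which traps on an empty prompt.
func lastPromptToken(_ inputs: DecoderInputs) throws -> Int {
    guard let token = inputs.initialPrompt.last else {
        throw SketchError.decodingFailed("Initial prompt is empty")
    }
    return token
}
```

With this shape, a caller can wrap the decode call in `do`/`catch` and retry or report the failure, which is not possible when the force-unwrap traps.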
Summary

Two related fixes in `TextDecoder`:
1. Replace force-unwraps that crash on degraded CoreML output (three `decoderOutput.logits!` and one `decoderInputs.initialPrompt.last!`) with `guard let` that throws `WhisperError.decodingFailed`. The crashes are easy to hit (model swap mid-flight, OOM, malformed model) and impossible to recover from at the call site.
2. Fix early termination during prefill when a non-empty `promptTokens` is supplied. The previous logic considered the first decoded token to be at `prefilledIndex`, but with a prompt it really starts at `max(prefilledIndex, initialPromptIndex)`. Without this, the decoder can stop early mid-prompt and produce empty or truncated transcripts for any caller that uses `promptTokens` to steer Whisper (e.g., feeding a glossary or a prior-context window).

Changes

- `guard let` + `WhisperError.decodingFailed(...)` for the four force-unwraps.
- Add an explicit `isInPrefillPhase` flag and:
  - gate `isFirstTokenLogProbTooLow` so it only fires after the prompt has been consumed and only when no `promptTokens` were supplied;
  - skip the `sampleResult.completed` segment-completion check during prefill (the model is being force-fed prompt tokens and may legitimately predict EOT mid-prompt).

No new public API; behaviour with empty `promptTokens` is unchanged.

Testing

- […] `large-v3` with a non-trivial `promptTokens` and confirmed it disappears with the patch.
- […] `EXC_BAD_INSTRUCTION`.
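
The prefill gating described above can be modelled in isolation as follows. This is a minimal sketch under stated assumptions, not the patched loop itself: `LoopStep` and `shouldEndSegment` are hypothetical names, `initialPromptIndex` is assumed to mark the position where forced prompt tokens end, and any other termination conditions in the real loop are left out.

```swift
// Hypothetical snapshot of one iteration of the decoding loop.
struct LoopStep {
    let tokenIndex: Int          // position handled in this iteration
    let prefilledIndex: Int      // where prefilled tokens end
    let initialPromptIndex: Int  // assumed: where forced prompt tokens end
    let hasPromptTokens: Bool    // caller supplied non-empty promptTokens

    // Per the PR text, decoding really starts here, not at prefilledIndex.
    var firstDecodedIndex: Int { max(prefilledIndex, initialPromptIndex) }

    // True while the loop is still force-feeding prompt tokens.
    var isInPrefillPhase: Bool { tokenIndex < firstDecodedIndex }
}

// Decides whether the segment should end on this step (sketch only).
func shouldEndSegment(
    step: LoopStep,
    sampleCompleted: Bool,
    logProb: Float,
    firstTokenLogProbThreshold: Float?
) -> Bool {
    // During prefill the model may predict EOT mid-prompt; ignore it,
    // and do not apply the first-token log-prob check either.
    if step.isInPrefillPhase { return false }

    // The first-token check fires only on the first decoded token and
    // only when no promptTokens were supplied.
    var isFirstTokenLogProbTooLow = false
    if step.tokenIndex == step.firstDecodedIndex,
       !step.hasPromptTokens,
       let threshold = firstTokenLogProbThreshold {
        isFirstTokenLogProbTooLow = logProb < threshold
    }
    return sampleCompleted || isFirstTokenLogProbTooLow
}
```

Keeping the decision in a pure function like this is only a choice made for the sketch; it makes the "empty `promptTokens` is unchanged" property easy to check with plain assertions.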