WhisperKit: harden TextDecoder against nil logits and prefill drift #483

Open
yemreak wants to merge 2 commits into argmaxinc:main from yemreak:fix/whisperkit-textdecoder-prefill-phase


@yemreak yemreak commented May 18, 2026

Summary

Two related fixes in TextDecoder:

  1. Replace the force-unwraps that crash on degraded CoreML output (three decoderOutput.logits! and one decoderInputs.initialPrompt.last!) with guard let statements that throw WhisperError.decodingFailed. These crashes are easy to trigger (model swap mid-flight, OOM, malformed model) and cannot be recovered from at the call site.

  2. Fix early termination during prefill when a non-empty promptTokens is supplied. The previous logic treated prefilledIndex as the position of the first decoded token, but with a prompt, decoding really starts at max(prefilledIndex, initialPromptIndex). Without this fix, the decoder can stop mid-prompt and produce empty or truncated transcripts for any caller that uses promptTokens to steer Whisper (e.g., feeding a glossary or a prior-context window).
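
For reviewers who want to see the shape of fix (1) in isolation, here is a minimal sketch of the force-unwrap to guard let conversion. Everything in it is a hypothetical stand-in: `DecodeError`, `FakeDecoderOutput`, `FakeDecoderInputs`, `readLogits` and `lastPromptToken` are illustration-only names, not WhisperKit's real types or the exact code in this PR.

```swift
// Hypothetical stand-ins for illustration only; these are NOT WhisperKit types.
enum DecodeError: Error, Equatable {
    case decodingFailed(String)
}

struct FakeDecoderOutput {
    var logits: [Float]?  // nil models a degraded model output
}

struct FakeDecoderInputs {
    var initialPrompt: [Int]
}

// Before: `output.logits!` traps the whole process when logits is nil.
// After: the missing value becomes a thrown error the caller can handle.
func readLogits(from output: FakeDecoderOutput) throws -> [Float] {
    guard let logits = output.logits else {
        throw DecodeError.decodingFailed("decoder output contained no logits")
    }
    return logits
}

// Before: `inputs.initialPrompt.last!` traps on an empty prompt.
func lastPromptToken(of inputs: FakeDecoderInputs) throws -> Int {
    guard let last = inputs.initialPrompt.last else {
        throw DecodeError.decodingFailed("initial prompt is empty")
    }
    return last
}
```

With this shape a caller can wrap the decode call in do/catch and retry, reload the model, or report the failure, none of which is possible after a force-unwrap trap.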

Changes

  • guard let + WhisperError.decodingFailed(...) for the four force-unwraps.
  • Add an isInPrefillPhase flag and:
    • gate isFirstTokenLogProbTooLow so it only fires after the prompt has been consumed and only when no promptTokens were supplied;
    • skip the sampleResult.completed segment-completion check during prefill (the model is being force-fed prompt tokens and may legitimately predict EOT mid-prompt).

No new public API; behaviour with empty promptTokens is unchanged.
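
The gating described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in the diff: `StepState`, its fields, and the exact comparison used to define the prefill phase (token index below `max(prefilledIndex, initialPromptIndex)`) are hypothetical, chosen only to show how the two checks are suppressed.

```swift
// Hypothetical per-step snapshot of the decoding loop; field names are
// assumptions for illustration and do not mirror WhisperKit's internals.
struct StepState {
    var tokenIndex: Int
    var prefilledIndex: Int
    var initialPromptIndex: Int   // assumed: index where forced prompt tokens end
    var hasPromptTokens: Bool
    var sampledCompleted: Bool    // stands in for sampleResult.completed
    var firstTokenLogProb: Float
    var logProbThreshold: Float
}

// With a prompt, the first genuinely decoded token sits after the prompt.
func firstDecodedIndex(_ s: StepState) -> Int {
    return max(s.prefilledIndex, s.initialPromptIndex)
}

// Assumed definition: still force-feeding prefill or prompt tokens.
func isInPrefillPhase(_ s: StepState) -> Bool {
    return s.tokenIndex < firstDecodedIndex(s)
}

func shouldStop(_ s: StepState) -> Bool {
    // During prefill the model may predict EOT mid-prompt; ignore it.
    if isInPrefillPhase(s) { return false }

    // The first-token check only applies when no promptTokens were supplied.
    let isFirstTokenLogProbTooLow = !s.hasPromptTokens
        && s.tokenIndex == firstDecodedIndex(s)
        && s.firstTokenLogProb < s.logProbThreshold

    return s.sampledCompleted || isFirstTokenLogProbTooLow
}
```

When no prompt is supplied, `initialPromptIndex` never exceeds `prefilledIndex` in this sketch, so `firstDecodedIndex` collapses to `prefilledIndex`, which is consistent with the claim that behaviour with empty promptTokens is unchanged.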

Testing

  • Reproduced the mid-prompt early-stop on large-v3 with a non-trivial promptTokens and confirmed it disappears with the patch.
  • Force-fed a malformed model to verify that the new throwing path surfaces a clean error instead of EXC_BAD_INSTRUCTION.
  • Existing WhisperKit test suite passes locally.

a2they and others added 2 commits May 1, 2026 16:11
Two related fixes around the TextDecoder main loop:

1. Replace three `decoderOutput.logits!` and one
   `decoderInputs.initialPrompt.last!` force-unwrap with `guard
   let` that throws `WhisperError.decodingFailed`. The crashes are
   easy to hit when CoreML returns a degraded output (out-of-memory,
   model swap mid-flight, etc.) and they're impossible to recover
   from at the call site.

2. Fix early-termination during prefill of a non-empty
   `promptTokens`. The previous logic considered the first decoded
   token to be at `prefilledIndex`, but when a prompt is fed it
   really starts at `max(prefilledIndex, initialPromptIndex)`.
   Add an explicit `isInPrefillPhase` flag and:
   - gate `isFirstTokenLogProbTooLow` so it only fires after the
     prompt has been consumed and only when no promptTokens were
     supplied;
   - skip the `sampleResult.completed` segment-completion check
     during prefill, since the model is being force-fed prompt
     tokens and may legitimately predict EOT mid-prompt.

Without (2) the decoder can early-stop mid-prompt, producing empty
or truncated transcripts for any caller that uses `promptTokens`
to steer Whisper (e.g., feeding a glossary).
