WhisperKit: harden TextDecoder against nil logits and prefill drift by yemreak · Pull Request #483 · argmaxinc/argmax-oss-swift

yemreak · 2026-05-18T23:58:50Z

Summary

Two related fixes in TextDecoder:

1. Replace force-unwraps that crash on degraded CoreML output (three `decoderOutput.logits!` and one `decoderInputs.initialPrompt.last!`) with `guard let` that throws `WhisperError.decodingFailed`. The crashes are easy to hit (model swap mid-flight, OOM, malformed model) and impossible to recover from at the call site.
2. Fix early-termination during prefill when a non-empty `promptTokens` is supplied. The previous logic considered the first decoded token to be at `prefilledIndex`, but with a prompt it really starts at `max(prefilledIndex, initialPromptIndex)`. Without this, the decoder can early-stop mid-prompt and produce empty or truncated transcripts for any caller that uses `promptTokens` to steer Whisper (e.g., feeding a glossary or a prior-context window).
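The first fix follows a common Swift pattern. Here is a minimal sketch, not the code from this PR: `WhisperError` and `DecoderOutput` are local stand-ins declared only for illustration, the helper names `requireLogits` and `requireLastPromptToken` are hypothetical, and the error messages are made up.

```swift
// Local stand-in for the library's error type (assumption: the real enum
// has more cases; only the one named in this PR is mirrored here).
enum WhisperError: Error, Equatable {
    case decodingFailed(String)
}

// Local stand-in for the decoder output. The logits are optional because
// a degraded CoreML prediction may not produce them.
struct DecoderOutput {
    var logits: [Float]?
}

// Hypothetical helper: returns the logits or throws, instead of trapping
// the whole process the way `output.logits!` would.
func requireLogits(_ output: DecoderOutput) throws -> [Float] {
    guard let logits = output.logits else {
        throw WhisperError.decodingFailed("Decoder returned nil logits")
    }
    return logits
}

// Hypothetical helper: the same pattern for the last initial-prompt token.
// `Array.last` is nil for an empty array, so `.last!` would trap.
func requireLastPromptToken(_ initialPrompt: [Int]) throws -> Int {
    guard let lastToken = initialPrompt.last else {
        throw WhisperError.decodingFailed("Initial prompt is empty")
    }
    return lastToken
}
```

With a thrown error, a caller can wrap the decode call in `do`/`catch` and retry or report the failure, which is not possible when a force-unwrap traps.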

Changes

- `guard let` + `WhisperError.decodingFailed(...)` for the four force-unwraps.
- Add an `isInPrefillPhase` flag and:
  - gate `isFirstTokenLogProbTooLow` so it only fires after the prompt has been consumed and only when no `promptTokens` were supplied;
  - skip the `sampleResult.completed` segment-completion check during prefill (the model is being force-fed prompt tokens and may legitimately predict EOT mid-prompt).

No new public API; behaviour with empty promptTokens is unchanged.
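The gating described in this section could look roughly like the sketch below. It is an illustration under stated assumptions, not the patched loop: `StepContext` and `shouldStopDecoding` are hypothetical names, the definition of the prefill phase as `tokenIndex < max(prefilledIndex, initialPromptIndex)` is inferred from the description, and any other stop conditions in the real decoder are left out.

```swift
// Hypothetical snapshot of one decoder-loop iteration.
struct StepContext {
    var tokenIndex: Int
    var prefilledIndex: Int
    var initialPromptIndex: Int
    var hasPromptTokens: Bool
    var sampledCompleted: Bool            // stands in for sampleResult.completed
    var nextTokenLogProb: Float
    var firstTokenLogProbThreshold: Float?
}

// Returns true when the segment should be treated as complete at this step.
func shouldStopDecoding(_ ctx: StepContext) -> Bool {
    // The first genuinely decoded token sits after both the prefilled
    // tokens and the supplied prompt.
    let firstDecodedIndex = max(ctx.prefilledIndex, ctx.initialPromptIndex)
    let isInPrefillPhase = ctx.tokenIndex < firstDecodedIndex

    // While prompt tokens are being force-fed, an EOT prediction is not
    // a reason to stop.
    if isInPrefillPhase {
        return false
    }

    // The first-token check applies only at the first decoded token and
    // only when the caller supplied no prompt tokens.
    var isFirstTokenLogProbTooLow = false
    if ctx.tokenIndex == firstDecodedIndex,
       !ctx.hasPromptTokens,
       let threshold = ctx.firstTokenLogProbThreshold {
        isFirstTokenLogProbTooLow = ctx.nextTokenLogProb < threshold
    }

    return ctx.sampledCompleted || isFirstTokenLogProbTooLow
}
```

Keeping the prefill test as a single early return means every later stop condition is automatically skipped while the prompt is still being consumed.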

Testing

- Reproduced the mid-prompt early-stop on large-v3 with a non-trivial `promptTokens` and confirmed it disappears with the patch.
- Force-fed a malformed model to verify the new `throws` path surfaces a clean error instead of `EXC_BAD_INSTRUCTION`.
- Existing WhisperKit test suite passes locally.

Two related fixes around the TextDecoder main loop:

1. Replace three `decoderOutput.logits!` and one `decoderInputs.initialPrompt.last!` force-unwrap with `guard let` that throws `WhisperError.decodingFailed`. The crashes are easy to hit when CoreML returns a degraded output (out-of-memory, model swap mid-flight, etc.) and they're impossible to recover from at the call site.
2. Fix early-termination during prefill of a non-empty `promptTokens`. The previous logic considered the first decoded token to be at `prefilledIndex`, but when a prompt is fed it really starts at `max(prefilledIndex, initialPromptIndex)`. Add an explicit `isInPrefillPhase` flag and:
   - gate `isFirstTokenLogProbTooLow` so it only fires after the prompt has been consumed and only when no promptTokens were supplied;
   - skip the `sampleResult.completed` segment-completion check during prefill, since the model is being force-fed prompt tokens and may legitimately predict EOT mid-prompt.

Without (2) the decoder can early-stop mid-prompt, producing empty or truncated transcripts for any caller that uses `promptTokens` to steer Whisper (e.g., feeding a glossary).

a2they and others added 2 commits May 1, 2026 16:11

Release v1.0.0

25c6299

yemreak wants to merge 2 commits into argmaxinc:main from yemreak:fix/whisperkit-textdecoder-prefill-phase
