feat: anti-repetition protection for whisper transcription #552

Open
egsok wants to merge 1 commit into OpenWhispr:main from egsok:feat/anti-repetition-clean

@egsok egsok commented Apr 5, 2026

Problem

Whisper (especially large-v3) is prone to hallucination loops — repeating the same phrase indefinitely, particularly during silence or at audio segment boundaries.

The root cause: Whisper uses the previous segment's output as context for the next. A repetition in segment N therefore propagates to N+1, N+2, and so on, creating a self-reinforcing feedback loop. This is one of the most widely reported Whisper issues:

  • openai/whisper#679 — "A possible solution to Whisper hallucination" (37 comments, 147 replies) — foundational discussion on the root cause
  • whisper.cpp#1490 — "Large Model hallucination and repeating", maintainer recommends -mc + large-v2
  • whisper.cpp#1507 — maintainer recommends --max-context 64 or 32, --beam-size 5, --entropy-thold ~2.8
  • whisper.cpp#1244 — confirms -mc 0 = condition_on_previous_text=False

Solution

Three layers of protection:

1. --max-context 128 — limits cross-segment context to 128 tokens, breaking the feedback loop. Does NOT affect initial_prompt (custom prompt and dictionary words are always sent in full). Maintainer recommends 64 (#1507); we use a more conservative 128 to preserve cross-segment continuity, relying on the regex dedup layer to catch any loops that slip through.

2. --entropy-thold 2.8 — discards segments where model confidence is very low (high entropy = silence/noise/hallucination). Default is 2.4; raised to 2.8 per maintainer recommendation in #1507. Zero performance overhead.

3. Regex dedup — post-processing safety net that removes consecutive repeated phrases (10+ chars) from final output. 10-char minimum avoids false positives on common short phrases.

| Parameter | Value | Default | Rationale |
|---|---|---|---|
| --max-context | 128 | 224 | Conservative vs maintainer's 64; regex dedup catches the rest |
| --entropy-thold | 2.8 | 2.4 | Per maintainer recommendation (#1507) |
| Regex min length | 10 chars | | Avoids false positives on "I think", "you know", etc. |
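
To illustrate the third layer, here is a minimal sketch of a regex-based dedup. It is not the PR's actual removeRepetitions() body, which is not shown on this page; the pattern, the whitespace-separator requirement, and the MIN_PHRASE_LENGTH constant name are all assumptions.

```javascript
// Hypothetical sketch: the real removeRepetitions() in audioManager.js may differ.
const MIN_PHRASE_LENGTH = 10; // assumed name; the value 10 comes from the PR description

function removeRepetitions(text, minLength = MIN_PHRASE_LENGTH) {
  // (.{N,}?)    lazily captures a phrase of at least N characters
  // (?:\s+\1)+  matches one or more immediate repeats of that phrase,
  //             separated by whitespace (an assumption of this sketch)
  const pattern = new RegExp(`(.{${minLength},}?)(?:\\s+\\1)+`, "g");
  // Keep a single copy of the phrase and drop the repeats.
  return text.replace(pattern, "$1");
}

const looped =
  "Thank you for watching. Thank you for watching. Thank you for watching. Bye.";
console.log(removeRepetitions(looped));
console.log(removeRepetitions("you know you know"));
```

In this sketch the 10-character minimum means a doubled short phrase such as "you know you know" is left alone, because no repeated unit of 10 or more characters fits in it.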

How Whisper context works

initial_prompt (custom prompt + dictionary)
  → always used in full (up to 224 tokens)
  → NOT affected by --max-context

cross-segment context (previous segment output)
  → fed back to next segment for continuity
  → --max-context limits THIS to prevent loops
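
As a rough sketch of how the two flags might be appended to the server's startup arguments: the helper name, the constant name, and the placeholder base arguments below are assumptions for illustration and are not taken from whisperServer.js.

```javascript
// Hypothetical sketch: names and base arguments are illustrative only.
const ANTI_REPETITION_FLAGS = [
  "--max-context", "128",   // cap cross-segment context (whisper.cpp default: 224)
  "--entropy-thold", "2.8", // entropy threshold (whisper.cpp default: 2.4)
];

function buildServerArgs(baseArgs) {
  // Return a new array so the caller's argument list is not mutated.
  return [...baseArgs, ...ANTI_REPETITION_FLAGS];
}

// Placeholder base arguments; a real launch would pass the actual model path and port.
const args = buildServerArgs(["--model", "/path/to/model.bin", "--port", "8178"]);
console.log(args.join(" "));
```

Values are passed as strings because process arguments are always strings; the resulting array could be handed to a process launcher such as Node's child_process.spawn.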

UI

New "Suppress repeated phrases" toggle in Settings → Transcription (default: on). When disabled, only regex dedup is skipped — whisper.cpp flags remain active as they improve quality regardless.
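
The toggle behavior described above could be wired roughly as follows. This is a sketch under assumptions: only the suppressRepetition setting name comes from the PR; the postProcess function, its signature, and the injected dedupe callback are hypothetical.

```javascript
// Hypothetical sketch: postProcess and its parameters are illustrative names.
function postProcess(text, settings, dedupe) {
  // Treat an unset setting as enabled, matching the "default: on" behavior.
  const enabled = settings.suppressRepetition ?? true;
  // Only the regex dedup step is gated; server flags are applied elsewhere.
  return enabled ? dedupe(text) : text;
}

// Stand-in dedup function, used only to make the sketch runnable.
const collapseSpaces = (t) => t.replace(/\s+/g, " ");

console.log(postProcess("a   b", { suppressRepetition: false }, collapseSpaces));
console.log(postProcess("a   b", {}, collapseSpaces));
```

Passing the dedup function in as a parameter keeps this sketch independent of any particular dedup implementation.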

Files changed

| File | Change |
|---|---|
| src/helpers/whisperServer.js | --max-context 128 and --entropy-thold 2.8 flags |
| src/helpers/audioManager.js | removeRepetitions() regex dedup + wiring in processTranscription() |
| src/components/SettingsPage.tsx | Toggle UI in Transcription section |
| src/hooks/useSettings.ts + src/stores/settingsStore.ts | suppressRepetition boolean setting |
| src/locales/*/translation.json | i18n for all 10 locales |

Test plan

  • Record 30+ seconds with pauses — verify no hallucination loops
  • Record normal speech — verify transcription quality unchanged
  • Verify custom prompt and dictionary still work with --max-context 128
  • Toggle off → verify repeated text passes through
  • Check debug logs for --max-context 128 and --entropy-thold 2.8 in server startup args

@egsok egsok marked this pull request as draft April 5, 2026 09:04
Three layers against hallucination loops:
- --max-context 128 limits cross-segment context to break feedback loops
- --entropy-thold 2.8 discards low-confidence (hallucinated) segments
- Regex dedup removes consecutive repeated phrases (10+ chars) from output

New "Suppress repeated phrases" toggle in Settings → Transcription (default: on).
i18n keys added for all 10 locales.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@egsok egsok force-pushed the feat/anti-repetition-clean branch from af39f9d to cbebebe Compare April 5, 2026 09:07
@egsok egsok marked this pull request as ready for review April 5, 2026 09:08