feat: add custom transcription prompt with language presets #537
egsok wants to merge 1 commit into OpenWhispr:main from
hey @gabrielste1n, I've been using this in my fork for a few weeks; the punctuation improvement is dramatic, especially for Russian. Happy to iterate quickly if anything needs adjusting.
Force-pushed from f3d048d to eb0a350
JiwaniZakir left a comment
The character-based limit in SettingsPage.tsx (maxLength={900} and the 800/900 warning threshold) doesn't account for token density differences across scripts. For the included Japanese (ja) and Chinese (zh-CN, zh-TW) presets, each character typically maps to 2–3 Whisper tokens, meaning a 900-character CJK prompt could easily consume 1800+ tokens — far exceeding Whisper's 224-token initial_prompt window. This undermines the whole premise of the priority ordering in buildTranscriptionPrompt(), where the custom prompt is placed last specifically to survive truncation.
The comment in audioManager.js — "Custom prompt LAST — survives truncation (higher priority)" — is only correct if Whisper truncates from the left, which should be explicitly documented or linked to the Whisper source/docs, since this is a non-obvious assumption that future maintainers could easily break.
A more robust approach would be to enforce a token-based limit (or at least a much lower character limit for CJK locales) rather than a flat 900-character cap that gives a false sense of safety for non-Latin scripts.
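To make the truncation argument concrete, here is a minimal sketch of why prompt order matters under left-truncation. It assumes, as the review says must be verified against the Whisper source, that only the trailing tokens of an over-long prompt are kept. Whitespace-separated words stand in for real Whisper tokens, and `keepLastTokens` and `buildPromptTokens` are hypothetical names, not the functions in audioManager.js.

```typescript
// Hypothetical model of left-truncation: when the prompt is too long,
// the oldest (leftmost) tokens are dropped and the tail survives.
function keepLastTokens(tokens: string[], windowSize: number): string[] {
  return tokens.length <= windowSize
    ? tokens
    : tokens.slice(tokens.length - windowSize);
}

// Dictionary words go first (lowest priority), the custom prompt goes last.
// Words stand in for tokens here; real tokenization is finer-grained.
function buildPromptTokens(dictionary: string[], customPrompt: string): string[] {
  const promptTokens = customPrompt.split(/\s+/).filter((t) => t.length > 0);
  return dictionary.concat(promptTokens);
}

const dictionary = ["OpenWhispr", "whisper.cpp", "Electron", "TypeScript"];
const customPrompt = "Hello, how are you? Fine, thanks.";

// 4 dictionary words + 6 prompt words = 10 stand-in tokens; the window keeps 7.
const survived = keepLastTokens(buildPromptTokens(dictionary, customPrompt), 7);
console.log(survived.join(" "));
// TypeScript Hello, how are you? Fine, thanks.
```

With this ordering the dictionary loses entries first and the custom prompt stays intact. If truncation happened from the right instead, the same ordering would cut the custom prompt first, which is why the assumption needs documenting.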
…n prompt
The 900-character limit was misleading for CJK scripts where each character maps to ~2.2 Whisper tokens, easily exceeding the 224-token initial_prompt window. Replace with estimateTokens() that weights characters by script (CJK ×2.2, Cyrillic ×0.5, Latin ×0.25) and a visual progress bar capped at 112 tokens (~half the window, leaving room for Custom Dictionary).
- Shorten all presets to fit within token budget
- Add progress bar with color thresholds (gray → yellow → red)
- Enforce token budget in onChange instead of maxLength
- Update description in all 10 locales to explain shared token budget
- Document left-truncation assumption in audioManager.js
Addresses review feedback on PR OpenWhispr#537.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
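The script-weighted estimate described in this commit could look roughly like the sketch below. The three weights come from the commit message. The Unicode ranges, the rounding, and the choice to treat every other character (spaces, punctuation, digits) as Latin are assumptions for illustration, not the PR's actual code. Weights are kept as integer hundredths to avoid floating-point drift when summing.

```typescript
// Assumed Unicode ranges: CJK punctuation, kana, and common CJK ideographs.
const CJK_CHAR = /[\u3000-\u303f\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff]/;
const CYRILLIC_CHAR = /[\u0400-\u04ff]/;

// Budget quoted in the commit message (~half of the 224-token window).
const TOKEN_BUDGET = 112;

// Heuristic token estimate: CJK x2.2, Cyrillic x0.5, everything else x0.25.
function estimateTokens(text: string): number {
  let hundredths = 0;
  for (const ch of text) {
    if (CJK_CHAR.test(ch)) hundredths += 220;
    else if (CYRILLIC_CHAR.test(ch)) hundredths += 50;
    else hundredths += 25; // Latin and all other characters (assumption)
  }
  return Math.ceil(hundredths / 100);
}

console.log(estimateTokens("你好世界")); // 9  (4 x 2.2 = 8.8, rounded up)
console.log(estimateTokens("Привет")); // 3  (6 x 0.5)
console.log(estimateTokens("Hello")); // 2  (5 x 0.25 = 1.25, rounded up)
console.log(estimateTokens("Hello") <= TOKEN_BUDGET); // true
```

Under these weights the old 900-character cap works out to about 900 × 2.2 = 1980 estimated tokens for a pure CJK prompt and about 900 × 0.25 = 225 for a pure Latin one, so even the Latin case sat right at the edge of the 224-token window.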
Thanks for the thorough review @JiwaniZakir, all valid points, now addressed in the latest push:
Token-aware budget instead of character limit:
Shortened presets:
Truncation direction documented:
Updated description in all 10 locales to explain the shared token budget with Custom Dictionary.
Add user-editable "Transcription Prompt" textarea in Settings → Transcription with dropdown presets for 10 languages. Whisper copies the formatting style of this prompt, so a well-punctuated paragraph nudges it to produce punctuated output.
- Empty by default (avoids language bias in auto-detect mode)
- "Insert preset" dropdown: en, es, fr, de, pt, it, ru, ja, zh-CN, zh-TW
- Each preset uses native punctuation (Russian «ёлочки», German „Gänsefüßchen“, etc.)
- Token-aware budget with estimateTokens() heuristic (CJK ×2.2, Cyrillic ×0.5, Latin ×0.25); progress bar replaces flat character limit
- Budget capped at ~112 tokens (~half of Whisper's 224-token window), leaving room for Custom Dictionary
- Dictionary words prepended automatically (truncated first by Whisper's 224-token window; left-truncation documented in code)
- i18n: all 10 locales updated
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from bbfd3fe to 5382f8e
Problem
Whisper often drops punctuation entirely, producing a wall of unformatted text — especially for non-English languages. This is a well-known issue in the community:
initial_prompt to steer style
The documented solution: pass a well-punctuated paragraph as initial_prompt. Whisper doesn't follow instructions; it copies the style of the prompt. A paragraph full of commas, question marks, and em-dashes nudges the decoder to keep producing punctuation.
What I tested
I tested this in my fork with a hardcoded Russian punctuation prompt, and it works remarkably well. Russian transcriptions went from completely unpunctuated to properly formatted output with commas, periods, question marks, em-dashes, and quotation marks — consistently, across different recording lengths.
Why not just hardcode a default prompt
The prompt also acts as a language hint. When I had an English prompt hardcoded and the language was set to auto, Whisper started transcribing Russian speech as English. This is expected behavior: the prompt biases the language detector. So pre-filling a default prompt for all users would break the experience for anyone using auto with a non-English language.
Worth noting: a prompt in one language doesn't prevent Whisper from recognizing words in another. For example, a Russian prompt with language set to "Auto" works fine for mixed Russian/English speech; English words are still transcribed correctly.
Solution
This PR adds a Transcription Prompt textarea in Settings → Transcription — empty by default, with an "Insert preset" dropdown offering punctuation prompts for 10 languages. Users pick a preset matching their language (or write their own), and Whisper starts producing punctuated output.
Implementation details:
- initial_prompt window (dictionary words are lower priority and get truncated first). Truncation direction explicitly documented in code.
- estimateTokens() heuristic (CJK ×2.2, Cyrillic ×0.5, Latin ×0.25) instead of a flat character limit, which prevents CJK users from unknowingly exceeding the token window
Works with both local whisper.cpp and cloud Whisper API. Does not affect Custom Dictionary behavior.
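As an illustration of enforcing the budget in an onChange handler rather than via maxLength, here is a hedged sketch. The names `nextPromptValue` and `barColor` are hypothetical, as are the 80% and 100% color cut-offs (the PR names the colors but the thread does not state the thresholds) and the rule that shrinking edits are always accepted. The estimator is passed in as a parameter so the logic does not depend on any particular heuristic.

```typescript
type BarColor = "gray" | "yellow" | "red";

// Hypothetical thresholds: yellow from 80% of the budget, red at 100%.
function barColor(usedTokens: number, budget: number): BarColor {
  const ratio = usedTokens / budget;
  if (ratio >= 1) return "red";
  if (ratio >= 0.8) return "yellow";
  return "gray";
}

// Accept an edit if it fits the budget, or if it makes the text shorter
// (so a user who is already over budget can still delete characters).
function nextPromptValue(
  current: string,
  proposed: string,
  estimate: (text: string) => number,
  budget: number
): string {
  const fits = estimate(proposed) <= budget;
  const shrinks = proposed.length < current.length;
  return fits || shrinks ? proposed : current;
}

// Stand-in estimator for the demo: Latin-only, 0.25 tokens per character.
const latinEstimate = (text: string): number => Math.ceil(text.length / 4);

console.log(nextPromptValue("abcd", "abcdefgh", latinEstimate, 112)); // abcdefgh
console.log(nextPromptValue("abcd", "abcde", latinEstimate, 1)); // abcd (rejected)
console.log(barColor(50, 112), barColor(90, 112), barColor(112, 112)); // gray yellow red
```

In a React textarea this would presumably be wired up as something like `onChange={(e) => setPrompt(nextPromptValue(prompt, e.target.value, estimateTokens, 112))}`, though the actual component code in SettingsPage.tsx is not shown in this thread.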
In the future, we could consider auto-populating the prompt based on the selected transcription language — but I chose not to do that now to avoid breaking the experience for existing users.
Test plan
auto language + empty prompt → normal transcription (no language bias)