feat: add custom transcription prompt with language presets #537

Open
egsok wants to merge 1 commit into OpenWhispr:main from egsok:feat/custom-transcription-prompt

Contributor

egsok commented Apr 1, 2026

Problem

Whisper often drops punctuation entirely, producing a wall of unformatted text, especially for non-English languages. This is a well-known issue in the community.

The documented solution: pass a well-punctuated paragraph as initial_prompt. Whisper doesn't follow instructions — it copies the style of the prompt. A paragraph full of commas, question marks, and em-dashes nudges the decoder to keep producing punctuation.

What I tested

I tested this in my fork with a hardcoded Russian punctuation prompt, and it works remarkably well. Russian transcriptions went from completely unpunctuated to properly formatted output with commas, periods, question marks, em-dashes, and quotation marks — consistently, across different recording lengths.

Why not just hardcode a default prompt

The prompt also acts as a language hint. When I had an English prompt hardcoded and the language was set to auto, Whisper started transcribing Russian speech as English. This is expected behavior — the prompt biases the language detector. So pre-filling a default prompt for all users would break the experience for anyone using auto with a non-English language.

Worth noting: a prompt in one language doesn't prevent Whisper from recognizing words in another. For example, a Russian prompt with language set to "Auto" works fine for mixed Russian/English speech — English words are still transcribed correctly.

Solution

This PR adds a Transcription Prompt textarea in Settings → Transcription — empty by default, with an "Insert preset" dropdown offering punctuation prompts for 10 languages. Users pick a preset matching their language (or write their own), and Whisper starts producing punctuated output.

Implementation details:

  • Empty by default — no language bias for existing users
  • Presets for: English, Spanish, French, German, Portuguese, Italian, Russian, Japanese, Chinese Simplified/Traditional
  • Each preset uses native punctuation conventions (Russian «ёлочки», German „Gänsefüßchen“, French « guillemets », Japanese 「括弧」, Chinese “引号”)
  • Custom prompt placed after dictionary words in the combined prompt — survives Whisper's left-truncation of the 224-token initial_prompt window (dictionary words are lower priority and get truncated first). Truncation direction explicitly documented in code.
  • Token-aware budget with estimateTokens() heuristic (CJK ×2.2, Cyrillic ×0.5, Latin ×0.25) instead of a flat character limit — prevents CJK users from unknowingly exceeding the token window
  • Visual progress bar (gray → yellow at 80% → red at 95%) capped at ~112 tokens (~half the window, leaving room for Custom Dictionary)
  • Description explains how the prompt shares a token budget with Custom Dictionary
  • i18n: all 10 locales updated
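
A rough sketch of what a script-weighted estimate along these lines could look like. This is not the code from the PR: the Unicode ranges, the constant names, and the rounding are assumptions made for illustration, and only the three weights and the ~112-token budget come from the description above.

```typescript
// Hypothetical sketch of a script-weighted token estimate; not the PR's actual code.
// Weights are stored in hundredths of a token so the sum stays in exact integers.
const CENTITOKENS_PER_CHAR = { cjk: 220, cyrillic: 50, other: 25 };

// Assumed budget: about half of Whisper's 224-token initial_prompt window.
const PROMPT_TOKEN_BUDGET = 112;

// Assumed ranges: CJK punctuation, kana, Han ideographs, and fullwidth forms.
const CJK = /[\u3000-\u303f\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uff00-\uffef]/;
const CYRILLIC = /[\u0400-\u04ff]/;

function estimateTokens(text: string): number {
  let centitokens = 0;
  for (const ch of text) {
    if (CJK.test(ch)) centitokens += CENTITOKENS_PER_CHAR.cjk;
    else if (CYRILLIC.test(ch)) centitokens += CENTITOKENS_PER_CHAR.cyrillic;
    else centitokens += CENTITOKENS_PER_CHAR.other;
  }
  // Round up so the estimate errs on the side of staying inside the window.
  return Math.ceil(centitokens / 100);
}

// Share of the assumed budget used by a prompt, as a percentage.
function budgetPercent(text: string): number {
  return Math.round((estimateTokens(text) / PROMPT_TOKEN_BUDGET) * 100);
}
```

Keeping the weights as integers avoids floating-point drift when many 2.2-weighted characters are summed. A real tokenizer would give different counts; the point is only to keep CJK prompts from silently overflowing.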

Works with both local whisper.cpp and cloud Whisper API. Does not affect Custom Dictionary behavior.

In the future, we could consider auto-populating the prompt based on the selected transcription language — but I chose not to do that now to avoid breaking the experience for existing users.

Test plan

  • Settings → Transcription → verify textarea appears empty with placeholder text
  • Click "Insert preset" → select a language → prompt inserted, progress bar shows ≤69%
  • Type Latin text → progress bar fills slowly (~0.25 tokens/char)
  • Type CJK text → progress bar fills ~9× faster (~2.2 tokens/char)
  • Hit 100% → cannot type more characters; delete text → bar decreases, can type again
  • Progress bar color: gray by default, yellow at 80%+, red at 95%+
  • Dictate with auto language + empty prompt → normal transcription (no language bias)
  • Dictate with specific language + matching preset → improved punctuation
  • Clear prompt → dictate → only dictionary words sent (if any)
  • Restart app → prompt persisted in localStorage

Contributor Author

egsok commented Apr 1, 2026

hey @gabrielste1n, I've been using this in my fork for a few weeks — the punctuation improvement is dramatic, especially for Russian. Happy to iterate quickly if anything needs adjusting.

egsok force-pushed the feat/custom-transcription-prompt branch from f3d048d to eb0a350 on April 3, 2026 07:23

JiwaniZakir left a comment

The character-based limit in SettingsPage.tsx (maxLength={900} and the 800/900 warning threshold) doesn't account for token density differences across scripts. For the included Japanese (ja) and Chinese (zh-CN, zh-TW) presets, each character typically maps to 2–3 Whisper tokens, meaning a 900-character CJK prompt could easily consume 1800+ tokens — far exceeding Whisper's 224-token initial_prompt window. This undermines the whole premise of the priority ordering in buildTranscriptionPrompt(), where the custom prompt is placed last specifically to survive truncation.

The comment in audioManager.js — "Custom prompt LAST — survives truncation (higher priority)" — is only correct if Whisper truncates from the left, which should be explicitly documented or linked to the Whisper source/docs, since this is a non-obvious assumption that future maintainers could easily break.

A more robust approach would be to enforce a token-based limit (or at least a much lower character limit for CJK locales) rather than a flat 900-character cap that gives a false sense of safety for non-Latin scripts.
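
For reference, the arithmetic behind that concern can be written out as a quick check. The numbers are the ones quoted in the review (900 characters, 2–3 tokens per CJK character, a 224-token window); the variable names are only illustrative.

```typescript
// Back-of-the-envelope check of the numbers in the review (illustrative only).
const WINDOW_TOKENS = 224; // initial_prompt window, as stated in the discussion
const CHAR_LIMIT = 900;    // the flat maxLength under review
const CJK_TOKENS_PER_CHAR = { low: 2, high: 3 }; // the reviewer's 2-3 tokens per character

// Token cost of a prompt that fills the character limit with CJK text.
const fullCjkPrompt = {
  minTokens: CHAR_LIMIT * CJK_TOKENS_PER_CHAR.low,  // 1800
  maxTokens: CHAR_LIMIT * CJK_TOKENS_PER_CHAR.high, // 2700
};

// How many times the window is exceeded in the milder case (about 8x).
const overshoot = fullCjkPrompt.minTokens / WINDOW_TOKENS;

// CJK characters that would fit in the whole window at each density.
const cjkCharsThatFit = {
  atLowDensity: Math.floor(WINDOW_TOKENS / CJK_TOKENS_PER_CHAR.low),   // 112
  atHighDensity: Math.floor(WINDOW_TOKENS / CJK_TOKENS_PER_CHAR.high), // 74
};
```

So under these assumptions a CJK prompt would need to stay near 74–112 characters to fit the full window, and roughly half that if the window is shared with the dictionary.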

egsok added a commit to egsok/openwhispr-custom that referenced this pull request Apr 5, 2026
…n prompt

The 900-character limit was misleading for CJK scripts where each character
maps to ~2.2 Whisper tokens, easily exceeding the 224-token initial_prompt
window. Replace with estimateTokens() that weights characters by script
(CJK ×2.2, Cyrillic ×0.5, Latin ×0.25) and a visual progress bar capped
at 112 tokens (~half the window, leaving room for Custom Dictionary).

- Shorten all presets to fit within token budget
- Add progress bar with color thresholds (gray → yellow → red)
- Enforce token budget in onChange instead of maxLength
- Update description in all 10 locales to explain shared token budget
- Document left-truncation assumption in audioManager.js

Addresses review feedback on PR OpenWhispr#537.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor Author

egsok commented Apr 5, 2026

Thanks for the thorough review @JiwaniZakir — all valid points, now addressed in the latest push:

Token-aware budget instead of character limit:

  • Added estimateTokens() heuristic that approximates Whisper token count by weighting characters by script: CJK ×2.2, Cyrillic ×0.5, Latin ×0.25. This is a lightweight approximation (not a real tokenizer) but sufficient to prevent CJK users from unknowingly blowing past the 224-token window.
  • Replaced maxLength={900} with an approximate budget of ~112 tokens (~half of Whisper's 224-token window, leaving room for Custom Dictionary)
  • Visual progress bar with color thresholds (gray → yellow at 80% → red at 95%) instead of the raw character counter
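
To make the budget and thresholds concrete, here is a minimal sketch of how the bar state and the onChange check could be derived. The function names and the rule that deletions are always accepted are assumptions, not the actual SettingsPage.tsx code; the estimator is passed in as a parameter so any heuristic can be plugged in.

```typescript
// Hypothetical sketch of the budget check; not the actual SettingsPage.tsx code.
type BarColor = "gray" | "yellow" | "red";

// Assumed budget: about half of Whisper's 224-token window.
const PROMPT_TOKEN_BUDGET = 112;

// Map an estimated token count to the progress bar's fill and color.
function barState(tokens: number): { percent: number; color: BarColor } {
  const percent = Math.min(100, Math.round((tokens / PROMPT_TOKEN_BUDGET) * 100));
  const color: BarColor = percent >= 95 ? "red" : percent >= 80 ? "yellow" : "gray";
  return { percent, color };
}

// Decide which value the textarea keeps after an edit.
// Edits that fit the budget are accepted; edits that shrink the estimate are
// also accepted, so the user can always delete their way back under the limit.
function applyEdit(
  prev: string,
  next: string,
  estimate: (text: string) => number
): string {
  const nextTokens = estimate(next);
  const fits = nextTokens <= PROMPT_TOKEN_BUDGET;
  const shrinks = nextTokens <= estimate(prev);
  return fits || shrinks ? next : prev;
}
```

In a React component this would be called from the textarea's onChange handler, storing the returned value in state instead of relying on maxLength.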

Shortened presets:

  • All presets now fit within the token budget
  • CJK presets trimmed significantly while preserving full punctuation variety (!?「」“”。,——)

Truncation direction documented:

  • Updated the comment in audioManager.js to explicitly state that Whisper truncates initial_prompt from the left (keeping rightmost tokens), with a reference to whisper.cpp's tokenize logic
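
A toy illustration of why the ordering matters under left-truncation. It assumes, as described in this PR, that only the rightmost tokens of the combined prompt are kept. The function names are hypothetical stand-ins rather than the real audioManager.js code, and whitespace-separated words stand in for real Whisper tokens.

```typescript
// Toy model of prompt ordering under left-truncation; names are hypothetical.

// Dictionary words go first (lower priority), the custom prompt goes last.
function buildPromptSketch(dictionaryWords: string[], customPrompt: string): string {
  const parts = [dictionaryWords.join(", "), customPrompt.trim()];
  return parts.filter((part) => part.length > 0).join(" ");
}

// Keep only the rightmost maxTokens entries, dropping from the left.
function truncateLeft(tokens: string[], maxTokens: number): string[] {
  return tokens.length <= maxTokens ? tokens : tokens.slice(tokens.length - maxTokens);
}

// Example: with a 5-"token" window, the dictionary words are dropped first
// and the custom prompt survives intact.
const combined = buildPromptSketch(
  ["OpenWhispr", "Kubernetes"],
  "Hello, world! How are you?"
);
const kept = truncateLeft(combined.split(" "), 5).join(" ");
```

If truncation happened from the right instead, the same ordering would cut the custom prompt first, which is why the assumption needs to be written down next to the code.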

Updated description in all 10 locales to explain the shared token budget with Custom Dictionary.

Add user-editable "Transcription Prompt" textarea in Settings →
Transcription with dropdown presets for 10 languages. Whisper copies
the formatting style of this prompt, so a well-punctuated paragraph
nudges it to produce punctuated output.

- Empty by default (avoids language bias in auto-detect mode)
- "Insert preset" dropdown: en, es, fr, de, pt, it, ru, ja, zh-CN, zh-TW
- Each preset uses native punctuation (Russian «ёлочки», German „Gänsefüßchen“, etc.)
- Token-aware budget with estimateTokens() heuristic (CJK ×2.2, Cyrillic ×0.5,
  Latin ×0.25) — progress bar replaces flat character limit
- Budget capped at ~112 tokens (~half of Whisper's 224-token window),
  leaving room for Custom Dictionary
- Dictionary words prepended automatically (truncated first by Whisper's
  224-token window; left-truncation documented in code)
- i18n: all 10 locales updated

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
egsok force-pushed the feat/custom-transcription-prompt branch from bbfd3fe to 5382f8e on April 5, 2026 09:11