What
Train a tiny per-character n-gram model on each character's existing dialogue. When the writer is composing new lines for character X, autocomplete suggestions come from that character's distribution — not the document-wide pool.
Why this matters
Helps the writer maintain voice consistency across a 90-page draft. Once a character has 30+ established lines, the predictive autocomplete starts surfacing words / phrasings they actually use, not generic Malayalam.
Combined with #180 (Voice distinctiveness panel), the writer gets both a passive check (how distinctive is this character's voice?) and an active aid (suggestions that reinforce that voice).
Reference
SMC ships a Malayalam predictor at https://github.com/smc/mlpredict — n-gram-based, JS-port available. We don't need to use mlpredict directly; we just need a tiny in-Rust n-gram (bigram + trigram) trained on the script's per-character text.
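A minimal sketch of what that in-Rust bigram + trigram model could look like, with a trigram-first lookup backing off to bigram counts. All names here (`NgramModel`, `train`, `predict`) are illustrative, not from mlpredict or the existing codebase, and tokenization is naive whitespace splitting:

```rust
use std::collections::HashMap;

/// Frequency counts of the next word, keyed by the previous 1 or 2 tokens.
struct NgramModel {
    bigram: HashMap<String, HashMap<String, u32>>,
    trigram: HashMap<(String, String), HashMap<String, u32>>,
}

impl NgramModel {
    /// Build counts from a character's dialogue lines.
    fn train(lines: &[&str]) -> Self {
        let mut bigram: HashMap<String, HashMap<String, u32>> = HashMap::new();
        let mut trigram: HashMap<(String, String), HashMap<String, u32>> = HashMap::new();
        for line in lines {
            let toks: Vec<&str> = line.split_whitespace().collect();
            for w in toks.windows(2) {
                *bigram
                    .entry(w[0].to_string())
                    .or_default()
                    .entry(w[1].to_string())
                    .or_insert(0) += 1;
            }
            for w in toks.windows(3) {
                *trigram
                    .entry((w[0].to_string(), w[1].to_string()))
                    .or_default()
                    .entry(w[2].to_string())
                    .or_insert(0) += 1;
            }
        }
        NgramModel { bigram, trigram }
    }

    /// Most likely next word: try the 2-token context first, then
    /// back off to the 1-token context when the trigram is unseen.
    fn predict(&self, prev2: Option<&str>, prev1: &str) -> Option<String> {
        if let Some(p2) = prev2 {
            if let Some(next) = self.trigram.get(&(p2.to_string(), prev1.to_string())) {
                return next.iter().max_by_key(|(_, c)| **c).map(|(w, _)| w.clone());
            }
        }
        self.bigram
            .get(prev1)
            .and_then(|next| next.iter().max_by_key(|(_, c)| **c).map(|(w, _)| w.clone()))
    }
}
```

Keeping raw counts rather than normalized probabilities is enough here, since we only need a ranking of candidate next words for the popover.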
Technical sketch
- Extend the existing wordCompletion.ts plugin with a per-character mode.
- When the cursor is in a Dialogue block under Character cue X:
- If character X has >= 30 dialogue lines, build a bigram + trigram model from their text.
- Predict the most likely next word given the previous 1-2 tokens.
- Surface as the top entries in the suggestion popover, mixed with the document-local pool.
- Recompute the per-character model lazily (only when the character's dialogue count changes meaningfully — debounced).
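The gating and lazy-recompute steps above can be sketched as a small cache keyed by character name, rebuilt only when that character's dialogue count changes. This is a self-contained illustration, not the real plugin code: `CharModel`, `build_model`, and `MIN_LINES` are hypothetical names, and the debouncing mentioned above is assumed to live in the editor layer that calls this:

```rust
use std::collections::HashMap;

/// Threshold from the sketch: below this, fall back to the document pool.
const MIN_LINES: usize = 30;

/// Placeholder for the trained per-character model (illustrative only).
struct CharModel {
    trained_on: usize,
}

fn build_model(lines: &[String]) -> CharModel {
    // In the real feature this would train the n-gram counts;
    // here we only record how many lines the model was built from.
    CharModel { trained_on: lines.len() }
}

/// Per-character model cache; an entry is rebuilt only when the
/// character's dialogue line count differs from the cached one.
struct ModelCache {
    cache: HashMap<String, CharModel>,
}

impl ModelCache {
    fn new() -> Self {
        ModelCache { cache: HashMap::new() }
    }

    /// Returns None below the threshold, so callers keep using the
    /// document-local pool for characters with little dialogue.
    fn model_for(&mut self, character: &str, lines: &[String]) -> Option<&CharModel> {
        if lines.len() < MIN_LINES {
            return None;
        }
        let stale = self
            .cache
            .get(character)
            .map_or(true, |m| m.trained_on != lines.len());
        if stale {
            self.cache.insert(character.to_string(), build_model(lines));
        }
        self.cache.get(character)
    }
}
```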
UI
A small "From this character's voice" header in the popover when per-character suggestions are shown, versus "From the document" for the existing pool.
Out of scope
Training on external corpora (other Malayalam screenplays — we don't have a corpus). Per-character models are purely script-local.