Character voice distinctiveness panel #180

@stultus

What

Add a new tab to the Statistics modal: Voice distinctiveness. It shows, per character:

  • Lemma vocabulary size (unique lemmas they use)
  • Sentence-length distribution (median, p90)
  • Register markers — formal vs colloquial, based on POS tags from mlmorph
  • Preferred verb tenses / aspect markers
  • A "voice similarity matrix": how much character A's lemma distribution overlaps with character B's
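
As a rough illustration of the sentence-length metric, here is a minimal Python sketch. It assumes sentences end at `.`, `!` or `?` and that tokens are whitespace-separated; the real panel would presumably use malayalam-tokenizer instead. The function names and the nearest-rank definition of p90 are assumptions, not something this issue specifies.

```python
import math
import re
from statistics import median

# Assumption: '.', '!' and '?' terminate sentences in dialogue text.
SENTENCE_END = re.compile(r"[.!?]+")


def sentence_lengths(dialogue: str) -> list[int]:
    """Return the token count of each non-empty sentence."""
    sentences = (s.strip() for s in SENTENCE_END.split(dialogue))
    return [len(s.split()) for s in sentences if s]


def percentile(values: list[int], pct: float) -> int:
    """Nearest-rank percentile: the value at rank ceil(pct% of n)."""
    if not values:
        raise ValueError("percentile of empty list")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def length_summary(dialogue: str):
    """Median and p90 sentence length, or None if there is no dialogue."""
    lengths = sentence_lengths(dialogue)
    if not lengths:
        return None
    return {"median": median(lengths), "p90": percentile(lengths, 90)}
```

Nearest-rank keeps p90 an actual observed sentence length, which may be easier to explain in a table cell than an interpolated value.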

Why this matters

A common quality problem in screenplays: every character sounds the same — same vocabulary range, same sentence rhythm, same register. Writers often don't notice until a script reader points it out.

This panel surfaces it visually. "Your protagonist and antagonist share 92% of their lemma distribution — they sound alike." The writer can then deliberately differentiate.
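
One possible way to get a figure like "92%" is histogram intersection over relative lemma frequencies. This issue does not fix the overlap measure, so the choice below, and the names `lemma_overlap` and `similarity_matrix`, are assumptions for illustration only.

```python
from collections import Counter


def lemma_overlap(a: Counter, b: Counter) -> float:
    """Histogram intersection of two lemma distributions, in [0, 1].

    Each counter is normalised to relative frequencies first, so a
    talkative character and a quiet one can still be compared.
    """
    total_a, total_b = sum(a.values()), sum(b.values())
    if total_a == 0 or total_b == 0:
        return 0.0
    shared = a.keys() & b.keys()
    return float(sum(min(a[l] / total_a, b[l] / total_b) for l in shared))


def similarity_matrix(profiles: dict[str, Counter]) -> dict[tuple[str, str], float]:
    """Pairwise overlap for every ordered pair of characters."""
    names = sorted(profiles)
    return {
        (x, y): lemma_overlap(profiles[x], profiles[y])
        for x in names
        for y in names
    }
```

A value of 1.0 means the two characters use the same lemmas in the same proportions; 0.0 means they share no lemmas at all.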

Final Draft, Highland, etc. ship character dialogue counts. None ship grammatical-distinctiveness analysis.

Dependency

Depends on the mlmorph FST integration. Once the analyzer is available, per-line lemma and POS extraction is a straightforward walk over the script.

Technical sketch

  • Walk the active script, group lines by Character cue.
  • For each character's accumulated dialogue:
    • Tokenize via malayalam-tokenizer (already a Rust crate at github.com/smc/malayalam-tokenizer, MIT).
    • Analyze each token via mlmorph → lemma + tag set.
    • Aggregate: Set of unique lemmas, histogram of sentence lengths, frequency of each POS tag.
  • Render in StatisticsModal: a table of characters × metrics, plus a small heatmap for the similarity matrix.
  • CSV export per the existing pattern.
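
The walk and aggregation steps above could look roughly like the Python sketch below. The tokenizer and analyzer are passed in as plain callables, so nothing here depends on the actual malayalam-tokenizer or mlmorph APIs. `VoiceProfile`, `build_profiles`, `to_csv` and the CSV column names are hypothetical, and the sentence-length histogram is left out for brevity.

```python
import csv
import io
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class VoiceProfile:
    lemmas: Counter = field(default_factory=Counter)    # lemma -> count
    pos_tags: Counter = field(default_factory=Counter)  # POS tag -> count


# Assumed shapes: tokenize(text) -> tokens, analyze(token) -> (lemma, tags).
Tokenizer = Callable[[str], Iterable[str]]
Analyzer = Callable[[str], tuple[str, list[str]]]


def build_profiles(
    lines: Iterable[tuple[str, str]], tokenize: Tokenizer, analyze: Analyzer
) -> dict[str, VoiceProfile]:
    """Group (character cue, dialogue) pairs and aggregate per character."""
    profiles: dict[str, VoiceProfile] = defaultdict(VoiceProfile)
    for character, dialogue in lines:
        profile = profiles[character]
        for token in tokenize(dialogue):
            lemma, tags = analyze(token)
            profile.lemmas[lemma] += 1
            profile.pos_tags.update(tags)
    return dict(profiles)


def to_csv(profiles: dict[str, VoiceProfile]) -> str:
    """One row per character, sorted by name."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["character", "unique_lemmas", "tokens", "top_pos_tag"])
    for name in sorted(profiles):
        p = profiles[name]
        top = p.pos_tags.most_common(1)
        writer.writerow(
            [name, len(p.lemmas), sum(p.lemmas.values()), top[0][0] if top else ""]
        )
    return out.getvalue()
```

Keeping lemma counts rather than a bare set costs little and lets the same profile feed both the vocabulary-size column and the similarity matrix.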

Out of scope

English-side analysis (Latin-script tokens). English needs its own NLP stack, which is out of scope for v1.0+. Only Malayalam dialogue is analyzed; characters who speak only English get a "—" in their row.
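
A minimal sketch of how Latin-script tokens could be filtered out before analysis, assuming a token counts as Malayalam if it contains at least one code point from the Unicode Malayalam block (U+0D00 to U+0D7F). The helper names and the mixed-script rule are assumptions, not decided behaviour.

```python
# Unicode Malayalam block: U+0D00 through U+0D7F inclusive.
MALAYALAM_BLOCK = range(0x0D00, 0x0D80)
PLACEHOLDER = "—"  # shown for characters with no Malayalam dialogue


def is_malayalam_token(token: str) -> bool:
    """True if any character of the token lies in the Malayalam block."""
    return any(ord(ch) in MALAYALAM_BLOCK for ch in token)


def malayalam_tokens(tokens: list[str]) -> list[str]:
    """Keep only the tokens that should be sent to the analyzer."""
    return [t for t in tokens if is_malayalam_token(t)]


def metric_cell(value: object, analyzed_tokens: int) -> str:
    """Render a table cell, falling back to the placeholder when nothing was analyzed."""
    return str(value) if analyzed_tokens > 0 else PLACEHOLDER
```

With this rule a code-switched line still contributes its Malayalam tokens, while a character who speaks only English ends up with zero analyzed tokens and the placeholder in every metric column.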

Metadata

Labels: enhancement (New feature or request)