Character voice distinctiveness panel

## What

A new tab in the Statistics modal: **Voice distinctiveness**. Per character:
- Lemma vocabulary size (unique lemmas they use)
- Sentence-length distribution (median, p90)
- Register markers — formal vs colloquial, based on POS tags from mlmorph
- Preferred verb tenses / aspect markers
- A "voice similarity matrix" — how much character A's distribution overlaps with character B's
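
The issue does not fix a formula for the sentence-length percentiles or for "overlap", so the following is only a minimal sketch under assumptions: nearest-rank percentiles, and cosine similarity between per-character lemma frequency counts. The function names and the toy lemma keys are hypothetical and are not part of the existing codebase.

```python
from collections import Counter
from math import ceil, sqrt
from statistics import median


def sentence_stats(lengths):
    """Return (median, p90) of sentence lengths, p90 by nearest rank."""
    ordered = sorted(lengths)
    rank = max(1, ceil(len(ordered) * 90 / 100))
    return median(ordered), ordered[rank - 1]


def cosine_similarity(a, b):
    """Cosine similarity of two lemma-frequency Counters, in [0, 1]."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_matrix(profiles):
    """Map every (character, character) pair to its similarity score."""
    names = sorted(profiles)
    return {
        (x, y): cosine_similarity(profiles[x], profiles[y])
        for x in names
        for y in names
    }


# Toy lemma counts; real counts would come from the analyzer output.
profiles = {
    "HERO": Counter({"lemma_a": 3, "lemma_b": 1}),
    "VILLAIN": Counter({"lemma_a": 3, "lemma_b": 1}),
    "CLERK": Counter({"lemma_c": 2}),
}
matrix = similarity_matrix(profiles)
```

Cosine similarity is one possible choice here; a Jaccard index over the lemma sets would ignore frequency and may suit the "vocabulary range" reading better. The matrix is symmetric, so a heatmap only needs one triangle of it.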

## Why this matters

A common quality problem in screenplays: every character sounds the same — same vocabulary range, same sentence rhythm, same register. Writers often don't notice until a script reader points it out.

This panel surfaces it visually. *"Your protagonist and antagonist share 92% of their lemma distribution — they sound alike."* The writer can then deliberately differentiate.

Final Draft, Highland, etc. ship character dialogue counts. None ship grammatical-distinctiveness analysis.

## Dependency

Depends on the mlmorph FST integration. Once the analyzer runs, per-line lemma and POS extraction is a straightforward walk.

## Technical sketch

- Walk the active script, group lines by Character cue.
- For each character's accumulated dialogue:
  - Tokenize via `malayalam-tokenizer` (already a Rust crate at github.com/smc/malayalam-tokenizer, MIT).
  - Analyze each token via mlmorph → lemma + tag set.
  - Aggregate: set of unique lemmas, histogram of sentence lengths, frequency of each POS tag.
- Render in StatisticsModal: a table of characters × metrics, plus a small heatmap for the similarity matrix.
- CSV export per the existing pattern.
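
The walk above could look roughly like the sketch below. It assumes the script is already available as `(character_cue, dialogue_text)` pairs and that the tokenizer and analyzer are passed in as plain callables. `VoiceProfile`, `build_profiles`, and the toy `tokenize` / `analyze` stand-ins are hypothetical; they do not reflect the real `malayalam-tokenizer` or mlmorph APIs. Splitting sentences on `.`, `?`, and `!` is likewise an assumption.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class VoiceProfile:
    lemmas: set = field(default_factory=set)
    sentence_lengths: list = field(default_factory=list)
    pos_counts: Counter = field(default_factory=Counter)


def build_profiles(lines, tokenize, analyze):
    """Group dialogue by character cue and aggregate per-character metrics.

    tokenize: str -> list of tokens (stand-in for the tokenizer)
    analyze:  token -> (lemma, list of tags) (stand-in for the analyzer)
    """
    profiles = defaultdict(VoiceProfile)
    for cue, text in lines:
        profile = profiles[cue.strip()]
        for sentence in re.split(r"[.?!]+", text):
            tokens = tokenize(sentence)
            if not tokens:
                continue  # skip empty fragments left by the split
            profile.sentence_lengths.append(len(tokens))
            for token in tokens:
                lemma, tags = analyze(token)
                profile.lemmas.add(lemma)
                profile.pos_counts.update(tags)
    return dict(profiles)


# Toy stand-ins so the sketch runs without the real dependencies.
def tokenize(sentence):
    return sentence.split()


def analyze(token):
    return token, ["N"]


script = [("RAVI", "a b c. a d!"), ("MEERA", "e f"), ("RAVI", "b")]
profiles = build_profiles(script, tokenize, analyze)
```

Passing the tokenizer and analyzer in as parameters keeps the aggregation testable before the mlmorph integration lands.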

## Out of scope

English-side analysis (Latin-script tokens). English needs its own NLP stack, so it stays out of scope for v1.0+. Only Malayalam dialogue is analyzed; characters who speak only English get a "—" in their row.
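
One way to separate Malayalam tokens from Latin ones is a Unicode range check against the Malayalam block (U+0D00 to U+0D7F). The sketch below assumes a token counts as Malayalam if it contains at least one character from that block; `is_malayalam`, `malayalam_only`, and `metric_cell` are hypothetical helper names.

```python
PLACEHOLDER = "\u2014"  # the dash shown for characters with no Malayalam dialogue


def is_malayalam(token):
    """True if any character falls in the Malayalam Unicode block."""
    return any("\u0d00" <= ch <= "\u0d7f" for ch in token)


def malayalam_only(tokens):
    """Drop tokens with no Malayalam characters (e.g. English words)."""
    return [t for t in tokens if is_malayalam(t)]


def metric_cell(tokens, metric):
    """Render one table cell: the metric over Malayalam tokens, or the dash."""
    kept = malayalam_only(tokens)
    return PLACEHOLDER if not kept else str(metric(kept))


english_only = metric_cell(["hello", "world"], len)
mixed = metric_cell(["നമസ്കാരം", "hello"], len)
```

Mixed-script tokens are kept under this assumption; whether the analyzer handles them sensibly would need checking against real scripts.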
