Auto-generated Story Bible (mlmorph entity extraction) #183

@stultus

What

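A new view (or new export option), the Story Bible: an auto-extracted page that summarises everything the script knows about its world. It is recomputed from the current script on demand (see the "Refresh" button under Technical sketch), so it does not drift the way a hand-maintained document does.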

Sections:

  • Characters — every Character cue, first-mention scene, total line count, total scenes appeared in.
  • Locations — every unique scene heading location, scenes shot there, INT/EXT split, time-of-day distribution.
  • Props / objects — recurring nouns mentioned in Action lines (filtered by mlmorph POS tag = noun, frequency >= 3).
  • Motifs / theme keywords — high-frequency lemmas (filtered by POS, with stop-word removal) — useful for spotting unintentional repetition.
  • Named entities — proper nouns (places, brands, fictional terms) mentioned but not formal Characters.
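As a rough illustration of the Locations section, here is a minimal sketch that parses scene headings into location, INT/EXT prefix, and time of day. It assumes headings shaped like "INT. KITCHEN - NIGHT"; the regex, the function names `parse_heading` and `summarise_locations`, and the "UNSPECIFIED" bucket are all hypothetical, not taken from the project.

```python
import re
from collections import Counter, defaultdict

# Assumed heading shape: "INT. KITCHEN - NIGHT" or "INT./EXT. CAR - DAY".
# The last " - " separated part is treated as time of day.
HEADING_RE = re.compile(
    r"^(?P<prefix>INT\./EXT\.|INT/EXT\.|I/E\.|INT\.|EXT\.)\s+"
    r"(?P<location>.+?)"
    r"(?:\s+-\s+(?P<time>[^-]+))?$",
    re.IGNORECASE,
)


def parse_heading(heading):
    """Return (prefix, location, time_of_day), or None if not a heading."""
    match = HEADING_RE.match(heading.strip())
    if match is None:
        return None
    time = (match.group("time") or "UNSPECIFIED").strip().upper()
    return (
        match.group("prefix").upper(),
        match.group("location").strip().upper(),
        time,
    )


def summarise_locations(headings):
    """Group scene numbers, INT/EXT counts and time-of-day counts by location.

    Scene numbers are 1-based positions in `headings`; lines that do not
    parse as a heading are skipped but still consume a number.
    """
    summary = defaultdict(
        lambda: {"scenes": [], "int_ext": Counter(), "time_of_day": Counter()}
    )
    for scene_no, heading in enumerate(headings, start=1):
        parsed = parse_heading(heading)
        if parsed is None:
            continue
        prefix, location, time = parsed
        entry = summary[location]
        entry["scenes"].append(scene_no)
        entry["int_ext"][prefix] += 1
        entry["time_of_day"][time] += 1
    return dict(summary)
```

Locations are upper-cased so "int. kitchen" and "INT. KITCHEN" merge; Malayalam location names pass through unchanged because they have no case.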

Linked: clicking a character / location / prop jumps to its first mention in the script.
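The Characters section and the jump-to-first-mention link could share one pass over the script. The sketch below assumes the script is available as a flat list of `(scene_no, kind, text)` tuples with kinds such as "character", "dialogue", "parenthetical" and "action"; that element model, `CharacterStats`, and `collect_characters` are hypothetical names for illustration. "Line count" is interpreted here as non-empty dialogue lines, which is one possible reading.

```python
from dataclasses import dataclass, field


@dataclass
class CharacterStats:
    first_scene: int      # scene of the first Character cue
    first_element: int    # element index a click could jump to
    dialogue_lines: int = 0
    scenes: set = field(default_factory=set)


def collect_characters(elements):
    """Build per-character stats from (scene_no, kind, text) tuples."""
    stats = {}
    current = None  # character whose dialogue we are currently inside
    for index, (scene_no, kind, text) in enumerate(elements):
        if kind == "character":
            # Drop cue extensions such as "(V.O.)" or "(CONT'D)".
            name = text.split("(")[0].strip().upper()
            if not name:
                current = None
                continue
            current = name
            if name not in stats:
                stats[name] = CharacterStats(first_scene=scene_no,
                                             first_element=index)
            stats[name].scenes.add(scene_no)
        elif kind == "dialogue" and current is not None:
            lines = [ln for ln in text.splitlines() if ln.strip()]
            stats[current].dialogue_lines += len(lines)
        elif kind != "parenthetical":
            # Action, headings, transitions end the current speech.
            current = None
    return stats
```

Storing `first_element` alongside the counts means the view needs no second lookup when a name is clicked.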

Why this matters

A real screenplay accumulates a story-world that's mostly invisible from the editor view. Writers manually maintain "bible" docs in Word or Notion; they go stale immediately. Auto-extraction keeps the bible current.

For series projects this is even more valuable — track recurring props / motifs across episodes.

Dependency

mlmorph provides POS tagging for Malayalam tokens. English Action lines can use a simpler English POS tagger, or a stop-word + frequency heuristic for v1.
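A minimal sketch of both halves of this dependency, under stated assumptions. mlmorph's `Analyser().analyse(word)` returns candidate analyses as strings of the form `lemma<tag><tag>`; the code below only parses such strings, so it runs without mlmorph installed. The specific tag names checked (`n`, `np`, a `v` prefix) should be verified against the mlmorph tag set, and compound analyses are simplified to their first lemma. The stop-word list, `english_candidates`, and the other function names are hypothetical.

```python
import re
from collections import Counter

TAG_RE = re.compile(r"<([^<>]+)>")


def split_analysis(analysis):
    """Split an mlmorph-style analysis string into (lemma, [tags]).

    Compound analyses contain several lemma segments; this sketch keeps
    only the first lemma and collects every tag in order.
    """
    lemma = analysis.split("<", 1)[0]
    return lemma, TAG_RE.findall(analysis)


def coarse_pos(tags):
    """Map the first tag to a coarse class (tag names are assumptions)."""
    if not tags:
        return "other"
    first = tags[0]
    if first == "np":
        return "proper_noun"
    if first == "n":
        return "noun"
    if first.startswith("v"):
        return "verb"
    return "other"


# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = frozenset(
    "a an and are as at be by for from he her his in is it its of on or "
    "she that the their they this to was with".split()
)
WORD_RE = re.compile(r"[A-Za-z']+")


def english_candidates(action_lines, min_count=3):
    """v1 heuristic: frequent non-stop-words, with no POS tagging at all."""
    counts = Counter(
        word
        for line in action_lines
        for word in (token.lower() for token in WORD_RE.findall(line))
        if word not in STOP_WORDS and len(word) > 2
    )
    return {word: n for word, n in counts.items() if n >= min_count}
```

The English heuristic does no lemmatisation, so "lamp" and "lamps" are counted separately; that is the main accuracy cost of skipping a tagger in v1.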

Technical sketch

  • Walk the active script (or the merged series, when scope=series).
  • For each token (via mlmorph for Malayalam, simpler heuristics for English):
    • Lemmatize, count occurrences.
    • Classify by POS into Props / Motifs / Named entities. Characters and Locations come from structure (Character cues and scene headings), not from POS.
  • Render as a new tab / view, with a "Refresh" button (on-demand recompute, not auto — large scripts make this expensive).
  • CSV / PDF export.
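The on-demand "Refresh" and the CSV export from the steps above might look roughly like this. It is a sketch, not the project's design: `StoryBibleCache`, `bible_to_csv`, and the `{section: {name: count}}` shape of the bible are assumptions, and the extraction function is passed in as a plain callable. Hashing the script text lets a Refresh on an unchanged script return the previous result without re-running the expensive extraction.

```python
import csv
import hashlib
import io


class StoryBibleCache:
    """Recompute the bible only when Refresh sees changed script text."""

    def __init__(self, extract):
        self._extract = extract   # callable: script text -> bible dict
        self._digest = None
        self._bible = None
        self.recomputes = 0       # exposed so callers can observe cache hits

    def refresh(self, script_text):
        digest = hashlib.sha256(script_text.encode("utf-8")).hexdigest()
        if digest != self._digest:
            self._bible = self._extract(script_text)
            self._digest = digest
            self.recomputes += 1
        return self._bible


def bible_to_csv(bible):
    """Flatten {section: {name: count}} into CSV text.

    Rows are ordered by section name, then by descending count, then name.
    """
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["section", "name", "count"])
    for section in sorted(bible):
        rows = sorted(bible[section].items(), key=lambda kv: (-kv[1], kv[0]))
        for name, count in rows:
            writer.writerow([section, name, count])
    return buffer.getvalue()
```

For scope=series, the same cache could be keyed on the concatenated episode texts, at the cost of recomputing everything when any one episode changes.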

UI

Could live as:

  • A new view tab alongside Writing / Cards / Story (most discoverable).
  • Or a Statistics modal tab.
  • Or an Export option (Story Bible PDF).

Probably all three, layered.

Out of scope

Manual editing of the bible (that's just notes/docs). Predictions about future scenes ("this prop is likely to be a Chekhov's gun") — that's authorial intent, not auto-extractable.

Labels: enhancement
