
User defined Vocabulary Layer #7

@sortedcord

Add a personal vocabulary system that lets users define shorthand aliases for anything Coggle can understand, such as paths, argument combinations, or intent shortcuts. This layer runs before the preprocessor, so all downstream pipeline stages see the expanded form.

The path preprocessor resolves tokens that match a name on the filesystem, so if you have a folder called downloads, Coggle can figure it out. But it cannot handle:

  • Paths whose names don't match how the user refers to them ("my nas" -> /media/external/nas)
  • Multi word references ("project folder" -> ~/dev/coggle)
  • Shorthand for argument combinations ("web format" -> mp4, h264, 1080p)
  • Custom intent aliases for frequently run operations

A user defined vocabulary layer closes this gap without requiring any model changes or heuristic additions to the preprocessor.

Proposed Behaviour

Coggle loads a vocabulary file (e.g. ~/.config/coggle/vocab.toml) at startup. Before any pipeline stage runs, the raw query is scanned for vocabulary keys and expanded in place. The expanded query is then passed to the preprocessor as if the user had typed the full form.

The vocab file may look like this:

[paths]
"my nas"       = "/media/external/nas"
"project"      = "~/dev/coggle"
"downloads"    = "~/Downloads"
"desktop"      = "~/Desktop"

[arguments]
"web format"   = "mp4 h264 1080p"
"small"        = "720p"
"lossless"     = "flac"

[intents]
"stash"        = "move"
"nuke"         = "delete"

We may not need this categorization inside of the vocab file but it may be useful down the line.

Expansion Rules

  • Expansion is a plain, case-insensitive string substitution on the raw query. Keys are tried longest first to avoid partial collisions
  • As a consequence, multi-word keys are matched before the shorter single-word keys they contain
  • Expanded tokens are then handled by the preprocessor as normal. For example, a path alias expands to a string the preprocessor will confirm against the filesystem. An argument alias expands to tokens the subclassifier will parse. (This flow is subject to change)
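
The expansion rules above could be sketched roughly as follows. This is an illustration, not Coggle's implementation: build_expander is a hypothetical name, and matching only on word boundaries is an added assumption (the issue says "plain string substitution") so that a key such as "small" does not fire inside "smaller". The query is rewritten in a single pass, so text produced by one expansion is never expanded again.

```python
import re


def build_expander(vocab):
    """Return a function that expands vocabulary keys in a raw query."""
    lowered = {key.lower(): value for key, value in vocab.items()}
    if not lowered:
        return lambda query: query

    # Longest keys first: regex alternation takes the first alternative
    # that matches, so "project folder" wins over "project".
    keys = sorted(lowered, key=len, reverse=True)
    alternation = "|".join(re.escape(key) for key in keys)
    # Assumption: keys only match when not glued to other word characters.
    pattern = re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)

    def expand(query):
        # A replacement function avoids backslash handling in expansions.
        return pattern.sub(lambda m: lowered[m.group(0).lower()], query)

    return expand


expand = build_expander({
    "my nas": "/media/external/nas",
    "project": "~/dev/coggle",
    "project folder": "~/dev/coggle",
    "web format": "mp4 h264 1080p",
    "small": "720p",
    "stash": "move",
})

expanded = expand("Stash the clip in My NAS as web format")
# expanded == "move the clip in /media/external/nas as mp4 h264 1080p"
```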

Scope

  • Define vocabulary file schema (TOML, sections by type)
  • Implement vocabulary loader with validation
  • Implement longest-match-first substitution on the raw query
  • CLI command to add/remove/list vocabulary entries (coggle vocab add "my nas" /media/external/nas)
  • Document vocabulary file location and format
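
One possible shape for the CLI item, sketched with argparse. Everything here is an assumption for illustration: the parser layout, the build_parser and apply_vocab_command names, and the fact that entries live in a plain dict. Writing the result back to vocab.toml is left out, since the standard library can read TOML but not write it.

```python
import argparse


def build_parser():
    """Hypothetical parser for: coggle vocab add|remove|list."""
    parser = argparse.ArgumentParser(prog="coggle")
    commands = parser.add_subparsers(dest="command", required=True)

    vocab = commands.add_parser("vocab", help="manage vocabulary entries")
    actions = vocab.add_subparsers(dest="action", required=True)

    add = actions.add_parser("add", help="add or overwrite an entry")
    add.add_argument("key")
    add.add_argument("expansion")

    remove = actions.add_parser("remove", help="remove an entry")
    remove.add_argument("key")

    actions.add_parser("list", help="list all entries")
    return parser


def apply_vocab_command(args, vocab):
    """Apply a parsed vocab command to an in-memory dict of entries.

    Returns the sorted entries after the command has been applied.
    """
    if args.action == "add":
        vocab[args.key.lower()] = args.expansion
    elif args.action == "remove":
        vocab.pop(args.key.lower(), None)
    return sorted(vocab.items())


entries = {}
args = build_parser().parse_args(["vocab", "add", "my nas", "/media/external/nas"])
listing = apply_vocab_command(args, entries)
```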

Out of Scope (for now)

  • Per project vocabulary files
  • Vocabulary entries that map to structured objects rather than raw strings (for example mapping directly to a resolved PathContext)
  • Fuzzy matching on vocabulary keys
  • Auto adding vocabulary intents

Labels: enhancement
