Batch out-of-process CheckLibrary runs (collection-scoped) #68

Description

@abegong

Background

The CheckLibrary spec and plan introduce CheckLibrary as the abstraction behind every check type, with a SchemaLibrary capability and an out-of-process invocation seam (Vale being the motivating example).

The first cut runs an out-of-process library per item: the engine launches the external process once per file. That is the simplest correct path and matches how every per-item check works today, but it pays process-startup cost on every file, which is wasteful over a large collection.

What to build

Run an out-of-process library once per collection instead of once per item:

  • Gather every item in the collection, invoke the external tool a single time over all of them (e.g. vale --output JSON <files>), and map each finding back to its file via Violation.File.
  • Reuse the existing collection-scoped check pass (checks.CollectionCheck / RunCollection, engine.collectionChecksFor, the whole-collection re-scan in cmd/check.go) rather than inventing a new lifecycle.
  • Accept that a single-item selector (katalyst check notes/foo) triggers a whole-collection run, which already holds for collection-scoped checks like filesystem_unique_filename.
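The batching step above could be sketched roughly as follows. This is a minimal illustration, not the engine's actual code: the `Violation` struct here is a hypothetical stand-in (only its `File` field is named in this issue), `runValeBatch`, `parseValeOutput`, and `groupByFile` are made-up helper names, and the decoder assumes Vale's JSON output is an object keyed by file path whose values are arrays of alerts with `Check`, `Line`, and `Message` fields.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os/exec"
	"sort"
)

// Violation is a hypothetical stand-in for the engine's violation type.
// Only the File field is taken from the issue text; the rest is illustrative.
type Violation struct {
	File    string
	Line    int
	Rule    string
	Message string
}

// valeAlert covers the subset of alert fields this sketch assumes Vale emits.
type valeAlert struct {
	Check   string `json:"Check"`
	Line    int    `json:"Line"`
	Message string `json:"Message"`
}

// parseValeOutput decodes one batch result (assumed shape: file path -> alerts)
// and stamps each finding with the file it belongs to.
func parseValeOutput(raw []byte) ([]Violation, error) {
	var byFile map[string][]valeAlert
	if err := json.Unmarshal(raw, &byFile); err != nil {
		return nil, fmt.Errorf("decode vale output: %w", err)
	}
	var out []Violation
	for file, alerts := range byFile {
		for _, a := range alerts {
			out = append(out, Violation{File: file, Line: a.Line, Rule: a.Check, Message: a.Message})
		}
	}
	// Map iteration order is random; sort so results are deterministic.
	sort.Slice(out, func(i, j int) bool {
		if out[i].File != out[j].File {
			return out[i].File < out[j].File
		}
		return out[i].Line < out[j].Line
	})
	return out, nil
}

// groupByFile indexes violations by Violation.File so callers can report per item.
func groupByFile(vs []Violation) map[string][]Violation {
	grouped := make(map[string][]Violation)
	for _, v := range vs {
		grouped[v.File] = append(grouped[v.File], v)
	}
	return grouped
}

// runValeBatch launches the external tool once for the whole collection.
// Assumption: a non-zero exit that still produced stdout means "alerts found",
// not a failure to run, so the output is parsed anyway.
func runValeBatch(files []string) ([]Violation, error) {
	if len(files) == 0 {
		return nil, nil
	}
	args := append([]string{"--output", "JSON"}, files...)
	out, err := exec.Command("vale", args...).Output()
	var exitErr *exec.ExitError
	if err != nil && !(errors.As(err, &exitErr) && len(out) > 0) {
		return nil, fmt.Errorf("run vale: %w", err)
	}
	return parseValeOutput(out)
}
```

A collection-scoped check would call something like `runValeBatch` once with every item path and then use `groupByFile` to attach findings to items. Keeping the parsing separate from the process launch makes the mapping testable without Vale installed. One thing a real implementation would likely need to handle is very large collections, where passing every path on one command line can exceed the operating system's argument-length limit.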

Notes

  • Depends on the CheckLibrary abstraction landing first, and realistically on the Vale library existing as the first out-of-process consumer.
  • Spec Open Question 5 records the per-item-first / batch-later decision and the tradeoff.

🤖 Generated with Claude Code
