
Use LLM / ML classifiers for judgment-call checks #57

Description

@abegong

Summary

Some checks are inherently judgment calls, not pattern matches. The
markdown_writing_tells check (added for #47) is the first example: it greps
for AI-writing tells (em dashes, decorative emoji, overused words, stock
phrases) and emits warnings for a human to review, because whether a given
hit is actually slop depends on context. A regex catalog can only surface
candidates; it cannot decide.
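
For reference, the candidate-surfacing stage can be pictured as a catalog of regexes scanned line by line. This is a minimal illustration, not the actual markdown_writing_tells code: TELL_CATALOG, Candidate, and find_candidates are hypothetical names, and the patterns are a tiny sample. Nothing in it looks at context, so "a robust estimator" is flagged exactly like filler.

```python
import re
from dataclasses import dataclass
from typing import Iterator

# Hypothetical catalog: a few tell categories and sample regexes.
# The real markdown_writing_tells catalog is larger; these are illustrative.
TELL_CATALOG: dict[str, re.Pattern[str]] = {
    "em_dash": re.compile("\u2014"),
    "overused_word": re.compile(r"\b(?:robust|delve|seamless)\b", re.IGNORECASE),
    "stock_phrase": re.compile(r"\bit's worth noting\b", re.IGNORECASE),
}


@dataclass(frozen=True)
class Candidate:
    line_no: int  # 1-based line number in the file
    tell: str     # catalog key that matched
    line: str     # full line text, kept so a later stage can judge context


def find_candidates(text: str) -> Iterator[Candidate]:
    """Yield every catalog hit. High recall: no attempt to judge context."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for tell, pattern in TELL_CATALOG.items():
            if pattern.search(line):
                yield Candidate(line_no, tell, line)
```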

We should explore backing this class of check with an LLM call or a trained
ML classifier so the tool can make the judgment itself (or rank candidates by
confidence) instead of leaving every hit for manual review.

Motivation

  • The current markdown_writing_tells catalog is high-recall, low-precision:
    it flags legitimate technical words ("robust", "harness"), the sanctioned
    ⚠️ callout icon, and even the style guide that documents the tells. Most
    warnings are noise a reader has to dismiss.
  • A classifier could score each candidate ("is this em dash / phrase actually
    AI slop in context?") and only warn above a threshold, or explain why.
  • The same approach generalizes to other fuzzy checks: tone/voice
    consistency, "does this section match its heading", summary quality, etc.
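
A minimal sketch of the score-then-threshold idea, assuming a pluggable scorer that returns a probability and a short rationale for each candidate. Candidate, Judgment, Finding, judge, and toy_scorer are hypothetical names; toy_scorer is a hard-coded stand-in for an LLM call or trained model, and the 0.7 default threshold is arbitrary.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Candidate:
    line_no: int
    tell: str       # which catalog entry matched
    context: str    # the flagged line plus surrounding lines


@dataclass(frozen=True)
class Judgment:
    score: float    # estimated probability (0..1) that the hit really is slop
    rationale: str  # short explanation shown to the reader


@dataclass(frozen=True)
class Finding:
    line_no: int
    tell: str
    score: float
    rationale: str


# Anything that maps a candidate to a judgment: an LLM call, a local model, etc.
Scorer = Callable[[Candidate], Judgment]


def judge(candidates: Iterable[Candidate], scorer: Scorer,
          threshold: float = 0.7) -> list[Finding]:
    """Keep candidates scoring at or above `threshold`, most confident first."""
    findings = []
    for cand in candidates:
        j = scorer(cand)
        if j.score >= threshold:
            findings.append(Finding(cand.line_no, cand.tell, j.score, j.rationale))
    findings.sort(key=lambda f: f.score, reverse=True)
    return findings


def toy_scorer(cand: Candidate) -> Judgment:
    # Hard-coded stand-in for a real model, for demonstration only.
    if cand.tell == "overused_word" and "estimator" in cand.context:
        return Judgment(0.1, "'robust' is a technical term here")
    return Judgment(0.9, "stock phrasing with no technical content")
```

Sorting by score also covers the "rank candidates by confidence" option: a caller could show the top N findings instead of applying a hard cutoff.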

Possible directions

  • An optional check kind that calls an LLM (pluggable provider, prompt
    carries the line + surrounding context) and returns warnings with a
    rationale. Default to Claude, per the repo's conventions.
  • A small local classifier for the punctuation/phrase tells to avoid a
    network call on every check.
  • Keep it strictly warning severity and opt-in: an LLM-backed check must
    never fail CI deterministically, and must degrade gracefully when no
    provider/credentials are configured (skip, don't error).
  • Caching by content hash so repeated runs over unchanged files are free.
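
The opt-in, skip-don't-error, and caching requirements could fit together roughly as below. This is a sketch under assumptions: the Provider protocol, its judge method, and the {"is_slop", "rationale"} verdict shape are hypothetical, no real SDK is called, and the cache is an in-memory dict keyed by a SHA-256 of the line and its context. Making repeat runs free would need that dict persisted to disk (for example, a JSON file in a cache directory).

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional, Protocol


class Provider(Protocol):
    """Hypothetical provider interface; a Claude-backed class would be the default."""

    def judge(self, line: str, context: str) -> dict:
        """Return a verdict such as {"is_slop": True, "rationale": "..."}."""
        ...


@dataclass
class LlmTellCheck:
    provider: Optional[Provider] = None  # None when no provider/credentials are configured
    enabled: bool = False                # opt-in: does nothing unless explicitly enabled
    # In-memory stand-in for a content-hash cache; persist it to make repeat runs free.
    cache: dict = field(default_factory=dict)

    @staticmethod
    def _key(line: str, context: str) -> str:
        # JSON-encode the pair so ("ab", "c") and ("a", "bc") hash differently.
        payload = json.dumps([line, context])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, line: str, context: str) -> list[tuple[str, str]]:
        """Return (severity, message) pairs; severity is only ever "warning"."""
        if not self.enabled or self.provider is None:
            return []  # skip, don't error
        key = self._key(line, context)
        if key not in self.cache:
            try:
                self.cache[key] = self.provider.judge(line, context)
            except Exception:
                return []  # a provider failure also degrades to a skip
        verdict = self.cache[key]
        if verdict.get("is_slop"):
            return [("warning", verdict.get("rationale", ""))]
        return []
```

Keying on the hashed text rather than the file path means an edit far from a flagged line does not invalidate its verdict. A real cache key would likely also include the model and prompt version, so that changing either triggers re-judgment.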

Metadata

Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
