Summary
Some checks are inherently judgment calls, not pattern matches. The
markdown_writing_tells check (added for #47) is the first example: it greps
for AI-writing tells (em dashes, decorative emoji, overused words, stock
phrases) and emits warnings for a human to review, because whether a given
hit is actually slop depends on context. A regex catalog can only surface
candidates; it cannot decide.
We should explore backing this class of check with an LLM call or a trained
ML classifier so the tool can make the judgment itself (or rank candidates by
confidence) instead of leaving every hit for manual review.
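To make the limitation concrete, here is a minimal Python sketch of what a regex catalog of tells does. The `TELLS` table, `Candidate`, and `find_candidates` are hypothetical names used only for illustration. This is not the actual markdown_writing_tells implementation.

```python
import re
from dataclasses import dataclass

# Hypothetical catalog (tell name -> pattern); not the real markdown_writing_tells list.
TELLS = {
    "em_dash": re.compile("\u2014"),
    "overused_word": re.compile(r"\b(robust|harness|delve)\b", re.IGNORECASE),
}


@dataclass
class Candidate:
    line_no: int  # 1-based line number in the Markdown file
    tell: str     # which catalog entry matched
    text: str     # the full matching line


def find_candidates(markdown: str) -> list[Candidate]:
    """Return every catalog hit. Matching is purely lexical, so context is ignored."""
    hits = []
    for line_no, line in enumerate(markdown.splitlines(), start=1):
        for tell, pattern in TELLS.items():
            if pattern.search(line):
                hits.append(Candidate(line_no, tell, line))
    return hits


doc = (
    "Retries make the client robust to transient 503 responses.\n"
    "Let us delve into the rich tapestry of ideas."
)
for c in find_candidates(doc):
    print(c.line_no, c.tell)
```

Both lines are flagged with the same tell, even though only the second reads like slop. A pattern match has no way to tell them apart, which is the gap a model-backed check would fill.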
Motivation
- The current markdown_writing_tells catalog is high-recall, low-precision:
it flags legitimate technical words ("robust", "harness"), the sanctioned
⚠️ callout icon, and even the style guide that documents the tells. Most
warnings are noise that a reader has to dismiss.
- A classifier could score each candidate ("is this em dash / phrase actually
AI slop in context?") and warn only above a threshold, optionally explaining why.
- The same approach generalizes to other fuzzy checks: tone/voice
consistency, "does this section match its heading", summary quality, etc.
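A minimal sketch of the threshold idea, assuming a hypothetical `Scorer` callable that maps a candidate (with its surrounding context) to a probability and a one-line rationale. An LLM call or a local classifier could sit behind that interface. `Candidate`, `Finding`, `rank_and_filter`, and the 0.7 default threshold are illustrative choices, and `toy_scorer` is a stand-in heuristic for demonstration, not a real model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    line_no: int
    tell: str
    context: str  # the hit plus surrounding lines


@dataclass
class Finding:
    line_no: int
    tell: str
    score: float
    rationale: str


# Hypothetical interface: returns (probability the hit is slop in context, rationale).
Scorer = Callable[[Candidate], tuple[float, str]]


def rank_and_filter(
    candidates: list[Candidate], scorer: Scorer, threshold: float = 0.7
) -> list[Finding]:
    """Keep only hits scored at or above `threshold`, most confident first."""
    findings = []
    for cand in candidates:
        score, rationale = scorer(cand)
        if score >= threshold:
            findings.append(Finding(cand.line_no, cand.tell, score, rationale))
    return sorted(findings, key=lambda f: f.score, reverse=True)


def toy_scorer(cand: Candidate) -> tuple[float, str]:
    # Stand-in heuristic: context containing digits or inline code counts as technical usage.
    if any(ch.isdigit() for ch in cand.context) or "`" in cand.context:
        return 0.1, "technical usage"
    return 0.9, "decorative usage with no technical referent"


cands = [
    Candidate(1, "overused_word", "Retries make the client robust to 503 errors."),
    Candidate(2, "overused_word", "A robust, holistic tapestry of synergy."),
]
for f in rank_and_filter(cands, toy_scorer):
    print(f.line_no, f.score, f.rationale)
```

Sorting by score also covers the "rank candidates by confidence" option: passing `threshold=0.0` keeps every hit but puts the likeliest slop first for the reviewer.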
Possible directions
- An optional check kind that calls an LLM (pluggable provider; the prompt
carries the line plus surrounding context) and returns warnings with a
rationale. Default to Claude, per the repo's conventions.
- A small local classifier for the punctuation/phrase tells to avoid a
network call on every check.
- Keep it strictly warning severity and opt-in: an LLM-backed check is not
deterministic, so it must never fail CI, and it must degrade gracefully when no
provider/credentials are configured (skip, don't error).
- Caching by content hash so repeated runs over unchanged files are free.
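The constraints above (opt-in, warning severity only, skip without credentials, cache by content hash) could fit together roughly as follows. This is a sketch under assumptions: `Provider`, `judge`, `LlmWarning`, and `LlmTellsCheck` are hypothetical names, the provider would wrap whichever LLM client the repo uses (Claude by default), and the cache is an in-memory dict where a real implementation would persist it between runs.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class LlmWarning:
    line_no: int
    rationale: str
    severity: str = "warning"  # never "error": an LLM-backed check must not fail CI


class Provider(Protocol):
    """Hypothetical pluggable backend (e.g. a thin wrapper around an LLM client)."""

    def judge(self, line: str, context: str) -> Optional[str]:
        """Return a rationale if `line` reads as AI slop in `context`, else None."""
        ...


class LlmTellsCheck:
    def __init__(self, provider: Optional[Provider], context_lines: int = 2):
        self.provider = provider
        self.context_lines = context_lines
        self.cache: dict[str, list[LlmWarning]] = {}  # content hash -> warnings

    def run(self, text: str) -> list[LlmWarning]:
        if self.provider is None:
            return []  # opt-in: no provider/credentials configured, so skip silently
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self.cache:
            return self.cache[key]  # unchanged content: no provider calls at all
        lines = text.splitlines()
        warnings = []
        for i, line in enumerate(lines):
            if not line.strip():
                continue  # nothing to judge on blank lines
            start = max(0, i - self.context_lines)
            context = "\n".join(lines[start : i + self.context_lines + 1])
            try:
                rationale = self.provider.judge(line, context)
            except Exception:
                return []  # degrade gracefully; don't cache a partial result
            if rationale is not None:
                warnings.append(LlmWarning(i + 1, rationale))
        self.cache[key] = warnings
        return warnings
```

Keying the cache on file content alone means a prompt or model change would keep serving old results; folding the prompt and model identifier into the hashed key is one way to avoid that.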
Related
- markdown_writing_tells and the warning-severity concept this builds on.