Fix dead triage regex, CLI input validation, and self-grading dogfood loop#62
Closed
Nagendhra-web wants to merge 4 commits into
Closed
Conversation
Two related correctness/maintainability fixes to the audit engine, shipped together because they live in the same recompiled source file. 1. Dead `t riage` regex (bug): detectLoopActivity()'s git-history scan matched the literal ` t riage ` (surrounding spaces + a space inside the word) instead of `triage`, so it could never match a real commit subject and triage commits were silently not counted as loop activity. The branch appeared to "work" only because adjacent alternatives (loop/audit/state.md) masked it. Fixed to `triage`, with a regression test that inits a temp git repo with a single triage commit and asserts git evidence is detected (skips gracefully if git is unavailable). 2. Magic scoring weights (maintainability): The +18/+14/+9... contributions and the 38/58/78 level cutoffs were inlined as magic numbers across computeScore() and auditProject(), so the L3 gate threshold was duplicated and easy to desync. Extracted into documented SCORE_WEIGHTS and LEVEL_THRESHOLDS constants. Pure refactor — values are unchanged and the existing threshold tests (L0/L1/L2/L3 boundaries) pass unmodified, proving behavior parity. All 10 loop-audit tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`--level garbage` was cast straight to ReadinessLevel and flowed into realisticMix(), whose if/else chain has no matching branch and falls through to the L3 return — producing a confident, wrong estimate for an invalid level. - CLI now rejects an unknown --level with a friendly message + exit 1. - estimateCost() also guards at the library boundary (throws on an invalid level), so callers using it programmatically can't hit the silent-fallthrough either. - Add a test asserting estimateCost rejects an invalid level. All 7 loop-cost tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
parseArgs() cast --pattern and --tool straight to their union types with no validation, so a typo (e.g. `--tool emacs`) flowed downstream into an undefined record lookup and a confusing failure. - Validate both against the existing PATTERN_STARTERS / TOOL_SUFFIX keys (single source of truth, no duplicated lists) and exit 1 with the list of valid values. - Add two CLI tests asserting the friendly errors. All 6 loop-init tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The daily-triage loop posted `validate` and `audit` commit statuses as a hardcoded `state: 'success'` to satisfy branch protection, then auto-merged its own PR. Because the green statuses were unconditional, the loop structurally "marked its own homework" — exactly the unattended-without-verification failure mode this repo's own docs/safety.md and concepts.md (comprehension debt) warn against. The real gates (ci-validate-gates.sh, ci-audit-gates.sh) were already run in the job but their results were ignored. This change keeps the automation (statuses still need posting because GITHUB_TOKEN-pushed commits don't trigger PR workflows) but makes it honest: - The two gate steps now run with continue-on-error and an explicit step id; a follow-up step fails the run if either gate failed, so no PR is opened or merged on a red gate. - The commit-status step derives each status's state from the corresponding gate's real `steps.<id>.outcome` instead of a literal 'success'. A green status can now only be posted when the gate actually passed. Blast radius was already small (only STATE.md + loop-run-log.md), but this aligns the reference repo's own loop with the L1/L2/L3 + verifier discipline it teaches. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Four focused correctness/maintainability fixes to the tooling and the dogfood workflow, found while reading through the repo. Each is a separate commit so they can be reviewed (or cherry-picked) independently. All package test suites pass: loop-audit 10/10, loop-init 6/6, loop-cost 7/7.
loop-auditfix(loop-audit): repair dead triage regex + centralize scoring weightsloop-costfix(loop-cost): validate --level instead of silently defaulting to L3loop-initfix(loop-init): reject unknown --pattern / --tool with a clear errordaily-triage.ymlfix(ci): derive daily-triage commit statuses from real gate outcomes1.
loop-audit— deadt riageregex (bug) + magic-number scoring (refactor)Bug: In
detectLoopActivity(), the git-history scan matched the literalt riage(spaces around it and a space inside the word) instead oftriage:That alternation branch could never match a real commit subject, so plain
triagecommits were silently not counted as loop activity. It looked fine only because adjacent terms (loop/audit/state.md) masked it — the existing tests asserted the outcome, not the mechanism. Added a regression test that inits a temp git repo with a singletriagecommit and asserts git evidence is detected (skips gracefully when git is unavailable).Refactor (no behavior change): the
+18/+14/+9…score contributions and the38/58/78level cutoffs were inlined as magic numbers, and the L3 threshold (78) was duplicated acrosscomputeScore()andauditProject()— easy to desync. Extracted into documentedSCORE_WEIGHTSandLEVEL_THRESHOLDSconstants. Values are identical; the existing L0/L1/L2/L3 boundary tests pass unmodified, proving parity.2.
loop-cost—--levelsilently defaulted to L3--level garbagewas cast toReadinessLeveland flowed intorealisticMix(), whose if/else has no matching branch and falls through to the L3 return — a confident, wrong estimate. Now the CLI rejects an unknown level with a friendly message, andestimateCost()also guards at the library boundary (throws) so programmatic callers can't hit the silent fallthrough. Added a test.3.
loop-init— unknown--pattern/--toolproduced a confusing failureparseArgs()cast both straight to their union types with no validation, so a typo like--tool emacsflowed into an undefined record lookup. Now validated against the existingPATTERN_STARTERS/TOOL_SUFFIXkeys (no duplicated lists) withexit 1and the list of valid values. Added two CLI tests.4.
daily-triage.yml— the loop marked its own homework greenThe daily-triage loop posted
validateandauditcommit statuses as a hardcodedstate: 'success'to satisfy branch protection, then auto-merged its own PR. Since the green statuses were unconditional, the loop could merge itself regardless of whether the real gates passed — the unattended-without-verification failure mode thatdocs/safety.mdanddocs/concepts.md(comprehension debt) explicitly warn about. The real gate scripts were already run in the job, but their results were ignored.This keeps the automation (statuses still need posting because
GITHUB_TOKEN-pushed commits don't trigger PR workflows) but makes it honest:continue-on-error+ an explicitid; a follow-up step fails the run if either gate failed, so no PR is opened or merged on a red gate.statefrom the gate's realsteps.<id>.outcomeinstead of a literal'success'. Green can only be posted when the gate actually passed.Blast radius was already small (only
STATE.md+loop-run-log.md), but this aligns the reference repo's own loop with the verifier discipline it teaches. (Happy to split #4 into a separate PR or downgrade it to an issue if you'd prefer to discuss the workflow change on its own — it's the one opinionated change here.)Testing
Compiled
dist/is committed alongside each source change (the packages publish from committeddist/, and the test suites import fromdist/), so the npm artifacts stay in sync.🤖 Generated with Claude Code