Fix triage regex, CLI validation, and the self-approving daily-triage loop#63
Open
Nagendhra-web wants to merge 4 commits into
Open
Fix triage regex, CLI validation, and the self-approving daily-triage loop#63Nagendhra-web wants to merge 4 commits into
Nagendhra-web wants to merge 4 commits into
Conversation
Two related fixes to the audit engine, shipped together because they live in the same recompiled source file. 1. Dead triage regex (bug) The git-history scan in detectLoopActivity matched the literal token " t riage " (spaces around it and a space inside the word) rather than "triage". That alternation branch could never match a real commit subject, so triage commits were silently not counted as loop activity. It looked correct only because adjacent terms (loop, audit, state.md) masked it. The fix changes the pattern to "triage" and adds a regression test that initializes a temporary git repo with a single triage commit and asserts that git evidence is detected. The test skips gracefully when git is unavailable. 2. Magic scoring weights (maintainability) The score contributions (18, 14, 9, and so on) and the level cutoffs (38, 58, 78) were inlined as magic numbers, and the L3 threshold was duplicated across computeScore and auditProject, which made it easy to desync. They are now extracted into documented SCORE_WEIGHTS and LEVEL_THRESHOLDS constants. This is a pure refactor: the values are unchanged and the existing boundary tests pass without modification, which proves behavior parity. All 10 loop-audit tests pass.
Passing an invalid level such as "--level garbage" was cast straight to ReadinessLevel and flowed into realisticMix, whose if/else chain has no matching branch and falls through to the L3 return. The result was a confident but wrong estimate for an invalid level. The CLI now rejects an unknown level with a clear message and exit code 1. estimateCost also guards at the library boundary and throws on an invalid level, so callers using it programmatically cannot hit the silent fallthrough either. A test asserts that estimateCost rejects an invalid level. All 7 loop-cost tests pass.
parseArgs cast --pattern and --tool straight to their union types with no validation, so a typo such as "--tool emacs" flowed downstream into an undefined record lookup and a confusing failure. Both values are now validated against the existing PATTERN_STARTERS and TOOL_SUFFIX keys, which keeps a single source of truth with no duplicated lists, and the CLI exits with code 1 and the list of valid values. Two CLI tests assert the friendly errors. All 6 loop-init tests pass.
The daily-triage loop posted the validate and audit commit statuses as a hardcoded success state to satisfy branch protection, then auto-merged its own PR. Because the green statuses were unconditional, the loop could merge itself regardless of whether the real gates passed. That is the unattended-without-verification failure mode that docs/safety.md and docs/concepts.md (comprehension debt) explicitly warn against. The real gate scripts (ci-validate-gates.sh and ci-audit-gates.sh) were already run in the job, but their results were ignored. This change keeps the automation, since statuses still need to be posted because GITHUB_TOKEN-pushed commits do not trigger PR workflows, but it makes the automation honest: - The two gate steps now run with continue-on-error and an explicit step id. A follow-up step fails the run if either gate failed, so no PR is opened or merged on a red gate. - The commit-status step derives each status state from the gate's real step outcome instead of a literal success value. A green status can now only be posted when the gate actually passed. The blast radius was already small, limited to STATE.md and loop-run-log.md, but this aligns the reference repo's own loop with the verifier discipline it teaches.
62d2d41 to
e9f1fcf
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
While reading through the tooling I found three small bugs and one workflow issue. This PR fixes all four. Each one is a separate commit so they can be reviewed or cherry-picked on their own. Every package test suite passes: loop-audit 10/10, loop-init 6/6, loop-cost 7/7.
1. loop-audit: dead triage regex, plus a scoring-weights cleanup
In
detectLoopActivity, the git history scan matched the literal token" t riage "(spaces around the word, and a space inside it) instead oftriage:That branch can never match a real commit subject, so commits whose only loop signal is the word
triagewere not being counted as activity. It looked fine because the other terms in the alternation (loop,audit,state.md) covered most real cases, and the existing tests checked the final result rather than this specific branch. I added a regression test that creates a temporary git repo with a singletriagecommit and asserts that git evidence is detected. It skips cleanly if git is not available.While I was in the file I also moved the scoring weights out of the function body. The score contributions (18, 14, 9, and so on) and the level cutoffs (38, 58, 78) were inline magic numbers, and the L3 threshold was written out twice, once in
computeScoreand once inauditProject, which is easy to get out of sync. They now live in documentedSCORE_WEIGHTSandLEVEL_THRESHOLDSconstants. This part is a pure refactor. The numbers are identical and the existing boundary tests pass unchanged.2. loop-cost: an invalid level silently became L3
Passing something like
--level garbagewas cast toReadinessLeveland handed torealisticMix. That function has no matching branch for an unknown level, so it falls through to the L3 return and produces a confident but wrong estimate. The CLI now rejects an unknown level with a clear message and a non-zero exit.estimateCostalso checks the level at the library boundary and throws, so code that calls it directly cannot hit the same silent fallthrough. Added a test for the rejection.3. loop-init: unknown pattern or tool gave a confusing error
parseArgscast--patternand--toolstraight to their union types with no validation, so a typo like--tool emacsflowed into an undefined lookup later and failed in a way that did not point at the real problem. Both are now validated against the existingPATTERN_STARTERSandTOOL_SUFFIXkeys, so there is still a single source of truth, and the CLI exits with the list of valid values. Added two tests.4. daily-triage.yml: the loop was approving its own merge
The daily-triage workflow posted the
validateandauditcommit statuses as a hardcodedsuccessto satisfy branch protection, then auto-merged its own PR. Since those statuses were unconditional, the loop could merge itself whether or not the real gates passed. The real gate scripts were already being run in the job, but their results were not used. This is the kind of unattended change without verification thatdocs/safety.mdanddocs/concepts.mdwarn about, so it felt worth fixing in the reference repo itself.The change keeps the automation, since the statuses still need to be posted (a
GITHUB_TOKENpush does not trigger the PR workflows), but ties them to reality:continue-on-errorand an explicitid, and a follow-up step fails the run if either gate failed. So a red gate stops the PR from being opened or merged.success, so a green status can only be posted when the gate actually passed.The blast radius here was already small, only
STATE.mdandloop-run-log.md, but it lines the repo's own loop up with the verifier discipline the docs describe. If you would rather discuss this one separately, I am happy to pull it into its own PR or open an issue instead.Testing
The compiled
dist/output is committed next to each source change, since the packages publish from committeddist/and the test suites import from it, so the build artifacts stay in sync with the source.