Skip to content

Fix dead triage regex, CLI input validation, and self-grading dogfood loop#62

Closed
Nagendhra-web wants to merge 4 commits into
cobusgreyling:mainfrom
Nagendhra-web:fix/auditor-regex-cli-validation-scoring
Closed

Fix dead triage regex, CLI input validation, and self-grading dogfood loop#62
Nagendhra-web wants to merge 4 commits into
cobusgreyling:mainfrom
Nagendhra-web:fix/auditor-regex-cli-validation-scoring

Conversation

@Nagendhra-web

Copy link
Copy Markdown

Summary

Four focused correctness/maintainability fixes to the tooling and the dogfood workflow, found while reading through the repo. Each is a separate commit so they can be reviewed (or cherry-picked) independently. All package test suites pass: loop-audit 10/10, loop-init 6/6, loop-cost 7/7.

# Area Type Commit
1 loop-audit bug + refactor fix(loop-audit): repair dead triage regex + centralize scoring weights
2 loop-cost bug fix(loop-cost): validate --level instead of silently defaulting to L3
3 loop-init robustness fix(loop-init): reject unknown --pattern / --tool with a clear error
4 daily-triage.yml safety fix(ci): derive daily-triage commit statuses from real gate outcomes

1. loop-audit — dead t riage regex (bug) + magic-number scoring (refactor)

Bug: In detectLoopActivity(), the git-history scan matched the literal t riage (spaces around it and a space inside the word) instead of triage:

// before
if (/state\.md|loop| t riage |changelog-drafter|post-merge|daily triage|audit/i.test(lower)) {
// after
if (/state\.md|loop|triage|changelog-drafter|post-merge|daily triage|audit/i.test(lower)) {

That alternation branch could never match a real commit subject, so plain triage commits were silently not counted as loop activity. It looked fine only because adjacent terms (loop/audit/state.md) masked it — the existing tests asserted the outcome, not the mechanism. Added a regression test that inits a temp git repo with a single triage commit and asserts git evidence is detected (skips gracefully when git is unavailable).

Refactor (no behavior change): the +18/+14/+9… score contributions and the 38/58/78 level cutoffs were inlined as magic numbers, and the L3 threshold (78) was duplicated across computeScore() and auditProject() — easy to desync. Extracted into documented SCORE_WEIGHTS and LEVEL_THRESHOLDS constants. Values are identical; the existing L0/L1/L2/L3 boundary tests pass unmodified, proving parity.

2. loop-cost--level silently defaulted to L3

--level garbage was cast to ReadinessLevel and flowed into realisticMix(), whose if/else has no matching branch and falls through to the L3 return — a confident, wrong estimate. Now the CLI rejects an unknown level with a friendly message, and estimateCost() also guards at the library boundary (throws) so programmatic callers can't hit the silent fallthrough. Added a test.

3. loop-init — unknown --pattern / --tool produced a confusing failure

parseArgs() cast both straight to their union types with no validation, so a typo like --tool emacs flowed into an undefined record lookup. Now validated against the existing PATTERN_STARTERS / TOOL_SUFFIX keys (no duplicated lists) with exit 1 and the list of valid values. Added two CLI tests.

4. daily-triage.yml — the loop marked its own homework green

The daily-triage loop posted validate and audit commit statuses as a hardcoded state: 'success' to satisfy branch protection, then auto-merged its own PR. Since the green statuses were unconditional, the loop could merge itself regardless of whether the real gates passed — the unattended-without-verification failure mode that docs/safety.md and docs/concepts.md (comprehension debt) explicitly warn about. The real gate scripts were already run in the job, but their results were ignored.

This keeps the automation (statuses still need posting because GITHUB_TOKEN-pushed commits don't trigger PR workflows) but makes it honest:

  • The two gate steps now run with continue-on-error + an explicit id; a follow-up step fails the run if either gate failed, so no PR is opened or merged on a red gate.
  • The commit-status step derives each state from the gate's real steps.<id>.outcome instead of a literal 'success'. Green can only be posted when the gate actually passed.

Blast radius was already small (only STATE.md + loop-run-log.md), but this aligns the reference repo's own loop with the verifier discipline it teaches. (Happy to split #4 into a separate PR or downgrade it to an issue if you'd prefer to discuss the workflow change on its own — it's the one opinionated change here.)


Testing

loop-audit:  10 passed   (incl. new triage-commit activity regression test)
loop-init:    6 passed   (incl. 2 new --pattern/--tool validation tests)
loop-cost:    7 passed   (incl. new invalid-level test)

Compiled dist/ is committed alongside each source change (the packages publish from committed dist/, and the test suites import from dist/), so the npm artifacts stay in sync.

🤖 Generated with Claude Code

nagendhra-tech and others added 4 commits June 25, 2026 00:45
Two related correctness/maintainability fixes to the audit engine,
shipped together because they live in the same recompiled source file.

1. Dead `t riage` regex (bug):
   detectLoopActivity()'s git-history scan matched the literal
   ` t riage ` (surrounding spaces + a space inside the word) instead
   of `triage`, so it could never match a real commit subject and
   triage commits were silently not counted as loop activity. The
   branch appeared to "work" only because adjacent alternatives
   (loop/audit/state.md) masked it. Fixed to `triage`, with a
   regression test that inits a temp git repo with a single triage
   commit and asserts git evidence is detected (skips gracefully if
   git is unavailable).

2. Magic scoring weights (maintainability):
   The +18/+14/+9... contributions and the 38/58/78 level cutoffs were
   inlined as magic numbers across computeScore() and auditProject(),
   so the L3 gate threshold was duplicated and easy to desync. Extracted
   into documented SCORE_WEIGHTS and LEVEL_THRESHOLDS constants. Pure
   refactor — values are unchanged and the existing threshold tests
   (L0/L1/L2/L3 boundaries) pass unmodified, proving behavior parity.

All 10 loop-audit tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`--level garbage` was cast straight to ReadinessLevel and flowed into
realisticMix(), whose if/else chain has no matching branch and falls
through to the L3 return — producing a confident, wrong estimate for
an invalid level.

- CLI now rejects an unknown --level with a friendly message + exit 1.
- estimateCost() also guards at the library boundary (throws on an
  invalid level), so callers using it programmatically can't hit the
  silent-fallthrough either.
- Add a test asserting estimateCost rejects an invalid level.

All 7 loop-cost tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
parseArgs() cast --pattern and --tool straight to their union types
with no validation, so a typo (e.g. `--tool emacs`) flowed downstream
into an undefined record lookup and a confusing failure.

- Validate both against the existing PATTERN_STARTERS / TOOL_SUFFIX
  keys (single source of truth, no duplicated lists) and exit 1 with
  the list of valid values.
- Add two CLI tests asserting the friendly errors.

All 6 loop-init tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The daily-triage loop posted `validate` and `audit` commit statuses as
a hardcoded `state: 'success'` to satisfy branch protection, then
auto-merged its own PR. Because the green statuses were unconditional,
the loop structurally "marked its own homework" — exactly the
unattended-without-verification failure mode this repo's own
docs/safety.md and concepts.md (comprehension debt) warn against.

The real gates (ci-validate-gates.sh, ci-audit-gates.sh) were already
run in the job but their results were ignored.

This change keeps the automation (statuses still need posting because
GITHUB_TOKEN-pushed commits don't trigger PR workflows) but makes it
honest:

- The two gate steps now run with continue-on-error and an explicit
  step id; a follow-up step fails the run if either gate failed, so no
  PR is opened or merged on a red gate.
- The commit-status step derives each status's state from the
  corresponding gate's real `steps.<id>.outcome` instead of a literal
  'success'. A green status can now only be posted when the gate
  actually passed.

Blast radius was already small (only STATE.md + loop-run-log.md), but
this aligns the reference repo's own loop with the L1/L2/L3 + verifier
discipline it teaches.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Nagendhra-web Nagendhra-web deleted the fix/auditor-regex-cli-validation-scoring branch June 25, 2026 04:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants