Skip to content

Fix triage regex, CLI validation, and the self-approving daily-triage loop#63

Open
Nagendhra-web wants to merge 4 commits into
cobusgreyling:mainfrom
Nagendhra-web:fix/loop-tooling-bugs
Open

Fix triage regex, CLI validation, and the self-approving daily-triage loop#63
Nagendhra-web wants to merge 4 commits into
cobusgreyling:mainfrom
Nagendhra-web:fix/loop-tooling-bugs

Conversation

@Nagendhra-web

Copy link
Copy Markdown

Summary

While reading through the tooling I found three small bugs and one workflow issue. This PR fixes all four. Each one is a separate commit so they can be reviewed or cherry-picked on their own. Every package test suite passes: loop-audit 10/10, loop-init 6/6, loop-cost 7/7.

Area Type What changed
loop-audit bug + refactor Repair a dead regex in activity detection; move scoring weights into named constants
loop-cost bug Validate the readiness level instead of silently treating an invalid one as L3
loop-init robustness Reject an unknown pattern or tool with a clear error instead of a confusing crash
daily-triage.yml safety Post commit statuses from the real gate outcomes instead of a hardcoded success

1. loop-audit: dead triage regex, plus a scoring-weights cleanup

In detectLoopActivity, the git history scan matched the literal token " t riage " (spaces around the word, and a space inside it) instead of triage:

// before
if (/state\.md|loop| t riage |changelog-drafter|post-merge|daily triage|audit/i.test(lower)) {
// after
if (/state\.md|loop|triage|changelog-drafter|post-merge|daily triage|audit/i.test(lower)) {

That branch can never match a real commit subject, so commits whose only loop signal is the word triage were not being counted as activity. It looked fine because the other terms in the alternation (loop, audit, state.md) covered most real cases, and the existing tests checked the final result rather than this specific branch. I added a regression test that creates a temporary git repo with a single triage commit and asserts that git evidence is detected. It skips cleanly if git is not available.

While I was in the file I also moved the scoring weights out of the function body. The score contributions (18, 14, 9, and so on) and the level cutoffs (38, 58, 78) were inline magic numbers, and the L3 threshold was written out twice, once in computeScore and once in auditProject, which is easy to get out of sync. They now live in documented SCORE_WEIGHTS and LEVEL_THRESHOLDS constants. This part is a pure refactor. The numbers are identical and the existing boundary tests pass unchanged.

2. loop-cost: an invalid level silently became L3

Passing something like --level garbage was cast to ReadinessLevel and handed to realisticMix. That function has no matching branch for an unknown level, so it falls through to the L3 return and produces a confident but wrong estimate. The CLI now rejects an unknown level with a clear message and a non-zero exit. estimateCost also checks the level at the library boundary and throws, so code that calls it directly cannot hit the same silent fallthrough. Added a test for the rejection.

3. loop-init: unknown pattern or tool gave a confusing error

parseArgs cast --pattern and --tool straight to their union types with no validation, so a typo like --tool emacs flowed into an undefined lookup later and failed in a way that did not point at the real problem. Both are now validated against the existing PATTERN_STARTERS and TOOL_SUFFIX keys, so there is still a single source of truth, and the CLI exits with the list of valid values. Added two tests.

4. daily-triage.yml: the loop was approving its own merge

The daily-triage workflow posted the validate and audit commit statuses as a hardcoded success to satisfy branch protection, then auto-merged its own PR. Since those statuses were unconditional, the loop could merge itself whether or not the real gates passed. The real gate scripts were already being run in the job, but their results were not used. This is the kind of unattended change without verification that docs/safety.md and docs/concepts.md warn about, so it felt worth fixing in the reference repo itself.

The change keeps the automation, since the statuses still need to be posted (a GITHUB_TOKEN push does not trigger the PR workflows), but ties them to reality:

  • The two gate steps now run with continue-on-error and an explicit id, and a follow-up step fails the run if either gate failed. So a red gate stops the PR from being opened or merged.
  • The status step now derives each state from the gate's real step outcome rather than a literal success, so a green status can only be posted when the gate actually passed.

The blast radius here was already small, only STATE.md and loop-run-log.md, but it lines the repo's own loop up with the verifier discipline the docs describe. If you would rather discuss this one separately, I am happy to pull it into its own PR or open an issue instead.

Testing

loop-audit:  10 passed   (includes the new triage-commit activity test)
loop-init:    6 passed   (includes 2 new pattern/tool validation tests)
loop-cost:    7 passed   (includes the new invalid-level test)

The compiled dist/ output is committed next to each source change, since the packages publish from committed dist/ and the test suites import from it, so the build artifacts stay in sync with the source.

Two related fixes to the audit engine, shipped together because they
live in the same recompiled source file.

1. Dead triage regex (bug)

The git-history scan in detectLoopActivity matched the literal token
" t riage " (spaces around it and a space inside the word) rather than
"triage". That alternation branch could never match a real commit
subject, so triage commits were silently not counted as loop activity.
It looked correct only because adjacent terms (loop, audit, state.md)
masked it. The fix changes the pattern to "triage" and adds a
regression test that initializes a temporary git repo with a single
triage commit and asserts that git evidence is detected. The test skips
gracefully when git is unavailable.

2. Magic scoring weights (maintainability)

The score contributions (18, 14, 9, and so on) and the level cutoffs
(38, 58, 78) were inlined as magic numbers, and the L3 threshold was
duplicated across computeScore and auditProject, which made it easy to
desync. They are now extracted into documented SCORE_WEIGHTS and
LEVEL_THRESHOLDS constants. This is a pure refactor: the values are
unchanged and the existing boundary tests pass without modification,
which proves behavior parity.

All 10 loop-audit tests pass.
Passing an invalid level such as "--level garbage" was cast straight to
ReadinessLevel and flowed into realisticMix, whose if/else chain has no
matching branch and falls through to the L3 return. The result was a
confident but wrong estimate for an invalid level.

The CLI now rejects an unknown level with a clear message and exit code
1. estimateCost also guards at the library boundary and throws on an
invalid level, so callers using it programmatically cannot hit the
silent fallthrough either. A test asserts that estimateCost rejects an
invalid level.

All 7 loop-cost tests pass.
parseArgs cast --pattern and --tool straight to their union types with
no validation, so a typo such as "--tool emacs" flowed downstream into
an undefined record lookup and a confusing failure.

Both values are now validated against the existing PATTERN_STARTERS and
TOOL_SUFFIX keys, which keeps a single source of truth with no
duplicated lists, and the CLI exits with code 1 and the list of valid
values. Two CLI tests assert the friendly errors.

All 6 loop-init tests pass.
The daily-triage loop posted the validate and audit commit statuses as
a hardcoded success state to satisfy branch protection, then auto-merged
its own PR. Because the green statuses were unconditional, the loop
could merge itself regardless of whether the real gates passed. That is
the unattended-without-verification failure mode that docs/safety.md and
docs/concepts.md (comprehension debt) explicitly warn against. The real
gate scripts (ci-validate-gates.sh and ci-audit-gates.sh) were already
run in the job, but their results were ignored.

This change keeps the automation, since statuses still need to be posted
because GITHUB_TOKEN-pushed commits do not trigger PR workflows, but it
makes the automation honest:

- The two gate steps now run with continue-on-error and an explicit step
  id. A follow-up step fails the run if either gate failed, so no PR is
  opened or merged on a red gate.
- The commit-status step derives each status state from the gate's real
  step outcome instead of a literal success value. A green status can
  now only be posted when the gate actually passed.

The blast radius was already small, limited to STATE.md and
loop-run-log.md, but this aligns the reference repo's own loop with the
verifier discipline it teaches.
@Nagendhra-web Nagendhra-web force-pushed the fix/loop-tooling-bugs branch from 62d2d41 to e9f1fcf Compare June 25, 2026 05:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant