feat: joy-check 100% threshold + deterministic routing scripts + CI #633

Closed
notque wants to merge 4 commits into main from improve/joy-check-100-accuracy

notque commented May 10, 2026

Summary

  • Joy-check 100% accuracy: raised the pass threshold from ≥60 to exactly 100, with zero tolerance for negative patterns. The joy-check SKILL.md itself passes joy-check at 100%.
  • Deterministic routing scripts: do-classify.py, do-enhance.py, do-build-prompt.py — move lookup tables and templates out of /do SKILL.md into testable Python (56% token reduction in do_b variant).
  • Routing regression tests: 20 golden routing cases (40 tests) + 17 enhancement stacking cases (28 tests) = 68 tests covering classification, model selection, thinking directives, and multi-signal combinations.
  • Joy-check CI: 179-test suite (golden fixtures plus a fleet scan across all agents and skills) running in GitHub Actions, extending the existing validate_positive_instruction_docs.py.

Changes

| File | What |
| --- | --- |
| joy-check/SKILL.md | Threshold 60→100, reframed one instruction |
| joy-check/references/instruction-rubric.md | Pass criteria ≥60→100 |
| joy-check/references/writing-rubric.md | Pass criteria ≥60→100 |
| scripts/do-classify.py | Deterministic request classification |
| scripts/do-enhance.py | Deterministic enhancement/model selection |
| scripts/do-build-prompt.py | Prompt templates (zero context cost) |
| scripts/tests/test_do_routing.py | 20 golden cases, 40 tests |
| scripts/tests/test_do_enhancement_stacking.py | 17 golden cases, 28 tests |
| scripts/tests/test_joy_check_instruction_mode.py | 19 fixtures + fleet scan, 179 tests total |
| scripts/validate_positive_instruction_docs.py | Added 2 missing patterns, fixed Avoid regex |
| .github/workflows/test.yml | Added joy-check CI job |
| skills/meta/do_b/SKILL.md | A/B test variant of /do with script-backed tables |

Test plan

  • 247 tests pass (routing + enhancement + joy-check)
  • ruff check + format clean
  • Joy-check SKILL.md passes joy-check at 100%
  • All instruction-rubric patterns covered by golden fixtures

notque added 4 commits May 10, 2026 16:54
- SKILL.md: pass criteria changed from >= 60 to == 100 (both modes); --strict flag threshold updated to 100; subtle-pattern instruction reframed positively
- instruction-rubric.md: pass criteria updated to score == 100 AND zero primary negative patterns
- writing-rubric.md: pass criteria updated to score == 100 AND zero GRIEVANCE paragraphs
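
The new pass rule in the bullets above is a conjunction rather than a threshold, which a few lines of Python make concrete. This is only a sketch: `JoyCheckResult`, its fields, and `passes` are hypothetical names chosen for illustration, not identifiers taken from the skill or its rubrics.

```python
from dataclasses import dataclass

PASS_SCORE = 100  # previously any score >= 60 passed


@dataclass(frozen=True)
class JoyCheckResult:
    """Hypothetical container for one joy-check run."""

    score: int  # rubric score, 0-100
    # Primary negative patterns (instruction mode) or
    # GRIEVANCE paragraphs (writing mode).
    negative_hits: int


def passes(result: JoyCheckResult) -> bool:
    # Both conditions must hold: a perfect score AND zero negative findings.
    return result.score == PASS_SCORE and result.negative_hits == 0
```

Under this rule a score of 99 fails just as a score of 100 with one remaining negative pattern does.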

Joy-check instruction-mode analysis of SKILL.md: 100 — zero primary negative patterns. All negative-word occurrences are inside code blocks (contextual exceptions).

Extends validate_positive_instruction_docs.py with the two missing
primary patterns (NEVER caps, Don't instruction-start) from the
instruction rubric, tightens the Avoid heading regex to eliminate a
false positive on technical phrases like "to Avoid N+1", and adds
voice-corpus files to the allowlist as a documented contextual exception.

New test file covers all 7 primary patterns as golden fixtures, 5
contextual exceptions that must pass (fenced blocks, blockquotes,
subordinate lowercase never, clean files, positive rewrites), and a
parametrized fleet scan across 44 agents and 116 SKILL.md files: 2 voice
corpus files skipped (allowlisted), the other 158 components pass (177 passing tests including the 19 fixtures).
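
To illustrate how primary patterns and contextual exceptions can interact, here is a minimal line scanner. It is a sketch under stated assumptions: the three regexes are hypothetical stand-ins for the patterns named above (caps NEVER, instruction-start Don't, Avoid heading) and are not copied from validate_positive_instruction_docs.py, and only the fenced-block and blockquote exceptions are modeled.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker

# Hypothetical patterns for illustration only; the real script's regexes may differ.
PATTERNS = {
    "never_caps": re.compile(r"\bNEVER\b"),  # case-sensitive, so lowercase "never" passes
    "dont_start": re.compile(r"^\s*(?:[-*]\s+|\d+\.\s+)?Don't\b"),
    "avoid_heading": re.compile(r"^#{1,6}\s+Avoid\b"),  # headings only, not "to Avoid N+1"
}


def scan(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for each hit outside the exceptions."""
    hits = []
    in_fence = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip()
        if stripped.startswith(FENCE):
            in_fence = not in_fence  # entering or leaving a fenced block
            continue
        if in_fence or stripped.startswith(">"):
            continue  # contextual exceptions: fenced code and blockquotes
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

Anchoring the Avoid pattern to heading syntax is one way to keep a mid-sentence phrase such as "to Avoid N+1" from being flagged.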

Adds a `joy-check` CI job to test.yml that runs on every push/PR.

- do-classify.py: request classification (complexity, creation, interview, parallel)
- do-enhance.py: enhancement stacking (anti-rat, thinking, model selection)
- do-build-prompt.py: prompt templates (haiku, banner, agent, task-spec)
- do_b SKILL.md: /do with script-backed tables (56% token reduction)
- test_do_routing.py: 20 golden routing cases (40 tests)
- test_do_enhancement_stacking.py: 17 golden enhancement cases (28 tests)
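
The idea behind the scripts and golden cases listed above can be sketched as a table-driven classifier checked against fixed input/expected pairs. Everything in this sketch is assumed for illustration: the `SIGNALS` keyword table, the `classify` function, and the sample requests are hypothetical and do not reproduce the contents of do-classify.py or test_do_routing.py.

```python
# Hypothetical keyword table; the real lookup tables are not shown in this PR text.
SIGNALS = {
    "creation": ("create", "scaffold", "new skill"),
    "interview": ("interview", "ask me"),
    "parallel": ("in parallel", "concurrently"),
}


def classify(request: str) -> dict[str, bool]:
    """Deterministic classification: same input, same flags, no model call."""
    text = request.lower()
    return {name: any(k in text for k in keywords) for name, keywords in SIGNALS.items()}


# Golden cases pin the expected routing for known requests.
GOLDEN_CASES = [
    ("Create a new skill for linting", {"creation": True, "interview": False, "parallel": False}),
    ("Review these three files in parallel", {"creation": False, "interview": False, "parallel": True}),
    ("Interview me, then scaffold the agent", {"creation": True, "interview": True, "parallel": False}),
]


def run_golden_cases() -> list[str]:
    """Return a description of every golden case whose output drifted."""
    failures = []
    for request, expected in GOLDEN_CASES:
        actual = classify(request)
        if actual != expected:
            failures.append(f"{request!r}: expected {expected}, got {actual}")
    return failures
```

Because the tables live in plain Python, a change to a keyword list shows up as a failing golden case rather than as a silent routing change.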

All 247 tests pass across routing, enhancement, and joy-check suites.

Parenthesized assert messages collapsed to single-line f-strings per
ruff format rules (each line fits within the 120-character limit).
@notque notque enabled auto-merge May 10, 2026 17:08
@notque notque disabled auto-merge May 10, 2026 17:12

notque commented May 10, 2026

Closing: mixed joy-check + do_b experimental work. Will re-submit joy-check only as a clean PR after resolving private→public skill migration.

@notque notque closed this May 10, 2026