fix(benchmark): drop invalid focus token and phantom model IDs in corpus.yml by jeff-atriumn · Pull Request #162 · atriumn/noxaudit

jeff-atriumn · 2026-05-30T16:25:20Z

Summary

Removes does_it_work from matrix.focus — it is not a key in FOCUS_AREAS and causes ValueError: Unknown focus area at runtime
Updates the comment above focus: to stop describing does_it_work
Removes gemini-2.5-flash-lite and gemini-3.1-pro-preview — neither has a MODEL_PRICING entry
Corrects gemini-3-flash → gemini-3-flash-preview (actual pricing key)
Corrects gpt-5.2 → gpt-5.4 (actual pricing key)
Adds TestCorpusYmlFocusTokens and TestCorpusYmlModelIds in tests/test_benchmark.py to guard against regressions
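
The two guards described above can be sketched as a single validation helper. This is a minimal illustration, not the code in tests/test_benchmark.py: the dictionaries below are hypothetical stand-ins for the real FOCUS_AREAS and MODEL_PRICING, the `security` focus name and the `models` key are assumptions, and the matrix is written as an already-parsed dict rather than loaded from corpus.yml.

```python
# Hypothetical stand-ins for the project's registries. The real FOCUS_AREAS
# and MODEL_PRICING are defined in noxaudit; only the keys matter here, so
# the values are left as empty placeholders.
FOCUS_AREAS = {"security": {}}
MODEL_PRICING = {"gemini-3-flash-preview": {}, "gpt-5.4": {}}


def find_invalid_entries(matrix, focus_areas, model_pricing):
    """Return (unknown_focus_tokens, unpriced_model_ids) for a parsed matrix.

    Assumes `matrix` has a `focus` list and a `models` list; the `models`
    key name is an assumption made for this sketch.
    """
    unknown_focus = [f for f in matrix.get("focus", []) if f not in focus_areas]
    unpriced_models = [m for m in matrix.get("models", []) if m not in model_pricing]
    return unknown_focus, unpriced_models


# Matrix shaped like the entries this PR removes or corrects.
before = {
    "focus": ["security", "does_it_work"],
    "models": ["gemini-3-flash", "gpt-5.2", "gemini-2.5-flash-lite"],
}

# Matrix after the corrections listed in the summary.
after = {
    "focus": ["security"],
    "models": ["gemini-3-flash-preview", "gpt-5.4"],
}

bad_focus, bad_models = find_invalid_entries(before, FOCUS_AREAS, MODEL_PRICING)
ok_focus, ok_models = find_invalid_entries(after, FOCUS_AREAS, MODEL_PRICING)
```

A regression test built on this idea would load the real corpus.yml, pass in the real registries, and assert that both returned lists are empty, so a stale focus token or model ID fails in CI instead of raising ValueError at runtime.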

Test plan

pytest tests/test_benchmark.py — all 24 tests pass, including the 3 new corpus validation tests
Full pytest — 444/444 pass

Closes #159

…pus.yml

- Remove `does_it_work` from matrix.focus (not a registered FOCUS_AREAS key)
- Update trailing comment above `focus:` to remove reference to does_it_work
- Remove `gemini-2.5-flash-lite` and `gemini-3.1-pro-preview` (no MODEL_PRICING entries)
- Correct `gemini-3-flash` → `gemini-3-flash-preview` (actual pricing key)
- Correct `gpt-5.2` → `gpt-5.4` (actual pricing key)
- Add TestCorpusYmlFocusTokens and TestCorpusYmlModelIds to test_benchmark.py to guard against regressions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

jeff-atriumn force-pushed the worktree-cheerful-mapping-curry branch from d7c496b to 60f37c2 Compare May 30, 2026 17:01

jeff-atriumn merged commit ec85982 into main May 30, 2026
5 checks passed

jeff-atriumn mentioned this pull request May 30, 2026

chore(main): release 1.2.4 #152

Merged

jeff-atriumn merged 1 commit into main from worktree-cheerful-mapping-curry