
Test cleanup, shared LLM-JSON parser, and self-play persistence fix#3

Merged
chatman-media merged 5 commits into main from claude/adoring-swanson-d5eabf
May 17, 2026


@chatman-media

Owner

Summary

Five self-contained improvements surfaced by a codebase audit:

  • test: consolidate test files into src/__tests__/ and delete four stale duplicate test files that bun test was running redundantly.
  • refactor: extract the tolerant LLM-JSON parser (strip think-tags, strip code fences, JSON.parse, outer-block fallback), previously reimplemented in four places, into src/llm-json.ts as extractJsonObject; route coach, judge, pairwise, and stage-classifier through it.
  • test: add coverage for composeSystemPrompt (previously untested): persona/framework sections, the few-shot toggle, conditional KB context, the disclosure branch, and persona facts.
  • test: add coverage for stage-classifier (previously untested): parseClassifierOutput and classifyStage's regex fallback paths.
  • fix: surface self-play persistence failures via a persisted boolean on SelfPlayMatchResult / PairwiseMatchResult. A failed insert was previously only logged with console.warn, which masked silent data loss.

Test plan

  • bun run typecheck — passes
  • bun test — 115 pass / 0 fail
  • bun run check — biome clean
  • bun run build — bundle + .d.ts emit succeed

Note: the fix: commit will trigger a semantic-release patch publish on merge to main.

🤖 Generated with Claude Code

chatman-media and others added 5 commits May 18, 2026 02:38
Four test files existed twice: older copies at the src/ root and newer, maintained copies under src/__tests__/. Because bun test ran both, the same assertions executed redundantly. Remove the stale root copies and move shadow-eval.test.ts under __tests__ so every test lives in one place.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The same strip-think-tags / strip-code-fences / JSON.parse / extract-outer-block logic was reimplemented in four places (coach, judge, pairwise, stage-classifier). Extract it into src/llm-json.ts as extractJsonObject and route all four call sites through it; each caller keeps its own domain-specific normalization and last-resort regex fallback.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

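The four-step pipeline this commit consolidates can be sketched in TypeScript as below. This is a minimal sketch under stated assumptions, not the contents of src/llm-json.ts: the signature (raw model text in, plain object or null out) and the helpers FENCE, FENCED_BLOCK, and tryParseObject are hypothetical.

```typescript
// Hypothetical sketch: the real extractJsonObject in src/llm-json.ts may differ.
const FENCE = "`".repeat(3); // Markdown code-fence marker
const FENCED_BLOCK = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE, "i");

// Parse s as JSON and return it only if it is a plain (non-array) object.
function tryParseObject(s: string): Record<string, unknown> | null {
  try {
    const v: unknown = JSON.parse(s);
    return v !== null && typeof v === "object" && !Array.isArray(v)
      ? (v as Record<string, unknown>)
      : null;
  } catch {
    return null;
  }
}

export function extractJsonObject(raw: string): Record<string, unknown> | null {
  // 1. Drop <think>...</think> reasoning blocks that some models emit.
  let text = raw.replace(/<think>[\s\S]*?<\/think>/gi, "").trim();

  // 2. If the answer is wrapped in a code fence, keep only the fenced body.
  const inner = text.match(FENCED_BLOCK)?.[1];
  if (inner !== undefined) text = inner.trim();

  // 3. Happy path: what remains is already a JSON object.
  const direct = tryParseObject(text);
  if (direct !== null) return direct;

  // 4. Fallback: parse the outermost {...} block, ignoring surrounding prose.
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  return tryParseObject(text.slice(start, end + 1));
}
```

Returning null instead of throwing leaves the failure policy with each caller, which fits the commit's note that coach, judge, pairwise, and stage-classifier keep their own normalization and regex fallback. Under this sketch, a prose-wrapped reply such as Here it is: {"stage": "discovery"} fails the direct parse in step 3 and is recovered by the outer-block fallback in step 4; a reply containing two separate objects would not parse, because step 4 spans from the first "{" to the last "}".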
composeSystemPrompt is a pure, deterministic function but had no test coverage. Add tests for persona/framework sections, the few-shot toggle, conditional KB-context injection, the human-persona disclosure branch, and conditional rendering of the persona facts section.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

stage-classifier.ts had no test coverage. Add tests for parseClassifierOutput (code fences, prose prefixes, percentage-style confidence clamping, malformed input) and for classifyStage's regex fallback paths, driven by a stub ChatClient and covering llm-error, parse-error, unknown-stage, low-confidence, and the happy LLM path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

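The fallback paths these tests exercise can be illustrated with a stub-driven sketch. Everything here is hypothetical: the ChatClient shape (a single complete method), the stage names, the 0.5 confidence threshold, the keyword regexes, and the stubClient helper are assumptions, and the sketch calls JSON.parse directly where the real code would go through parseClassifierOutput. It shows one way each reason named in the commit (llm-error, parse-error, unknown-stage, low-confidence) can route to a regex fallback, with percentage-style confidences scaled and clamped into [0, 1].

```typescript
// Hypothetical sketch: the real ChatClient, classifyStage, and stage names may differ.
interface ChatClient {
  complete(prompt: string): Promise<string>;
}

type Stage = "greeting" | "discovery" | "closing"; // hypothetical stage set
type FallbackReason = "llm-error" | "parse-error" | "unknown-stage" | "low-confidence";
type StageResult =
  | { stage: Stage; source: "llm"; confidence: number }
  | { stage: Stage; source: "regex"; reason: FallbackReason };

const STAGES: readonly string[] = ["greeting", "discovery", "closing"];
const MIN_CONFIDENCE = 0.5; // hypothetical threshold

// Keyword fallback used whenever the LLM answer cannot be trusted.
function regexStage(message: string): Stage {
  if (/\b(hi|hello)\b/i.test(message)) return "greeting";
  if (/\b(sign|buy|deal)\b/i.test(message)) return "closing";
  return "discovery";
}

// Accept 0..1 or percentage-style 0..100 confidences, clamped to [0, 1].
function normalizeConfidence(c: unknown): number | null {
  if (typeof c !== "number" || Number.isNaN(c)) return null;
  const scaled = c > 1 ? c / 100 : c;
  return Math.min(1, Math.max(0, scaled));
}

export async function classifyStage(client: ChatClient, message: string): Promise<StageResult> {
  const fallback = (reason: FallbackReason): StageResult =>
    ({ stage: regexStage(message), source: "regex", reason });

  let raw: string;
  try {
    raw = await client.complete(`Classify the conversation stage of: ${message}`); // hypothetical prompt
  } catch {
    return fallback("llm-error");
  }

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return fallback("parse-error");
  }
  if (parsed === null || typeof parsed !== "object") return fallback("parse-error");

  const { stage, confidence } = parsed as { stage?: unknown; confidence?: unknown };
  if (typeof stage !== "string" || !STAGES.includes(stage)) return fallback("unknown-stage");
  const conf = normalizeConfidence(confidence);
  if (conf === null || conf < MIN_CONFIDENCE) return fallback("low-confidence");
  return { stage: stage as Stage, source: "llm", confidence: conf };
}

// Stub client for tests: replies with a fixed string, or rejects with the given Error.
export const stubClient = (reply: string | Error): ChatClient => ({
  complete: async () => {
    if (reply instanceof Error) throw reply;
    return reply;
  },
});
```

With stubClient(new Error("timeout")) the sketch returns the regex result tagged llm-error, and with a reply of {"stage": "closing", "confidence": 85} it returns the LLM result with confidence 0.85. Injecting the client is what lets every path be exercised without a network call.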
A failed match insert in runSelfPlayMatch and runPairwiseMatch was only logged with console.warn: the id was silently left null, so callers running evaluation loops had no clear signal that their results were never recorded.

Add an explicit persisted boolean to SelfPlayMatchResult and PairwiseMatchResult so consumers can detect the data loss. This is a non-throwing, additive change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

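The shape of this change can be sketched as follows. The recordMatch and countUnpersisted helpers, the InsertMatch callback, and the winner field are hypothetical stand-ins for the real runSelfPlayMatch / runPairwiseMatch internals; only the nullable id, the console.warn logging, and the persisted boolean come from the commit message.

```typescript
// Hypothetical sketch: real SelfPlayMatchResult fields and storage calls differ.
type Winner = "a" | "b" | "draw"; // hypothetical outcome field

interface SelfPlayMatchResult {
  id: string | null; // stored row id; null when the insert failed
  winner: Winner;
  persisted: boolean; // the new flag: false means the match was never recorded
}

// Hypothetical storage callback that resolves to the new row id or rejects.
type InsertMatch = (row: { winner: Winner }) => Promise<string>;

// Non-throwing: a failed insert is still logged, but is now also reported in the result.
export async function recordMatch(insert: InsertMatch, winner: Winner): Promise<SelfPlayMatchResult> {
  try {
    const id = await insert({ winner });
    return { id, winner, persisted: true };
  } catch (err) {
    console.warn("self-play: failed to persist match", err);
    return { id: null, winner, persisted: false };
  }
}

// An evaluation loop can now detect silent data loss explicitly.
export function countUnpersisted(results: readonly SelfPlayMatchResult[]): number {
  return results.filter((r) => !r.persisted).length;
}
```

Because the failure is still caught rather than rethrown, existing callers keep working unchanged; an evaluation loop that cares about data loss can check countUnpersisted(results) > 0 and retry or abort.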
@coderabbitai

coderabbitai Bot commented May 17, 2026


Warning

Rate limit exceeded

@chatman-media has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 16 minutes and 21 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cec97477-5502-4abe-b38e-d502f0425461

📥 Commits

Reviewing files that changed from the base of the PR and between 2ac4dcd and 9e1a460.

📒 Files selected for processing (14)
  • src/__tests__/llm-json.test.ts
  • src/__tests__/prompt.test.ts
  • src/__tests__/shadow-eval.test.ts
  • src/__tests__/stage-classifier.test.ts
  • src/ab-router.test.ts
  • src/coach.ts
  • src/elo.test.ts
  • src/llm-json.ts
  • src/self-play/judge.ts
  • src/self-play/orchestrator.ts
  • src/self-play/pairwise.ts
  • src/skill-recommendations.test.ts
  • src/stage-classifier.ts
  • src/stage-router.test.ts

@chatman-media chatman-media merged commit 2294854 into main May 17, 2026
3 checks passed
@github-actions


🎉 This PR is included in version 0.2.1 🎉

The release is available on:

Your semantic-release bot 📦🚀
