Test cleanup, shared LLM-JSON parser, and self-play persistence fix #3
Four test files existed twice — older copies at src/ root and newer, maintained copies under src/__tests__/. bun test ran both, so the same assertions executed redundantly. Remove the stale root copies and move shadow-eval.test.ts under __tests__ so every test lives in one place. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The same strip-think-tags / strip-code-fences / JSON.parse / extract-outer-block logic was reimplemented in four places (coach, judge, pairwise, stage-classifier). Extract it into src/llm-json.ts as extractJsonObject and route all four call sites through it; each caller keeps its own domain-specific normalization and last-resort regex fallback. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
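The four steps named in this commit message could look roughly like the sketch below. It is an illustration only: the actual body of extractJsonObject in src/llm-json.ts is not shown in this PR description, so the regexes, the helper parseObject, and the null-on-failure return value are all assumptions.

```typescript
// Hypothetical sketch; the real src/llm-json.ts may differ in every detail.
const FENCE = "`".repeat(3); // built at runtime so the sample contains no literal fence
const FENCE_RE = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE, "i");

// Assumed helper: parse text and accept only plain JSON objects.
function parseObject(text: string): Record<string, unknown> | null {
  try {
    const value: unknown = JSON.parse(text);
    const isObject =
      typeof value === "object" && value !== null && !Array.isArray(value);
    return isObject ? (value as Record<string, unknown>) : null;
  } catch {
    return null; // malformed JSON
  }
}

function extractJsonObject(raw: string): Record<string, unknown> | null {
  // 1. Strip <think>...</think> reasoning blocks.
  let text = raw.replace(/<think>[\s\S]*?<\/think>/gi, "").trim();

  // 2. Unwrap a Markdown code fence if one is present.
  const fenced = FENCE_RE.exec(text);
  if (fenced && fenced[1] !== undefined) text = fenced[1].trim();

  // 3. Try JSON.parse on the cleaned text.
  const direct = parseObject(text);
  if (direct) return direct;

  // 4. Fall back to the outermost {...} span (handles prose prefixes/suffixes).
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  return parseObject(text.slice(start, end + 1));
}
```

Returning null rather than throwing leaves each caller free to apply its own last-resort regex fallback, which matches the division of labor the commit describes.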
composeSystemPrompt is a pure, deterministic function with no test coverage. Add tests for persona/framework sections, the few-shot toggle, conditional KB-context injection, the human-persona disclosure branch, and conditional rendering of the persona facts section. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
stage-classifier.ts had no test coverage. Add tests for parseClassifierOutput (code fences, prose prefixes, percentage-style confidence clamping, malformed input) and for classifyStage's regex fallback paths — driven by a stub ChatClient — covering llm-error, parse-error, unknown-stage, low-confidence, and the happy LLM path. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
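As an illustration of what "percentage-style confidence clamping" might mean, here is a minimal sketch. It assumes that a confidence above 1 is a percentage (85 meaning 0.85) and that the result is clamped to the range 0 to 1. The helper name normalizeConfidence is hypothetical, and parseClassifierOutput may implement this differently.

```typescript
// Hypothetical helper; not taken from stage-classifier.ts.
function normalizeConfidence(value: unknown): number | null {
  // Accept numbers and numeric strings such as "85%" (parseFloat ignores the suffix).
  const n = typeof value === "string" ? Number.parseFloat(value) : value;
  if (typeof n !== "number" || !Number.isFinite(n)) return null;

  // Assumption: anything above 1 was reported on a 0-100 scale.
  const scaled = n > 1 ? n / 100 : n;

  // Clamp into [0, 1] so downstream threshold checks stay meaningful.
  return Math.min(1, Math.max(0, scaled));
}
```

A null result would let the caller treat the output as a parse error and take the regex fallback path.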
A failed match insert in runSelfPlayMatch and runPairwiseMatch was only console.warn'd — the id was silently left null, so callers running evaluation loops had no clear signal their results were never recorded. Add an explicit persisted boolean to SelfPlayMatchResult and PairwiseMatchResult so consumers can detect the data loss. Non-throwing, additive change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
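On the consumer side, an evaluation loop could use the new flag along these lines. This is a sketch under stated assumptions: MatchResultLike is a trimmed, hypothetical stand-in for SelfPlayMatchResult / PairwiseMatchResult (whose full shapes are not shown here), the id is assumed to be a string or null, and summarizePersistence is an invented helper.

```typescript
// Trimmed, hypothetical shape: only the two fields this PR description mentions.
interface MatchResultLike {
  id: string | null; // assumed type; null when the insert failed
  persisted: boolean; // true only when the match was actually recorded
}

// Hypothetical helper: count how many results were saved versus lost.
function summarizePersistence(
  results: MatchResultLike[],
): { saved: number; lost: number } {
  const saved = results.filter((r) => r.persisted).length;
  return { saved, lost: results.length - saved };
}

// Usage: surface data loss at the end of an evaluation run.
const summary = summarizePersistence([
  { id: "m1", persisted: true },
  { id: null, persisted: false },
]);
if (summary.lost > 0) {
  console.warn(`${summary.lost} match result(s) were never recorded`);
}
```

Because the flag is additive and nothing throws, existing callers that ignore it keep working unchanged.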
🎉 This PR is included in version 0.2.1 🎉 Your semantic-release bot 📦🚀
Summary
Five self-contained improvements surfaced by a codebase audit — no filler.
- Consolidate tests under `src/__tests__/` and drop four stale duplicate test files that were running redundantly.
- Extract the shared LLM-JSON parser into `src/llm-json.ts`; route coach, judge, pairwise, and stage-classifier through it.
- Add tests for `composeSystemPrompt` (previously untested): persona/framework sections, few-shot toggle, conditional KB context, disclosure branch, persona facts.
- Add tests for `stage-classifier` (previously untested): `parseClassifierOutput` and `classifyStage`'s regex fallback paths.
- Add a `persisted` boolean on `SelfPlayMatchResult` / `PairwiseMatchResult`: a failed insert was previously only `console.warn`'d, masking silent data loss.

Test plan
- `bun run typecheck`: passes
- `bun test`: 115 pass / 0 fail
- `bun run check`: biome clean
- `bun run build`: bundle + `.d.ts` emit succeed

Note: the `fix:` commit will trigger a semantic-release patch publish on merge to `main`.

🤖 Generated with Claude Code