v0.19.2: Fix TrainedVerifier input format #63

Merged: doramirdor merged 1 commit into main from fix/verifier-input-format-v0.19.2 on May 29, 2026

@doramirdor
Collaborator

Summary

NadirClaw's TrainedVerifier.score() was passing the bare cheap answer to the tokenizer as text_pair, but the released cross-encoder (nadirclaw/cascade-verifier-v1) was trained on a structured format:

text_pair = f"CHEAP:\n{cheap_answer}\n\nEXPENSIVE:\n{reference_answer or ''}"

This matches the Pro production backend at getnadir.dev/backend/app/services/verifier_model.py:195 and the HF model card.
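As a minimal sketch of the wrapping described above: the helper name build_text_pair is hypothetical and not taken from the NadirClaw source; only the f-string itself comes from this PR. Note that with the literal `or ''` expression, a whitespace-only reference is passed through unchanged; whether the real implementation normalizes it is not stated here.

```python
from typing import Optional


def build_text_pair(cheap_answer: str, reference_answer: Optional[str] = None) -> str:
    """Hypothetical helper: wrap answers in the CHEAP:/EXPENSIVE: format.

    The f-string is the one quoted in this PR; the function name and
    signature are assumptions made for illustration.
    """
    # None (or an empty string) collapses to an empty EXPENSIVE: block.
    return f"CHEAP:\n{cheap_answer}\n\nEXPENSIVE:\n{reference_answer or ''}"


# Reference provided: it is folded into the EXPENSIVE: block.
with_ref = build_text_pair("Paris", "The capital of France is Paris.")

# No reference: the EXPENSIVE: block is left empty.
without_ref = build_text_pair("Paris")
```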

Without the wrapper, the verifier's scores drift relative to the calibrated tau=0.80 acceptance threshold, the same threshold that produced the RouterArena PR #112 numbers.

Changes

  • nadirclaw/trained_verifier.py — wrap tokenizer input in CHEAP:/EXPENSIVE: format; fold reference_answer into the EXPENSIVE: block (empty when None); update docstring (no longer "ignored").
  • tests/test_trained_verifier.py — add test_trained_verifier_wraps_input_in_production_format covering reference provided, None, and whitespace-only cases via a mock tokenizer.
  • nadirclaw/__init__.py — bump __version__ to 0.19.2.

Calibration impact

| Mode | Behavior |
| --- | --- |
| Before v0.19.2 | text_pair = cheap_answer (bare); scores drifted vs tau=0.80 |
| After v0.19.2 | text_pair = f"CHEAP:\n{cheap}\n\nEXPENSIVE:\n{ref or ''}"; matches production |
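To illustrate why score drift matters against a fixed threshold, here is a small sketch of an acceptance rule. The tau=0.80 value comes from this PR; the function name and the choice of `>=` (rather than `>`) are assumptions, and the scores are made-up inputs, not measured results.

```python
TAU = 0.80  # acceptance threshold quoted in this PR


def accept_cheap_answer(score: float, tau: float = TAU) -> bool:
    """Assumed rule: keep the cheap answer when the verifier score reaches tau."""
    return score >= tau


# Illustrative scores only: a small shift around tau flips the decision,
# which is why an input-format mismatch can change routing behavior.
decisions = [accept_cheap_answer(s) for s in (0.79, 0.80, 0.93)]
```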

Test plan

  • pytest tests/test_trained_verifier.py -v (9 passed, 1 slow-gated skip)
  • Full suite: pytest tests/ -v (773 passed, 1 skipped)
  • Mock-tokenizer test asserts exact text_pair wrapping for three cases (with ref, None, whitespace)
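The mock-tokenizer check in the test plan could look roughly like the sketch below. It is not the actual test from tests/test_trained_verifier.py: RecordingTokenizer and tokenize_for_verifier are hypothetical stand-ins, and passing the prompt as the first segment is an assumption. The only part taken from this PR is the text_pair format.

```python
class RecordingTokenizer:
    """Mock tokenizer that records the arguments of every call."""

    def __init__(self):
        self.calls = []

    def __call__(self, text, text_pair=None, **kwargs):
        self.calls.append({"text": text, "text_pair": text_pair})
        return {"input_ids": [[0]]}  # dummy encoding, never inspected


def tokenize_for_verifier(tokenizer, prompt, cheap_answer, reference_answer=None):
    """Hypothetical stand-in for the tokenization step inside score()."""
    text_pair = f"CHEAP:\n{cheap_answer}\n\nEXPENSIVE:\n{reference_answer or ''}"
    return tokenizer(prompt, text_pair=text_pair)


def recorded_pairs():
    """Run the reference-provided and None cases and return the recorded text_pair values."""
    tok = RecordingTokenizer()
    tokenize_for_verifier(tok, "Q?", "cheap", "ref")
    tokenize_for_verifier(tok, "Q?", "cheap", None)
    return [call["text_pair"] for call in tok.calls]


pairs = recorded_pairs()
```

Recording the call arguments keeps the test independent of model weights, so it can assert the exact string without loading the cross-encoder.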

NadirClaw's TrainedVerifier was passing the cheap answer as the bare
text_pair to the tokenizer. The model was trained on a structured format
with CHEAP:/EXPENSIVE: markers, matching what the Pro production backend
uses. Without that wrapper, scores are miscalibrated against the
production tau=0.80 threshold.

This patch wraps the input in the production format:

  text_pair = f"CHEAP:\n{cheap}\n\nEXPENSIVE:\n{reference or ''}"

reference_answer is now used when provided (was previously documented as
ignored). Behavior with reference_answer=None matches production: empty
string substitution.

Aligns NadirClaw with:
- https://huggingface.co/nadirclaw/cascade-verifier-v1 (model card)
- getnadir.dev/backend/app/services/verifier_model.py (production)

Repo: https://github.com/NadirRouter/NadirClaw
Service: https://getnadir.com
doramirdor force-pushed the fix/verifier-input-format-v0.19.2 branch from 216225c to bc46c72 on May 29, 2026 13:09
doramirdor merged commit 3be0f72 into main on May 29, 2026
3 checks passed