Add generic cascade rule engine + τ=0.80 default + RouterArena context #59

Merged
doramirdor merged 5 commits into main from feat/cascade-rule-engine-2026-05-27
May 29, 2026

Conversation

@doramirdor
Collaborator

Summary

Lands the verifier-gated cascade, the heuristic post-hoc verifier, and a
generic data-driven cascade rule engine in the free / MIT core. Same
architecture Nadir uses on its currently-projected #5 RouterArena
submission and its RouterBench AUROC-0.961 / ECE-0.016 numbers — minus
the trained DeBERTa verifier and the trained classifier artifact, both
of which remain proprietary to Nadir Pro. The open-source surface is
the routing topology and the rule engine on top of it; users can swap
the heuristic verifier for their own implementation while keeping the
same dispatch contract.

Three commits, ordered for review:

  1. feat(cascade): verifier-gated cascade + heuristic verifier + rule engine

    • nadirclaw/cascade.py — cheap-first dispatch, fail-open verifier
      errors, kill switch after 3 consecutive errors, default τ=0.80.
    • nadirclaw/heuristic_verifier.py — rule-based (regex + stdlib),
      ~1 ms / call. Detects refusals, hard-min length, ratio truncation,
      uncertainty, JSON parse failure.
    • nadirclaw/cascade_rules/ — declarative YAML rule engine.
      Conditions: substring / regex / prompt-length / classifier-
      confidence. Actions: force_escalate, force_cheap,
      set_threshold (stacks, max wins), set_max_tokens (stacks, max
      wins for safer routing-side default). TTL + mtime hot-reload
      cache so operators can edit a profile YAML on disk and see the
      new policy within 30 s without a restart.
    • Bundled default.yaml encodes the legacy force-escalate
      patterns and per-domain thresholds for code / summarisation.
    • pyproject.toml — PyYAML is optional via new cascade-rules
      extra; load_inline works without it.
    • 64 new tests covering parsing, priority ordering, stacking,
      malformed-rule rejection, hot-reload, and cascade integration.
  2. chore(verifier): contamination audit utility for benchmark reproducibility

    • verifier/contamination_audit.py — standalone, stdlib-only CLI +
      library. Given any benchmark file(s) + corpus file(s), computes
      NFC + casefold + SHA-256 hashes and reports overlap. Exits 0 on
      zero overlap, 2 on any overlap, 1 on missing inputs — drop-in CI
      gate. Supports .jsonl / .json / .txt.
    • Reproduces the audit numbers behind Nadir's published held-out
      claims (RouterBench 0/36,481; RouterArena sub_10 0/809;
      RouterArena full 0/8,399).
    • 9 new tests.
  3. docs: MODEL_CARD for wide_deep_asym_v3 + README benchmarks section

    • MODEL_CARD.md — wide-and-deep asymmetric architecture, training
      corpus, contamination posture, held-out numbers, per-domain
      verifier AUROC variance that motivates the default rule profile,
      τ-sweep table, explicit Pro-vs-OSS split.
    • README.md — new "Benchmarks" section under "Why NadirClaw" with
      RouterBench (AUROC 0.961, ECE 0.016, 98.3% quality preserved at
      τ=0.80) and RouterArena (sub_10 composite 0.7118, projected
      #5) numbers, plus the zero-overlap contamination table. Links to
      the open RouterArena submission PR
      (RouteWorks/RouterArena#112).
    • New cascade + rule-engine bullets in the Features list.

Context

What is NOT in this PR

By design — these stay on the Nadir Pro side and are not portable as
open-source:

  • The trained wide_deep_asym_v3.pt classifier artifact (binary
    centroid and DistilBERT classifiers stay in NadirClaw as before).
  • The trained DeBERTa-v3-small cross-encoder verifier (the rule-based
    heuristic verifier shipped here mirrors the interface, ~0.60 AUROC).
  • Per-tenant Supabase wiring and the production /v1/route_only
    endpoint.
  • Internal labeled-data shards used to train the classifier and
    benchmark-specific rule profiles (routerarena_v3.yaml etc.).

Backwards compatibility

  • Cascade(cheap_call=..., expensive_call=...) without a rule_engine
    argument behaves exactly as before — the engine is opt-in.
  • DEFAULT_ACCEPTANCE_THRESHOLD ships as 0.80 (was 0.5 in the
    draft cascade module locally). Pass threshold=0.5 to restore the
    more permissive cut.
  • A buggy custom rule engine cannot fail the request: evaluate()
    exceptions are swallowed and the cascade falls through to the
    verifier path.
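
The last guarantee above can be pictured as a thin guard around the rule engine. This is a minimal sketch, not the code in nadirclaw/cascade.py: `safe_evaluate` and the single-argument `evaluate(prompt)` signature are assumptions made for illustration.

```python
import logging

log = logging.getLogger("cascade_sketch")


def safe_evaluate(rule_engine, prompt):
    """Return the engine's decision, or None if there is no usable decision.

    None means "no rule opinion": the caller falls through to the
    verifier path, so a broken engine cannot fail the request.
    """
    if rule_engine is None:  # engine is opt-in
        return None
    try:
        # Hypothetical signature; the PR only states that evaluate() exists.
        return rule_engine.evaluate(prompt)
    except Exception:
        log.warning("rule engine raised; ignoring rules", exc_info=True)
        return None
```

Returning None in both the "no engine" and "engine crashed" cases keeps the calling code to a single fall-through branch.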

Test plan

  • pytest -q --ignore=tests/test_e2e.py: 678 passed
  • pytest tests/test_e2e.py -q: 38 passed
  • pytest tests/test_cascade_rule_engine.py tests/test_heuristic_verifier.py tests/test_contamination_audit.py -q: 53 passed
  • Default profile loads and force-escalates on ```python,
    ```javascript, ```typescript, def ,
    function , and summarize the following / summarize this.
  • Threshold-stacking rules raise the verifier bar; only the strictest matched threshold wins.
  • Hot-reload picks up file changes via mtime invalidation.
  • Contamination audit returns exit-code 0 on zero overlap, 2 on overlap, 1 on missing inputs.
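
For readers unfamiliar with the TTL + mtime pattern checked in the hot-reload item, here is a rough sketch of how such a cache can work. `HotReloadCache`, its parameters, and the injectable `stat_mtime` hook are hypothetical names, and the 30 s TTL is inferred from the "within 30 s" wording in the summary; the real cache in nadirclaw/cascade_rules/ may differ.

```python
import os
import time
from typing import Any, Callable, Optional


class HotReloadCache:
    """Re-read a file only when the TTL has expired AND its mtime changed."""

    def __init__(
        self,
        path: str,
        loader: Callable[[str], Any],
        ttl_seconds: float = 30.0,
        stat_mtime: Callable[[str], int] = lambda p: os.stat(p).st_mtime_ns,
    ) -> None:
        self.path = path
        self.loader = loader
        self.ttl = ttl_seconds
        self.stat_mtime = stat_mtime
        self._value: Any = None
        self._mtime: Optional[int] = None
        self._checked_at = float("-inf")

    def get(self, now: Optional[float] = None) -> Any:
        now = time.monotonic() if now is None else now
        # Inside the TTL window: skip the stat call entirely.
        if self._mtime is not None and now - self._checked_at < self.ttl:
            return self._value
        self._checked_at = now
        mtime = self.stat_mtime(self.path)
        if mtime != self._mtime:  # file was edited on disk: reload
            self._value = self.loader(self.path)
            self._mtime = mtime
        return self._value
```

The TTL bounds how often the file is stat-ed on the request path, and the mtime comparison avoids re-parsing an unchanged profile, so an edit becomes visible at most one TTL after it is written.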

Nadir Research and others added 5 commits May 27, 2026 16:50
Ports the verifier-gated cascade architecture from Nadir Pro to the
NadirClaw open-source core, plus the generic data-driven rule engine
that sits in front of it.

Cascade dispatch (nadirclaw/cascade.py):
  * Cheap-first dispatch with post-hoc verification.
  * Fail-open on verifier exceptions; kill switch after 3 consecutive
    errors so a misbehaving verifier never blocks request flow.
  * Default acceptance threshold tau=0.80, calibrated against the
    held-out RouterBench test split (n=11,420). At tau=0.80 the
    composed system preserves 98.3% of always-Opus quality with a
    1.7% catastrophic-downgrade rate. Full tau-sweep documented inline.
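
The dispatch behaviour listed above can be sketched roughly as follows. `SketchCascade` is a hypothetical stand-in, not the shipped `Cascade` class, and it reads "fail-open" as "serve the cheap draft when the verifier raises", which is an assumption about the intended semantics.

```python
from dataclasses import dataclass, field
from typing import Callable, Tuple

DEFAULT_ACCEPTANCE_THRESHOLD = 0.80
MAX_CONSECUTIVE_VERIFIER_ERRORS = 3  # kill-switch limit from the commit message


@dataclass
class SketchCascade:
    cheap_call: Callable[[str], str]
    expensive_call: Callable[[str], str]
    verifier: Callable[[str, str], float]  # (prompt, response) -> score in [0, 1]
    threshold: float = DEFAULT_ACCEPTANCE_THRESHOLD
    _errors: int = field(default=0, init=False)

    @property
    def verifier_disabled(self) -> bool:
        return self._errors >= MAX_CONSECUTIVE_VERIFIER_ERRORS

    def run(self, prompt: str) -> Tuple[str, str]:
        draft = self.cheap_call(prompt)  # cheap-first
        if self.verifier_disabled:
            return draft, "cheap (kill switch)"
        try:
            score = self.verifier(prompt, draft)  # post-hoc verification
        except Exception:
            self._errors += 1
            return draft, "cheap (fail-open)"
        self._errors = 0  # only consecutive errors trip the switch
        if score >= self.threshold:
            return draft, "cheap"
        return self.expensive_call(prompt), "expensive"
```

With the default threshold, a verifier score of 0.9 keeps the cheap draft and a score of 0.5 escalates; three verifier exceptions in a row stop the verifier from being called at all.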

Heuristic verifier (nadirclaw/heuristic_verifier.py):
  * Rule-based, dependency-light (regex + stdlib only), ~1 ms / call.
  * Detects refusals, uncertainty, hard-min length, prompt/response
    ratio failures, and JSON parse failures.
  * Same scoring interface as the Nadir Pro DeBERTa verifier; ~0.60
    AUROC vs ~0.96 for the trained version.
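
To make the checks above concrete, here is a small sketch of a rule-based scorer built on regex and the standard library only. The patterns, penalty weights, and the `heuristic_score` name are illustrative assumptions; they are not the rules shipped in nadirclaw/heuristic_verifier.py.

```python
import json
import re

# Illustrative patterns only; the shipped verifier's lists are not shown in this PR.
REFUSAL_RE = re.compile(
    r"\b(i can(?:'|no)t (?:help|assist)|i'm unable to|as an ai)\b", re.I
)
UNCERTAIN_RE = re.compile(
    r"\b(i'm not sure|i am not sure|i don't know|cannot be certain)\b", re.I
)


def heuristic_score(
    prompt: str,
    response: str,
    expect_json: bool = False,
    min_chars: int = 10,
    min_ratio: float = 0.05,
) -> float:
    """Return a score in [0, 1]; the cascade accepts when score >= threshold."""
    text = response.strip()
    if len(text) < min_chars:  # hard minimum length
        return 0.0
    if REFUSAL_RE.search(text):  # refusal
        return 0.0
    if expect_json:
        try:
            json.loads(text)
        except json.JSONDecodeError:  # JSON parse failure
            return 0.0
    score = 1.0
    if UNCERTAIN_RE.search(text):  # hedged / uncertain answer
        score -= 0.4
    if len(text) / max(len(prompt), 1) < min_ratio:  # response suspiciously short
        score -= 0.4
    return max(score, 0.0)
```

Under these made-up weights a single soft signal drops the score to about 0.6, which falls below the 0.80 default and therefore escalates.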

Rule engine (nadirclaw/cascade_rules/):
  * Declarative YAML rules: substring / regex / prompt-length /
    classifier-confidence conditions, ORed inside `match.any_of`.
  * Four action types: force_escalate, force_cheap, set_threshold,
    set_max_tokens. Set-threshold rules stack (max wins);
    set_max_tokens rules stack (max wins, safer routing-side default).
  * TTL + mtime hot-reload cache so operators can edit a profile YAML
    on disk and see the new policy take effect without a restart.
  * PyYAML is optional (load_inline works without it); ships under a
    new `cascade-rules` extra in pyproject.toml.
  * Bundled `default.yaml` profile encodes the legacy force-escalate
    patterns and domain thresholds for code / summarisation —
    domains where post-hoc verifiers are known to be unreliable
    (AUROC 0.65 on mbpp, 0.77 on consensus_summary).
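
The matching and stacking semantics described above can be illustrated with plain dictionaries. Only `match.any_of` and the four action names come from this commit message; the condition keys (`substring`, `regex`, `min_prompt_chars`), the `Decision` type, and the assumption that rules arrive pre-sorted by priority are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

RULES = [  # assumed to be already sorted by priority
    {"match": {"any_of": [{"regex": r"^(hi|hello|thanks)[.!]?$"}]},
     "action": {"type": "force_cheap"}},
    {"match": {"any_of": [{"substring": "summarize this"}]},
     "action": {"type": "set_threshold", "value": 0.85}},
    {"match": {"any_of": [{"regex": r"\bdef \w+\("}]},
     "action": {"type": "set_threshold", "value": 0.90}},
    {"match": {"any_of": [{"min_prompt_chars": 2000}]},
     "action": {"type": "set_max_tokens", "value": 2048}},
]


@dataclass
class Decision:
    force: Optional[str] = None  # "escalate", "cheap", or None
    threshold: Optional[float] = None
    max_tokens: Optional[int] = None


def _matches(cond: dict, prompt: str) -> bool:
    if "substring" in cond:
        return cond["substring"].lower() in prompt.lower()
    if "regex" in cond:
        return re.search(cond["regex"], prompt) is not None
    if "min_prompt_chars" in cond:
        return len(prompt) >= cond["min_prompt_chars"]
    return False


def evaluate(rules: list, prompt: str) -> Decision:
    decision = Decision()
    for rule in rules:
        # Conditions inside any_of are ORed.
        if not any(_matches(c, prompt) for c in rule["match"]["any_of"]):
            continue
        action = rule["action"]
        kind = action["type"]
        if kind in ("force_escalate", "force_cheap"):
            if decision.force is None:  # first (highest-priority) force wins
                decision.force = kind[len("force_"):]
        elif kind == "set_threshold":  # stacks, max wins
            decision.threshold = max(decision.threshold or 0.0, action["value"])
        elif kind == "set_max_tokens":  # stacks, max wins
            decision.max_tokens = max(decision.max_tokens or 0, action["value"])
    return decision
```

A prompt that trips both the summarisation and the code rule ends up with the stricter 0.90 threshold, which is the "max wins" behaviour the commit describes.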

Tests: 64 new test cases across rule parsing, priority ordering,
applies_when gating, set_threshold stacking, set_max_tokens
composition, malformed-rule rejection, hot-reload, and cascade
integration. Existing 678-test suite remains green.

…ility

Adds `verifier/contamination_audit.py`, the standalone script that
reproduces Nadir's "no held-out leakage" check across RouterBench and
RouterArena. Given any benchmark prompt file(s) and any training-corpus
file(s), the script:

  1. NFC-normalises, strips, casefolds, and SHA-256s every prompt
     (same recipe used internally for the Nadir verifier corpus, so
     hashes are portable across the audit boundary).
  2. Reports overlap count and up to N (default 50) overlap examples
     in a JSON report.
  3. Exits 0 on zero overlap, 2 on any overlap, 1 on missing inputs
     -- so the audit can be wired straight into a CI gate.

Stdlib-only (no third-party deps). Supports .jsonl, .json (list of
objects or list of strings), and .txt. Per-file prompt key auto-
detection (`prompt`, `input`, `question`, `query`, `text`) with
`--prompt-key` override.
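
The hashing recipe and exit-code convention above can be sketched in a few lines of stdlib Python. The function names here are hypothetical, and treating an empty list as a "missing input" is a simplification; the real script works on files and also writes a JSON report.

```python
import hashlib
import unicodedata
from typing import Iterable, List


def prompt_hash(prompt: str) -> str:
    """NFC-normalise, strip, casefold, then SHA-256 (hex digest)."""
    normalised = unicodedata.normalize("NFC", prompt).strip().casefold()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def overlap_hashes(benchmark: Iterable[str], corpus: Iterable[str]) -> List[str]:
    """Hashes present in both the benchmark prompts and the training corpus."""
    return sorted({prompt_hash(p) for p in benchmark} & {prompt_hash(p) for p in corpus})


def audit_exit_code(benchmark: List[str], corpus: List[str]) -> int:
    """0 = zero overlap, 2 = any overlap, 1 = missing inputs."""
    if not benchmark or not corpus:  # simplification: empty stands in for missing
        return 1
    return 2 if overlap_hashes(benchmark, corpus) else 0
```

Normalising before hashing means that prompts differing only in case, surrounding whitespace, or Unicode composition form collide on purpose, so trivial reformatting cannot hide a leak.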

The internal Nadir audit results behind the public benchmark
claims:
  * RouterBench 0shot:   0 of 36,481 overlap (audit 2026-05-24)
  * RouterArena sub_10:  0 of    809 overlap (audit 2026-05-27)
  * RouterArena full:    0 of  8,399 overlap (audit 2026-05-27)

Tests: 9 new test cases cover the hashing convention, the three
supported file formats, the prompt-key override, the report shape,
and the CLI exit codes.

MODEL_CARD.md documents the pre-generation classifier architecture
that backs Nadir's RouterBench and RouterArena numbers:
  * Wide-and-deep asymmetric architecture, BGE embedding deep branch,
    lambda=3 downgrade penalty.
  * Training corpus, intended use, limitations, and the per-domain
    verifier AUROC variance that motivates the default cascade-rule
    profile (force-escalate on code / summarisation).
  * Held-out numbers: RouterBench AUROC 0.961, ECE 0.016, 98.3%
    quality preserved at tau=0.80; RouterArena sub_10 composite 0.7118
    (projected #5 on the public leaderboard).
  * Contamination audit table (RouterBench 0/36,481; RouterArena
    sub_10 0/809; RouterArena full 0/8,399).
  * Explicit note that the trained `wide_deep_asym_v3.pt` artifact is
    proprietary to Nadir Pro; NadirClaw users get the same routing
    topology with the simpler binary centroid or DistilBERT
    classifier, and the same rule engine on top.

README.md additions:
  * New "Benchmarks" section directly under "Why NadirClaw" with the
    held-out RouterBench, RouterArena, and contamination-audit
    numbers. Links to the live RouterArena submission PR
    (RouteWorks/RouterArena#112).
  * New "Verifier-gated cascade" and "Cascade rule engine" bullets
    in the Features section.

Ship the actual trained pre-generation classifier in the open-source
package so NadirClaw users get the same Wide&Deep ternary classifier
described in MODEL_CARD.md, not just the architecture description.

Why bundle (Option A from the audit):
  - The asym + sym checkpoints together are ~1.8 MB. Adding them as
    package data is friction-free for users and avoids a HuggingFace
    download dependency or a training-recipe re-run on first use.
  - The MODEL_CARD already documented the architecture in detail;
    shipping the weights closes the loop so the documented benchmark
    numbers are reproducible from the package.
  - The MIT license already covers code in this repo; we relicense
    the weights under the same MIT terms (they were derived only from
    Nadir's internal labeled batches, which are ours to license).

What ships:
  - nadirclaw/models/wide_deep_asym_v3.pt (905 KB, λ=3 asym CE loss)
  - nadirclaw/models/wide_deep_sym_v3.pt  (905 KB, plain CE loss,
    recovers correct simple-class behaviour under argmax decoding)
  - nadirclaw/wide_deep_classifier.py — singleton-cached loader with
    argmax + cost-sensitive decoders, lazy BGE-base-en-v1.5 encoder,
    33-d structural feature extractor.
  - nadirclaw/structural_features.py — 33-d feature extractor
    (length buckets, code fences, math symbols, tool calls, question
    words). Pure regex, no ML deps.
  - pyproject.toml — `models/*.pt` added to package-data so the
    checkpoints ship in the wheel.
  - tests/test_wide_deep_classifier.py — 10 integration tests that
    load the actual bundled weights, run a real forward pass, and
    assert the singleton + decoder hot-swap contract.
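
As an illustration of what a pure-regex structural extractor looks like, here is a toy version covering four of the feature families named above. It produces 7 features, not 33, and the bucket edges and patterns are invented for the example; nadirclaw/structural_features.py is the source of truth.

```python
import re
from typing import List

LENGTH_BUCKETS = (50, 200, 1000)  # hypothetical character-count edges
FENCE = "`" * 3                   # Markdown code-fence marker
MATH_RE = re.compile(r"[=+\-*/^∑∫√≤≥]|\\frac|\\sum")
QUESTION_RE = re.compile(r"\b(what|why|how|when|where|who|which)\b", re.I)


def structural_features(prompt: str) -> List[float]:
    """One-hot length bucket + code-fence flag + math count + question-word count."""
    bucket = [0.0] * (len(LENGTH_BUCKETS) + 1)
    # Index = number of bucket edges the prompt length has reached.
    bucket[sum(len(prompt) >= edge for edge in LENGTH_BUCKETS)] = 1.0
    return bucket + [
        float(FENCE in prompt),
        float(len(MATH_RE.findall(prompt))),
        float(len(QUESTION_RE.findall(prompt))),
    ]
```

Features like these are cheap to compute and need no ML dependencies, which is why they can sit alongside the embedding branch without adding load-time cost.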

MODEL_CARD updated to reflect that the weights now ship in NadirClaw
(was previously documented as Pro-only). README "OSS vs Pro" table
updated to mention the bundled trained classifier alongside the
existing binary centroid and DistilBERT options.

Usage:
    from nadirclaw.wide_deep_classifier import get_wide_deep_classifier
    clf = get_wide_deep_classifier(
        checkpoint_variant="asym",
        decision_rule="cost_sensitive",
        cost_lambda=20.0,
    )
    result = clf.classify("Your prompt")
    print(result.tier, result.confidence)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… doc

Cross-vendor cascades (Gemini-cheap + OpenAI/Anthropic-mid + Opus-class
top + Llama fallback) expose failure modes that the default
single-vendor profile does not model: refusal-style drift between
vendors, chain-of-thought ability gaps on the cheap tier, structured-
output wrapping inconsistency, and length-control drift on
summarisation. These were the patterns we observed when expanding
Nadir's RouterArena submission from a single-provider menu to a four-
provider menu.

Adds:
  - nadirclaw/cascade_rules/profiles/multi_provider.yaml — 12-rule
    profile encoding the cross-provider mitigations: force_escalate
    on CoT / math-proof / jailbreak / code triggers, set_threshold
    bumps on JSON / summarise / long-prompt patterns, force_cheap
    short-circuits for trivial greetings and acknowledgements.
  - docs/multi-provider-routing.md — learnings writeup plus a
    reproducibility recipe for running NadirClaw's classifier + rule
    engine over cached benchmark responses (e.g. RouterArena's
    ./cached_results/) without making any live API calls. Cross-links
    to the RouterArena PR.
  - tests/test_cascade_rule_engine.py — 4 new tests asserting the
    profile loads cleanly and triggers the expected actions on CoT,
    greeting, and structured-output prompts.

Loaded with:
    from nadirclaw.cascade_rules import load_profile
    engine = load_profile("multi_provider")
    cascade = Cascade(cheap_call, expensive_call, rule_engine=engine)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@doramirdor
Collaborator Author

Two follow-up commits on this branch closing the two open gaps from the PR audit:

1326016 — bundle the trained wide_deep_asym_v3 checkpoint

The MODEL_CARD described the architecture but the .pt weights were Pro-only, so OSS users got just the heuristic verifier. Bundling fixes that:

  • nadirclaw/models/wide_deep_asym_v3.pt (905 KB, λ=3 asym CE loss)
  • nadirclaw/models/wide_deep_sym_v3.pt (905 KB, plain CE — recovers correct simple-class behaviour under argmax decoding)
  • nadirclaw/wide_deep_classifier.py — singleton-cached loader with argmax + cost-sensitive decoders
  • nadirclaw/structural_features.py — 33-d feature extractor (pure regex)
  • pyproject.toml: models/*.pt added to package-data so the weights ship in the wheel
  • tests/test_wide_deep_classifier.py — 10 integration tests loading the real bundled weights

Picked Option A (bundle) over the HF download or "ship the recipe" alternatives because the file is tiny and the MODEL_CARD already documented the architecture, so shipping the weights closes the loop with the published benchmark numbers. Weights are MIT-licensed alongside the rest of the package.

from nadirclaw.wide_deep_classifier import get_wide_deep_classifier
clf = get_wide_deep_classifier(checkpoint_variant="asym", decision_rule="cost_sensitive", cost_lambda=20.0)
result = clf.classify("Your prompt")
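
For context on the `decision_rule="cost_sensitive"` and `cost_lambda` arguments, the sketch below shows one common way a cost-sensitive decoder differs from argmax: it picks the tier with the lowest expected cost, where routing below the true tier is penalised `cost_lambda` times more than routing above it. The tier names, the linear cost matrix, and both function names are assumptions; the bundled decoder may define its costs differently.

```python
from typing import Sequence

TIERS = ("simple", "medium", "complex")  # hypothetical tier labels, cheapest first


def argmax_decode(probs: Sequence[float]) -> str:
    """Pick the most probable tier."""
    return TIERS[max(range(len(TIERS)), key=lambda i: probs[i])]


def cost_sensitive_decode(probs: Sequence[float], cost_lambda: float = 20.0) -> str:
    """Pick the tier that minimises expected misrouting cost."""

    def expected_cost(pred: int) -> float:
        total = 0.0
        for true, p in enumerate(probs):
            if pred < true:   # downgrade: too weak a model for the prompt
                total += p * cost_lambda * (true - pred)
            else:             # overspend: stronger model than needed
                total += p * (pred - true)
        return total

    return TIERS[min(range(len(TIERS)), key=expected_cost)]
```

With probabilities (0.7, 0.2, 0.1), argmax picks the cheapest tier, while a large penalty pushes the decision up to the top tier because even a 10% chance of a hard prompt dominates the expected cost.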

c0fd1b6 — multi-provider routing profile + reproducibility doc

When the model menu spans Gemini + OpenAI + Anthropic + Llama-class fallback, the default profile (calibrated for an Anthropic-only ladder) doesn't model: refusal-style drift, CoT ability gaps, JSON-wrapping inconsistency, or length-control drift on summarisation.

  • nadirclaw/cascade_rules/profiles/multi_provider.yaml — 12 rules covering those four failure modes
  • docs/multi-provider-routing.md — learnings writeup + reproducibility recipe for running NadirClaw's classifier + rule engine over cached benchmark responses (RouterArena's ./cached_results/ pattern) with no live API calls
  • 4 new tests in tests/test_cascade_rule_engine.py

Test status: PR #59's full PR-relevant suite (tests/test_cascade_rule_engine.py + tests/test_heuristic_verifier.py + tests/test_contamination_audit.py) plus the new tests/test_wide_deep_classifier.py = 67 passed, 0 failed.

@doramirdor doramirdor merged commit de1c42c into main May 29, 2026
3 checks passed
@doramirdor doramirdor deleted the feat/cascade-rule-engine-2026-05-27 branch May 29, 2026 01:37