feat(ENG-335): multi-head ONNX + temperature scaling + API contract fixes (v0.7) by hiskudin · Pull Request #63 · StackOneHQ/defender

hiskudin · 2026-05-13T15:37:08Z

Summary

Adds multi-head ONNX classifier support ([batch, 2] main+aux output) with a documented decision rule: block iff main >= mainThreshold AND aux < auxThreshold.
Adds post-hoc temperature scaling — score = sigmoid(logit / T) — so bundled models can ship pre-calibrated and consumers get well-behaved probabilities without retuning their own thresholds.
Swaps the bundled default from minilm-full-aug to minilm-multihead-v5. The new model ships with a classifier_config.json calibration block (temperatureT: 2.41, highRiskThreshold: 0.64) that auto-loads at construction; library hardcoded defaults < model defaults < caller config.
Fixes 3 latent API contract bugs surfaced by 0.7 work:
1. highRiskThreshold set on Tier2ClassifierConfig (caller-provided or model-auto-loaded) wasn't propagated to the gate's threshold copy, so it silently fell back to the framework default.
2. The DENSITY_SUB_THRESHOLD constant in the density adjustment was hardcoded against raw scores; under temperature scaling it would never trigger. Now rescaled via sigmoid(log(3)/T).
3. tier2Score previously reported the raw max-chunk score even when density adjustment or aux-veto changed the actual decision. It now reports the effective score that determined riskLevel/allowed; the raw pre-adjustment value is preserved on the new tier2RawScore field for forensics.
Establishes the operator invariant: tier2Score >= highRiskThreshold ⇔ result.allowed === false. Under aux veto, tier2Score = 0 so the public triple (tier2Score, riskLevel, allowed) tells one coherent story; the un-vetoed main is on tier2RawScore.
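The scoring and decision rule described in the summary can be sketched as follows. This is a minimal illustration, not the library's implementation: the function and interface names (`calibratedScore`, `shouldBlock`, `MultiheadThresholds`) are assumptions made for this example, and the input logit is arbitrary.

```typescript
// Illustrative sketch only; names are hypothetical, not the library's API.

/** Logistic function. */
function sigmoid(x: number): number {
  return 1 / (1 + Math.exp(-x));
}

/** Temperature-scaled probability: sigmoid(logit / T). T = 1 gives the raw sigmoid. */
function calibratedScore(logit: number, temperatureT: number): number {
  return sigmoid(logit / temperatureT);
}

interface MultiheadThresholds {
  mainThreshold: number;
  auxThreshold: number;
}

/** Rule from the summary: block iff main >= mainThreshold AND aux < auxThreshold. */
function shouldBlock(main: number, aux: number, t: MultiheadThresholds): boolean {
  return main >= t.mainThreshold && aux < t.auxThreshold;
}

const thresholds: MultiheadThresholds = { mainThreshold: 0.5, auxThreshold: 0.64 };
const main = calibratedScore(2.0, 2.41); // roughly 0.70 for this arbitrary logit
const blockedLowAux = shouldBlock(main, 0.1, thresholds); // true: main is high, aux is low
const blockedHighAux = shouldBlock(main, 0.9, thresholds); // false: the aux head vetoes the block
```

Dividing the logit by T > 1 pulls scores toward 0.5 without changing their ordering, which is why thresholds fitted on raw scores need to be re-expressed in the calibrated space.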

AgentShield results

Target baseline (prior v5 single-head): Final 86.7.

Expected on this PR with calibrated v5 multi-head bundled by default:

Final ~86.7   (matching v5 calibration-findings baseline)
  PI · Jail · DE · TA · OR · MA · Prov

Re-run pending against feat/tier2-v0.7 HEAD; numbers will be posted as a comment before merge.

What's in the API

Tier2ClassifierConfig.multihead?: { mainThreshold, auxThreshold }: opt-in multi-head decision rule. Both fields are required (no library default) because the right operating point is model- and traffic-specific. For the bundled model, FP-benchmark validation gives { mainThreshold: 0.5, auxThreshold: 0.64 } in calibrated space.
Tier2ClassifierConfig.temperatureT?: number — advanced; override only when shipping a custom ONNX model. The bundled model auto-loads its fitted T.
New tier2RawScore field on DefenseResult — forensic value when density or aux-veto rewrites the effective score.

Notes

Default behavior is unchanged for single-head consumers: passing only onnxModelPath (or no config at all, using the bundled model) gets strict main-only blocking with validated defaults.
Breaking-ish: tier2Score semantics changed from "max raw chunk score" to "effective score backing the decision." Direct numeric comparisons that bypass riskLevel/allowed should re-read against tier2RawScore.
Bundled artefacts: v5 ONNX (~22 MB) lands at src/classifiers/models/minilm-multihead-v5/. The legacy minilm-full-aug directory is replaced with a .gitkeep placeholder so the old path errors loudly rather than loading silently.

Test plan

npx vitest run — 292 tests pass (22 in tier2-multihead.spec.ts).
npx biome check src/ — clean.
Build clean (tsdown); npm pack dry-run at 18.5 MB.
AgentShield re-run on 0803063: Final 86.7 (matches v5 baseline). See comment for category breakdown.
After merge: release-please bumps to 0.7.0 on feat: commits.
After publish: bump @stackone/defender to ^0.7.0 in the plugin repo and remove the dual-model shadow eval.

🤖 Generated with Claude Code

…ixes

Adds opt-in support for dual-head ([batch, 2]) ONNX classifiers, post-hoc temperature scaling for calibrated probability semantics, and the multi-head decision rule (block iff main >= mainThreshold AND aux < auxThreshold). All behind opt-in config; single-head consumption stays the back-compat default.

API additions:
- tier2Config.multihead?: { mainThreshold, auxThreshold }
- tier2Config.temperatureT?: number (raw sigmoid when 1.0)
- OnnxClassifier.classifyPair / classifyBatchPair (main + aux)
- Tier2Classifier.classifyChunksBatchPair / isMultihead / getMultiheadConfig
- Tier2Classifier auto-loads calibration defaults from classifier_config.json
- DefenseResult.tier2AuxScore, tier2MultiheadBlocked
- DefenseResult.tier2RawScore (debug; see Bug 3 below)
- getDefaultModelPath exported

Three latent API contract bugs uncovered during calibration are fixed here:

Bug 1: tier2Config.highRiskThreshold overrides never propagated to the block gate. Visible only when calibrated thresholds land between the override and the un-propagated default (0.8). Latent since multi-head support was added.
Fix: PromptDefense constructor now syncs threshold overrides into this.config.tier2.* alongside the Tier2Classifier copy.

Bug 2: DENSITY_SUB_THRESHOLD was hardcoded in raw-sigmoid space. Under temperature scaling, scores compress toward 0.5 and the literal 0.75 cutoff stops counting "high" events, causing density damping to silently under-fire.
Fix: rescale in logit space, sigmoid(log(3) / T). T=1 is a no-op; T=2.41 yields ~0.612.

Bug 3: tier2Score returned the raw max-chunk main, but the block gate used tier2EffectiveScore (post-density). Operators comparing tier2Score >= highRiskThreshold got a different answer than result.allowed === false.
Fix: tier2Score now reports the effective score that drove the decision; the pre-density max-chunk main is surfaced as tier2RawScore for forensics. Under multi-head aux veto, tier2Score is undefined (no block-driving score); operators should check tier2MultiheadBlocked when they need the rule's verdict explicitly.

229 tests pass. Default model path still points at minilm-full-aug; the v5 multi-head model with calibrated defaults lands in the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…metadata

Replaces the legacy minilm-full-aug ONNX model with minilm-multihead-v5, a dual-head MiniLM-L6 fine-tuned with code/docs/git aux supervision. Single-head consumption by default: no aux behavior change unless callers opt into tier2Config.multihead. Calibrated probability semantics by default via the model's classifier_config.json:calibration block.

Calibration metadata (model self-describes):
- temperatureT: 2.41
- highRiskThreshold: 0.64 (math-equivalent to raw 0.8 at T=2.41)
- ece: 0.09
- fitted_on: labeled plugin events 2026-05-13

Tier2Classifier auto-loads these defaults at construction; user-provided tier2Config still wins. Models without a calibration block (custom paths pointing at non-v5 models) fall back to library defaults (T=1, threshold=0.8).

Migration:
- Callers using the default config now receive calibrated probabilities. tier2Score values for the same content will shift toward 0.5 (less saturated). Re-check any hardcoded threshold comparisons.
- Callers explicitly setting tier2Config.highRiskThreshold see no semantic change other than Bug 1 (previous commit) finally honoring overrides.
- Callers explicitly setting onnxModelPath: ".../minilm-full-aug" break, because that directory is no longer shipped. v5 ships as the only bundled model.

Build / packaging:
- scripts/copy-models.cjs replaces an inline package.json one-liner. MODEL_DIRS lists the bundled variants; add new models here.
- npm pack size: 18.5 MB (was projected 90+ MB with all session variants).
- dist size: 23 MB (was 100 MB with all variants).

Pruning:
- Removed legacy minilm-full-aug binary.
- Removed v3, v4c, v6, v31 dev variants. They are kept in the classifier-eval workspace and on the Modal volume for benchmarking; not in the npm tarball.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
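Both threshold conversions quoted in the commit messages above (raw 0.75 to ~0.612 for the density cutoff, raw 0.8 to 0.64 for the high-risk threshold) follow from the same logit-space rescaling. The sketch below reproduces that arithmetic; `rescaleThreshold` is a hypothetical helper name chosen for this example, not a function confirmed to exist in the library.

```typescript
// Illustrative sketch only; rescaleThreshold is a hypothetical helper name.

function sigmoid(x: number): number {
  return 1 / (1 + Math.exp(-x));
}

/** Inverse of sigmoid: maps a probability in (0, 1) back to its logit. */
function logit(p: number): number {
  return Math.log(p / (1 - p));
}

/** Re-expresses a raw-sigmoid threshold in temperature-scaled space. */
function rescaleThreshold(rawThreshold: number, temperatureT: number): number {
  return sigmoid(logit(rawThreshold) / temperatureT);
}

// logit(0.75) = log(3), so this is sigmoid(log(3) / T) as in the Bug 2 fix.
const densityCutoff = rescaleThreshold(0.75, 2.41); // approximately 0.612
const highRisk = rescaleThreshold(0.8, 2.41); // approximately 0.64
const noOp = rescaleThreshold(0.75, 1); // approximately 0.75, since T = 1 changes nothing
```

Because the rescaling is monotonic, a raw score crosses the raw threshold exactly when its calibrated counterpart crosses the rescaled threshold, so the decision boundary is unchanged.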

…ead thresholds

Three review-feedback fixes on the v0.7 branch:

1. tier2Score under aux veto. Previously set to undefined when the multi-head rule rescued content (rationale: "no block-driving score"). That preserved the strict invariant `tier2Score >= highRiskThreshold ⇔ allowed === false` but produced incoherent operator telemetry: high `tier2RawScore` with `tier2Score: undefined` is hard to reason about on dashboards.

   New behavior: under aux veto, `tier2Score = 0` so the operator triple (`tier2Score`, `riskLevel`, `allowed`) tells one coherent story: zero / low / true. The model's actual main signal is preserved on `tier2RawScore`, and `tier2MultiheadBlocked: false` + `tier2AuxScore` give rule-level context for anyone debugging the decision.

   Combined with the riskLevel-from-tier2EffectiveScore derivation, the operator invariant `tier2Score >= highRiskThreshold ⇔ allowed === false` holds in single-head and multi-head-rule-fired modes; multi-head aux-veto is the third branch and now reads consistently as "zero contribution".

2. MultiheadConfig JSDoc. The field-level docstrings claimed `Default: 0.5` and `Default: 0.3`, which was misleading because both fields are required (no library default) and (0.5, 0.3) is the operating point that produced our documented AS regression. Rewrote the interface docblock to point at the FP-benchmark-validated `(0.5, 0.8)` raw / `(0.5, 0.64)` calibrated default, with a reference to evals/RESULTS.md for the threshold sweep.

3. tier2Score JSDoc on DefenseResult. Rewritten to enumerate the three modes (single-head, multi-head rule fired, multi-head aux veto) with the exact value semantics for each.

Also: trimmed over-commenting in specs/tier2-multihead.spec.ts (~95 lines removed). Kept the non-obvious context (threshold-arithmetic notes, the "2/6 ticket variants" operational fact); removed the line-by-line narrative.

290 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
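The aux-veto telemetry described in item 1 above can be pictured with a small sketch. It is deliberately simplified and rests on stated assumptions: the `Summary` shape and `summarize` function are hypothetical, density damping is omitted, `blockHighRisk` is assumed to be enabled, and no Tier 1 detection is assumed to be present.

```typescript
// Illustrative sketch only; Summary and summarize are hypothetical names.
// Assumes blockHighRisk is on, no Tier 1 detections, and no density damping.

interface Summary {
  tier2Score: number; // effective score that backs the decision
  tier2RawScore: number; // un-vetoed main score, kept for forensics
  riskLevel: "low" | "high";
  allowed: boolean;
}

function summarize(rawMain: number, auxVeto: boolean, highRiskThreshold: number): Summary {
  // Under aux veto the effective score reads as "zero contribution".
  const tier2Score = auxVeto ? 0 : rawMain;
  const high = tier2Score >= highRiskThreshold;
  return {
    tier2Score,
    tier2RawScore: rawMain,
    riskLevel: high ? "high" : "low",
    allowed: !high,
  };
}

const blocked = summarize(0.9, false, 0.64); // high score, no veto: blocked
const vetoed = summarize(0.9, true, 0.64); // same raw score, vetoed: zero / low / allowed
```

In both cases `tier2Score >= highRiskThreshold` agrees with `allowed === false`, while `tier2RawScore` still shows what the main head reported.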

…null assertion

Three small follow-ups to the review feedback:

1. readCalibrationDefaults now distinguishes ENOENT (silent; legacy models without classifier_config.json) from other read failures and JSON parse errors (warn). A typo in a shipped calibration block now surfaces at construction time instead of silently falling back to library defaults.

2. OnnxClassifier throws on non-positive or non-finite temperatureT. T must be in (0, ∞); zero, negative, NaN, and Infinity now produce a clear error rather than being silently coerced to 1. Calibration with invalid T is a programming error, not graceful-degradation territory.

3. Replaced `this.tier2Classifier!` non-null assertion at prompt-defense.ts with a captured local inside the existing narrowed block. Lint is now warning-free; biome check passes cleanly.

+1 test for temperatureT validation. 291 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
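The temperature guard in item 2 above amounts to a positive-and-finite check. A minimal sketch, assuming a standalone helper (the name `assertValidTemperature` and the error wording are invented for this example; the actual check lives inside OnnxClassifier and may look different):

```typescript
// Illustrative sketch only; assertValidTemperature is a hypothetical helper.

/** Returns T unchanged if it lies in (0, ∞); throws otherwise. */
function assertValidTemperature(temperatureT: number): number {
  // isFinite rejects NaN, Infinity, and -Infinity; the second check rejects 0 and negatives.
  if (!isFinite(temperatureT) || temperatureT <= 0) {
    throw new RangeError(`temperatureT must be a finite number > 0, got ${temperatureT}`);
  }
  return temperatureT;
}

const validT = assertValidTemperature(2.41); // passes through unchanged
```

Throwing instead of coercing to 1 keeps a mis-specified calibration from silently producing uncalibrated scores.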

The bundled v5 model auto-loads its fitted T from classifier_config.json, so most callers should never set this field. Trim the JSDoc on Tier2ClassifierConfig.temperatureT and remove temperature references from MultiheadConfig threshold docs so the public surface reads as a single, simple knob rather than asking consumers to reason about model internals. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…loaded thresholds apply

Tier2Classifier merges hardcoded defaults < model classifier_config.json < caller `tier2Config`. The previous Bug 1 fix only synced the caller-override path; thresholds auto-loaded from a model's calibration block (the new v0.7 default) reached Tier2Classifier's internal copy but not the gate at `this.config.tier2.highRiskThreshold`.

Result: bundled v5 ships `highRiskThreshold: 0.64`, Tier2Classifier sees 0.64, gate stays at the library default 0.8. A calibrated score of ~0.75 on an attack lands `riskLevel: "high"` with `allowed: true`, exactly the incoherent triple Bug 1 was supposed to eliminate.

Discovered when the AgentShield score dropped from 86.7 to 80.9 on the v0.7 candidate: 36 attacks flipped from block to allow on the model-auto-load path.

Fix: drop the pre-construction sync (Tier2Classifier already applies caller overrides via its 3-tier merge) and read back from `tier2Classifier.getConfig()` after construction. The readback is authoritative regardless of whether the threshold came from library defaults, model auto-load, or caller override: a single source of truth for the gate.

Regression test: model-level calibration auto-load must propagate to the gate, demonstrated against the bundled v5 model with no caller config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR upgrades Tier 2’s ONNX pipeline to support multi-head models (main+aux), adds temperature scaling for calibrated probabilities, and updates the public API/telemetry to reflect “effective” (decision-driving) scores while preserving raw scores for forensics.

Changes:

Added multi-head inference support ([batch, 2]) and a configurable decision rule using main/aux thresholds.
Introduced temperature scaling (sigmoid(logit / T)) and auto-loading of model calibration defaults from classifier_config.json.
Adjusted PromptDefense result fields/semantics (tier2Score as effective score; added tier2RawScore, tier2AuxScore, tier2MultiheadBlocked) and updated bundled default model + build packaging.

Reviewed changes

Copilot reviewed 10 out of 15 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/types.ts | Extends `Tier2Result` to optionally report aux score for multi-head models. |
| src/core/prompt-defense.ts | Implements multi-head rule + effective/raw score reporting and threshold propagation logic in the Tier 2 gate. |
| src/classifiers/tier2-classifier.ts | Loads calibration defaults from `classifier_config.json`; adds multi-head config plumbing and temperature accessor. |
| src/classifiers/onnx-classifier.ts | Adds output-mode detection (single vs multi), main+aux APIs, and temperature scaling in score computation. |
| src/classifiers/models/minilm-multihead-v5/tokenizer_config.json | Bundled model asset for the new default model. |
| src/classifiers/models/minilm-multihead-v5/config.json | Bundled model asset for the new default model. |
| src/classifiers/models/minilm-multihead-v5/classifier_config.json | Bundled calibration defaults (`temperatureT`, thresholds) consumed at runtime. |
| src/classifiers/models/minilm-full-aug/.gitkeep | Placeholder intended to prevent silent loads of the legacy model path. |
| specs/tier2-multihead.spec.ts | Adds tests for multi-head behavior, calibration loading, and the three reported API contract bugs. |
| specs/tier2-classifier.spec.ts | Updates expectations around default thresholds now coming from model calibration. |
| specs/onnx-classifier.spec.ts | Updates bundled model path reference to the new default model directory. |
| scripts/copy-models.cjs | Replaces inline copy logic with a dedicated post-build asset mirroring script. |
| package.json | Updates `copy-models` script to use `scripts/copy-models.cjs`. |

…ig, tighten invariant docs

Three review-driven fixes:

1. Multihead config against a single-head model used to silently disable Tier 2: `classifyChunksBatchPair` returned `{ main, aux: null }` rows, the rule's "no chunk blocks" path triggered aux-veto, and tier2EffectiveScore collapsed to 0. Detect every-aux-null after the batched call and set `tier2SkipReason` so the misconfig surfaces.

2. JSDoc on `tier2Score` claimed the invariant `tier2Score >= highRiskThreshold ⇔ allowed === false` held unconditionally. It doesn't: `blockHighRisk: false` keeps `allowed: true` regardless, and Tier 1 detections can drive `allowed: false` independently. Reword to state the conditions.

3. Inline comment on the return claimed the multihead veto sets `tier2Score` to undefined; the implementation sets 0. Update the comment to match.

Also fix the header comment in scripts/copy-models.cjs: it claimed the script writes to dist/classifiers/models/<name> but it writes to dist/models/<name>.

Adds a regression spec for #1 using `vi.spyOn` against `Tier2Classifier.prototype.classifyChunksBatchPair`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
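The "every aux is null" check from item 1 above is a small predicate over the batched rows. A hedged sketch, where the `PairScore` shape mirrors the `{ main, aux: null }` rows quoted in the commit message and the function name `allAuxMissing` is an assumption for this example:

```typescript
// Illustrative sketch only; PairScore and allAuxMissing are hypothetical names.

interface PairScore {
  main: number;
  aux: number | null; // null when the loaded model has no aux head
}

/** True when a non-empty batch came back without any aux scores at all. */
function allAuxMissing(rows: PairScore[]): boolean {
  return rows.length > 0 && rows.every((row) => row.aux === null);
}

const singleHeadRows: PairScore[] = [
  { main: 0.91, aux: null },
  { main: 0.12, aux: null },
];
const multiHeadRows: PairScore[] = [
  { main: 0.91, aux: 0.05 },
  { main: 0.12, aux: 0.8 },
];

const misconfigured = allAuxMissing(singleHeadRows); // true: multihead config on a single-head model
const healthy = allAuxMissing(multiHeadRows); // false
```

Treating an empty batch as "not missing" avoids flagging a misconfiguration when there was simply nothing to classify; whether the library makes the same choice is not stated in the thread.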

readCalibrationDefaults() does a sync readFileSync + JSON.parse on every Tier2Classifier construction. When callers create a PromptDefense per request, that's ~50-200µs of blocked event loop per call — ~100ms/s at 1k req/s, ~1s/s (one saturated core) at 10k req/s — on a file whose contents are bundled at build time and never change at runtime. Cache the result in a module-level Map keyed by modelDir, mirroring the _sessionCache pattern already used for ONNX sessions. First call on a modelDir reads from disk; every subsequent call returns from memory. `null` is a valid cached value ("no calibration block for this model"), so probe with `.has()` rather than `=== undefined`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
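The caching approach described above, including the point about `null` being a legitimate cached value, can be sketched like this. The loader is injected so the example stays self-contained; `readCalibrationCached`, `CalibrationDefaults`, and `Loader` are hypothetical names, and the real function reads from disk rather than taking a callback.

```typescript
// Illustrative sketch only; names are hypothetical and the loader is injected
// in place of the real disk read.

interface CalibrationDefaults {
  temperatureT?: number;
  highRiskThreshold?: number;
}

type Loader = (modelDir: string) => CalibrationDefaults | null;

// Module-level cache keyed by model directory.
const calibrationCache = new Map<string, CalibrationDefaults | null>();

function readCalibrationCached(modelDir: string, load: Loader): CalibrationDefaults | null {
  // Probe with has(): a cached null ("no calibration block") must count as a hit.
  if (calibrationCache.has(modelDir)) {
    const cached = calibrationCache.get(modelDir);
    return cached === undefined ? null : cached;
  }
  const loaded = load(modelDir);
  calibrationCache.set(modelDir, loaded);
  return loaded;
}
```

Checking `get(...) === undefined` alone would also work here, since `null` and `undefined` are distinct, but `has()` states the intent directly and stays correct if the cached value type ever widens.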

hiskudin · 2026-05-14T08:20:21Z

AgentShield re-run on 0803063 — 86.7 ✅

Matches the v5 calibration-findings baseline; the threshold-sync fix (87c742b) resolved the regression caught on the pre-fix candidate.

| Category | Pre-fix candidate | Post-fix `0803063` | Δ |
| --- | --- | --- | --- |
| Final | 80.9 | 86.7 | +5.8 |
| Composite | 81.6 | 88.2 | +6.6 |
| Prompt Injection | 85.4 | 91.7 | +6.3 |
| Jailbreak | 73.3 | 82.2 | +8.9 |
| Data Exfiltration | 77.0 | 88.5 | +11.5 |
| Tool Abuse | 75.0 | 81.3 | +6.3 |
| Over-Refusal | 95.4 | 92.3 | −3.1 |
| Multi-Agent | 94.3 | 97.1 | +2.8 |
| Provenance | 65.0 | 70.0 | +5.0 |

Run config: defaults (cfg: {}, no env vars) → single-head v5 with auto-loaded calibration from classifier_config.json (T=2.41, highRiskThreshold=0.64). 537 test cases. Result file: agentshield-benchmark/results/2026-05-14T08-19-49-654Z.json.

Branch is ready to merge.

willleeney

Review — Multi-head ONNX + Temperature Scaling (v0.7)

Solid work. The core logic and invariants are correct, the three bug fixes are real and well-targeted, and the test coverage is strong (355 new lines in tier2-multihead.spec.ts). AgentShield benchmark confirms parity at 86.7.

Main actionable items: #1 (undefined clobber risk on config merge) and #5 (overly loose threshold assertion). The rest are cleanup / readability.

Six inline comments below.

…lt assertion

Two PR-review fixes (#63 threads 1 and 5):

1. Tier2Classifier's caller-config spread used to overwrite model-loaded calibration defaults when the caller passed explicitly-undefined keys. The common pattern `{ temperatureT: settings.t ?? undefined }` (building config conditionally from optional settings) would silently flow `undefined` into OnnxClassifier, bypass its positive-finite guard, and leave the classifier at T=1 without warning. Filter undefined keys out of the partial before spreading so model defaults survive.

2. The `.getConfig()` regression test for the model-auto-loaded highRiskThreshold was loosened to `> 0 && <= 1` in an earlier commit, which passes for any positive value, including the library default 0.8 that the auto-load is supposed to override. An accidentally-removed or malformed calibration block would slip through silently. Replace with `toBeCloseTo(0.64, 2)` to assert the exact shipped value.

Adds a regression spec covering the clobber path: constructing with `{ temperatureT: undefined, highRiskThreshold: undefined }` must preserve v5's calibrated defaults (0.64, 2.41).

294 tests pass. Lint clean. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
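The three-tier precedence (library defaults < model calibration < caller config) and the undefined-clobber fix from item 1 above can be illustrated together. This sketch uses a hypothetical two-field `Tier2Settings` type and `mergeSettings` function; it skips undefined keys layer by layer rather than filtering and spreading, which is an equivalent formulation chosen for clarity, not necessarily how the library does it.

```typescript
// Illustrative sketch only; Tier2Settings and mergeSettings are hypothetical.

interface Tier2Settings {
  temperatureT: number;
  highRiskThreshold: number;
}

/**
 * Applies layers in increasing precedence. Keys that are explicitly undefined
 * are skipped, so they cannot overwrite a value from a lower layer.
 */
function mergeSettings(
  libraryDefaults: Tier2Settings,
  modelDefaults: Partial<Tier2Settings>,
  callerConfig: Partial<Tier2Settings>,
): Tier2Settings {
  const merged: Tier2Settings = { ...libraryDefaults };
  for (const layer of [modelDefaults, callerConfig]) {
    if (layer.temperatureT !== undefined) merged.temperatureT = layer.temperatureT;
    if (layer.highRiskThreshold !== undefined) merged.highRiskThreshold = layer.highRiskThreshold;
  }
  return merged;
}

const library: Tier2Settings = { temperatureT: 1, highRiskThreshold: 0.8 };
const model: Partial<Tier2Settings> = { temperatureT: 2.41, highRiskThreshold: 0.64 };

// Caller built its config from optional settings and passed explicit undefineds.
const clobberAttempt = mergeSettings(library, model, {
  temperatureT: undefined,
  highRiskThreshold: undefined,
});
// A real caller override still wins over the model's value.
const overridden = mergeSettings(library, model, { highRiskThreshold: 0.7 });
```

A plain `{ ...library, ...model, ...caller }` spread would copy the explicit `undefined` values over the model's calibration, which is the failure mode the commit describes.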

Address PR #63 review nits 2-4 and 6:

- Density damping was computed unconditionally then immediately overwritten under multi-head. Move it into the single-head `else if` branch so it only runs when the result actually drives a decision.
- `tier2AuxScore` was assigned twice on the multi-head rule-fired path (first to the global-max-main chunk's aux, then overwritten with the rule-triggering chunk's aux). Rename the eager target to a local `auxOfMaxMain` and write `tier2AuxScore` exactly once in the branch that's keeping it. End-state semantics preserved across all three paths (single-head undefined / rule fired = mhTopBlockAux / aux veto = auxOfMaxMain).
- Drop the unnecessary `getTemperature?.()` optional chain; the method is always defined on Tier2Classifier.
- Add the missing trailing newline to v5's classifier_config.json.

Net: post-scoring control flow now reads top-to-bottom in one pass. The multi-head branch handles rule + aux-veto cases; the single-head branch handles density damping + risk bucketing.

252 unit tests + 23 multi-head ONNX integration tests pass. Lint clean. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…me formatter CI runs `code:check` (lint + format); biome format flagged the multi-line form on the undefined-filter from 2b61b29. Local `code:lint` skipped the format check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The pattern was `/\$\([^)]+\)|`[^`]+`/g`: `$(...)` OR any backtick-pair. The second alternative couldn't distinguish bash legacy command substitution from markdown inline code, so every technical README with `cat foo.json`, `npm install`, `~/.claude/...`, or even just `filename.txt` triggered shell_command on Tier 1 with no real attack signal. Defender dogfooded the bug: its own source files (which contain literal backtick-quoted strings as code examples) tripped the rule.

Modern attackers default to `$(...)` because it nests cleanly; legacy backtick substitution is rare. Tier 2 still catches the residual backtick attacks via prompt context; Tier 1 dropping the regex just removes a noisy FP source.

Regression spec asserts a markdown sample with multiple backtick-inline spans does not fire shell_command, while the existing `$(rm -rf /)` test keeps the positive-case coverage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
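The effect of dropping the backtick alternative can be checked with a short sketch. The pattern below is an assumed form of the tightened rule, keeping only the `$(...)` alternative described in the commit message; the shipped Tier 1 regex may differ in flags or detail, and the sample strings are made up for this example.

```typescript
// Illustrative sketch only; this is an assumed form of the tightened pattern.

// Matches `$(` followed by at least one non-`)` character and a closing `)`.
// No `g` flag, so test() carries no lastIndex state between calls.
const SHELL_SUBSTITUTION = /\$\([^)]+\)/;

function looksLikeShellSubstitution(text: string): boolean {
  return SHELL_SUBSTITUTION.test(text);
}

const markdownSample = "Run `npm install`, then inspect `cat foo.json` and `filename.txt`.";
const attackSample = "Summarize this file $(rm -rf /) and continue.";

const markdownFires = looksLikeShellSubstitution(markdownSample); // false: backticks alone no longer match
const attackFires = looksLikeShellSubstitution(attackSample); // true: $(...) still matches
```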

… filters

Audit follow-up to the shell_command backtick fix. Same bug class: regexes that look gated by a role/qualifier but the gating is `?`-optional or keyword-only, so the regex fires on shape-only matches.

Tightenings:
- you_are_now: required role-noun terminator (was: both alternation groups optional → matched "you are now " + anything). Role list expanded to cover the actual attack distribution (DAN/GPT/AI/jailbroken/admin/root/hacker/developer/superuser etc.). Fixes UI-copy FP class ("you are now logged in / subscribed / ready").
- pretend_to_be: required attack-shaped role-noun (was: no role constraint → FP'd on children's literature, drama exercises). Same role list with privilege-escalation nouns added.
- show_instructions: required attack-qualifier (initial/original/system/hidden); the inner group was optional so bare "show instructions" matched. Fixes FAQ/help-doc FP class.
- markdown_hidden_instruction: required imperative + scope qualifier ("ignore all/the/previous/prior"); previous form matched any URL containing "system" or "instruction", so every doc cross-reference like `[config](https://.../system-setup)` FP'd. Uses `\W+` for separator to handle URL-encoded `+` and `%20`.
- role_system_xml: required directive content (ignore/disregard/forget/override/you are/new instructions/...) following the tag; bare `<system>` mentions are common in XML schemas / ML config docs / OS specs.
- json_injection: targets actual attack shapes, namely `"role": "system"` (chat-message role hijack) or `"system": "<long string>"` (system-key stuffing). Previous form fired on every OpenAI/Anthropic SDK example declaring those JSON keys.
- confusable_homoglyphs: Cyrillic block now requires mixed-script adjacency to Latin letter (the actual attack: `аdmin` with Cyrillic 'а'). Pure Russian text no longer FPs. Cherokee + Phonetic Extensions blocks remain aggressive; those are essentially never in real customer content.

Two pre-existing tests updated because they relied on the over-broad patterns:
- "should return medium risk": fixture now uses "pretend to be a hacker" instead of "pretend to be a helpful assistant" (the latter is a benign roleplay request that only triggered the old over-broad regex).
- "should detect markdown link with hidden instruction": fixture URL now uses the attack-shape `?p=ignore+all+previous+instructions` instead of just `ignore-instructions.com`.

14 new regression specs pin both directions of each fix (FP-class fixture does NOT fire / attack-shape fixture DOES fire).

268 unit tests pass, 23/23 multihead ONNX specs pass, biome check clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
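To make the "mixed-script adjacency" idea behind the confusable_homoglyphs tightening concrete, here is a minimal sketch. The regex is an assumption written for this example (basic Latin letters next to the U+0400 to U+04FF Cyrillic block); the pattern actually shipped in the library is not shown in this thread and may cover more ranges.

```typescript
// Illustrative sketch only; this pattern is an assumption, not the shipped regex.

// Fires when a basic Latin letter sits directly next to a Cyrillic letter,
// in either order. Text written entirely in one script does not match.
const MIXED_LATIN_CYRILLIC = /[A-Za-z][\u0400-\u04FF]|[\u0400-\u04FF][A-Za-z]/;

function hasMixedScriptAdjacency(text: string): boolean {
  return MIXED_LATIN_CYRILLIC.test(text);
}

const spoofedAdmin = "\u0430dmin"; // Cyrillic 'а' (U+0430) followed by Latin "dmin"
const pureRussian = "\u043F\u0440\u0438\u0432\u0435\u0442 \u043C\u0438\u0440"; // "привет мир"
const pureLatin = "admin";

const spoofFires = hasMixedScriptAdjacency(spoofedAdmin); // true
const russianFires = hasMixedScriptAdjacency(pureRussian); // false
const latinFires = hasMixedScriptAdjacency(pureLatin); // false
```

Requiring adjacency rather than mere presence is what lets ordinary Russian prose through while still catching a single swapped letter inside a Latin word.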

willleeney

lgtm

hiskudin and others added 6 commits May 13, 2026 14:56

Merge remote-tracking branch 'origin/main' into feat/tier2-v0.7

cb31ad9

Copilot AI review requested due to automatic review settings May 13, 2026 15:37

hiskudin requested a review from a team as a code owner May 13, 2026 15:37

Copilot started reviewing on behalf of hiskudin May 13, 2026 15:38

Copilot AI reviewed May 13, 2026

Comment thread src/core/prompt-defense.ts

Comment thread src/core/prompt-defense.ts Outdated

Comment thread src/core/prompt-defense.ts

Comment thread src/core/prompt-defense.ts Outdated

Comment thread scripts/copy-models.cjs Outdated

Comment thread src/classifiers/onnx-classifier.ts

hiskudin changed the title ~~feat(tier2): multi-head ONNX + temperature scaling + API contract fixes (v0.7)~~ feat(ENG-335): multi-head ONNX + temperature scaling + API contract fixes (v0.7) May 13, 2026

hiskudin and others added 2 commits May 13, 2026 17:03

willleeney reviewed May 14, 2026

hiskudin and others added 2 commits May 14, 2026 10:26

aikido-pr-checks Bot reviewed May 14, 2026

Comment thread src/classifiers/tier2-classifier.ts

aikido-pr-checks Bot reviewed May 14, 2026

Comment thread src/classifiers/tier2-classifier.ts

hiskudin and others added 3 commits May 14, 2026 10:37

willleeney approved these changes May 14, 2026

hiskudin merged commit 616cc10 into main May 14, 2026
3 checks passed

hiskudin deleted the feat/tier2-v0.7 branch May 14, 2026 13:38

stackone-devops-service-account mentioned this pull request May 14, 2026

chore(main): release defender 0.7.0 #62

Merged

hiskudin merged 14 commits into main from feat/tier2-v0.7
