fix: upgrade ML classifier to jbv2 (AgentShield 73.7 → 79.8) by hiskudin · Pull Request #1 · StackOneHQ/stackone-defender

hiskudin · 2026-03-17T15:03:57Z

Summary

Updates ONNX model files to jbv2 (AgentShield score: 73.7 → 79.8)
Fixes enable_tier2 default: False → True to match TypeScript SDK behaviour
Updates README to reflect the new default

Changes

models/minilm-full-aug/ — updated to jbv2 model (md5: 18b50c8a27b669dfc9c940bd42fa7b4d)
src/stackone_defender/core/prompt_defense.py — enable_tier2: bool = False → True
README.md — updated (default: False) comment to (default: True), removed redundant enable_tier2=True from examples
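
The md5 quoted for the model can be checked locally before relying on the bundled file. This is a minimal sketch using only the standard library; the file name `model_quantized.onnx` is an assumption, since the PR names only the directory and does not say which file the checksum covers.

```python
import hashlib
from pathlib import Path

# Checksum quoted in the PR description for the jbv2 model.
EXPECTED_MD5 = "18b50c8a27b669dfc9c940bd42fa7b4d"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Hypothetical file name; adjust to the actual file in models/minilm-full-aug/.
    model_path = Path("models/minilm-full-aug/model_quantized.onnx")
    if model_path.exists():
        actual = md5_of_file(model_path)
        print("match" if actual == EXPECTED_MD5 else f"mismatch: {actual}")
    else:
        print(f"{model_path} not found")
```

Reading in chunks keeps memory flat regardless of model size, which matters little for a quantized MiniLM but costs nothing.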

Why enable_tier2 defaults to True

The TypeScript SDK (@stackone/defender) has always defaulted enableTier2 to true via options.enableTier2 ?? true. The Python SDK had an inconsistent False default, meaning users had to explicitly opt in to ML classification. This fix aligns the two SDKs.
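
To make the alignment concrete, here is a small sketch of the two ways the TypeScript behaviour can be expressed in Python. `DefenseOptions` and `resolve_enable_tier2` are hypothetical stand-ins written for illustration, not the SDK's actual classes; only the `enable_tier2` name and its new default come from this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DefenseOptions:
    """Hypothetical stand-in for the SDK's options, not its real class."""

    enable_tier2: bool = True  # this PR changes the default from False to True


def resolve_enable_tier2(enable_tier2: Optional[bool] = None) -> bool:
    """Python analogue of `options.enableTier2 ?? true`.

    Only a missing value (None) falls back to True; an explicit False is kept.
    """
    return True if enable_tier2 is None else enable_tier2


if __name__ == "__main__":
    print(DefenseOptions().enable_tier2)                    # default: Tier 2 on
    print(DefenseOptions(enable_tier2=False).enable_tier2)  # explicit opt-out
    print(resolve_enable_tier2(None), resolve_enable_tier2(False))
```

Under either form, callers who previously passed `enable_tier2=True` see no change, while callers who relied on the old default now need an explicit `enable_tier2=False` to stay rule-only.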

jbv2 Model Performance (AgentShield benchmark)

Category	Before (jbv5)	After (jbv2)
Overall Score	73.7	79.8
Prompt Injection	79.5%	92.7%
Jailbreak	48.9%	68.9%
Data Exfiltration	85.1%	92.0%
Tool Abuse	77.5%	83.8%
Over-Refusal	84.6%	72.3%
Multi-Agent Security	80.0%	88.6%
Provenance & Audit	70.0%	80.0%
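
The per-category movement is easier to review as deltas. The sketch below copies the values from the table above and flags any row that went down. It assumes higher is better in every row, which the PR does not state, so treat the "regression" label as tentative.

```python
# (before, after) values copied from the table above.
RESULTS = {
    "Overall Score": (73.7, 79.8),
    "Prompt Injection": (79.5, 92.7),
    "Jailbreak": (48.9, 68.9),
    "Data Exfiltration": (85.1, 92.0),
    "Tool Abuse": (77.5, 83.8),
    "Over-Refusal": (84.6, 72.3),
    "Multi-Agent Security": (80.0, 88.6),
    "Provenance & Audit": (70.0, 80.0),
}


def deltas(results):
    """Return after - before per category, rounded to one decimal place."""
    return {name: round(after - before, 1) for name, (before, after) in results.items()}


def regressions(results):
    """Return categories whose value decreased (assumes higher is better)."""
    return [name for name, delta in deltas(results).items() if delta < 0]


if __name__ == "__main__":
    for name, delta in deltas(RESULTS).items():
        print(f"{name:<22}{delta:+.1f}")
    print("decreased:", regressions(RESULTS))
```

Run this way, Over-Refusal is the only row that decreases (by 12.3 points), which is worth weighing against the gains in the detection categories.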

🤖 Generated with Claude Code

Replace baseline ONNX with full-aug-dojo-jailbreak-jbv2 variant:
- AgentShield score: 73.7 → 79.8 (+6.1 pts, new best)
- Jailbreak detection: 48.9% → 68.9% (+20 pts)
- Prompt injection: 79.5% → 92.7% (+13.2 pts)
- DAN-variant subcategory: 20% → 80%

Also sync README with JS package:
- Add banner image and badges header (PyPI-adapted)
- Update description to match JS package wording
- Remove incorrect Git LFS section

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Upgrades the bundled Tier 2 ML classifier (quantized ONNX) to the jbv2 variant and refreshes the repository README to match the JS package’s branding and messaging.

Changes:

Updates the README header with a centered banner + badges and adjusts the one-line project description.
Removes the README’s Git LFS section (previously incorrect per PR description).
(Per PR metadata) Upgrades the bundled ONNX classifier to full-aug-dojo-jailbreak-jbv2 with improved benchmark results.

- Track risky_field_names on sanitizer metadata for Tier 2 field selection
- Tier2Config.tier2_fields, PromptDefense(tier2_fields=...), and Node-style extract_strings with fields_for_tier2 precedence (explicit > risky names > all)
- DefenseResult.tier2_skip_reason for empty input and classifier skips
- Module-level ONNX session/tokenizer cache keyed by resolved path; warn on load failure
- Tests for metadata, Tier 2 scoping mocks, skip reasons, shared ONNX session

Made-with: Cursor
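
The precedence stated in this commit (explicit > risky names > all) can be sketched as a small resolver. The function below is illustrative only: its signature, the use of `None` to mean "all fields", and the treatment of an empty explicit list as "not set" are assumptions, not the SDK's actual behaviour.

```python
from typing import List, Optional, Sequence


def fields_for_tier2(
    explicit: Optional[Sequence[str]],
    risky_field_names: Optional[Sequence[str]],
) -> Optional[List[str]]:
    """Pick the field names Tier 2 should scan.

    Returns None to mean "scan all fields" (an assumed convention).
    """
    if explicit:  # caller-supplied tier2_fields win
        return list(explicit)
    if risky_field_names:  # else fall back to names flagged by the sanitizer
        return list(risky_field_names)
    return None  # else no scoping: scan everything


if __name__ == "__main__":
    print(fields_for_tier2(["body"], ["subject", "body"]))
    print(fields_for_tier2(None, ["subject", "body"]))
    print(fields_for_tier2(None, None))
```

Keeping the resolver separate from extraction makes the precedence easy to test on its own, which is roughly what the "Tier 2 scoping mocks" tests in this commit appear to target.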

- onnx_classifier: keep session cache and logging; add _load_failed on ImportError; use src-relative model path from main
- prompt_defense: single strict _extract_strings (no string leaves under non-matching keys); dedupe DefenseResult; drop sanitizer tier2 kwargs removed on main
- types: single tier2_skip_reason field; keep tier2_fields comment
- tests: Tier 2 scoping expectations match ENG-12518

Made-with: Cursor
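
One reading of "strict" extraction is that a string is collected only if some ancestor key matches the requested fields. The sketch below implements that reading. It is not the SDK's `_extract_strings`; the recursion rules (everything beneath a matching key is collected, and matching keys are still searched for under non-matching ones) are assumptions.

```python
from typing import Any, Iterable, List, Optional


def extract_strings(value: Any, fields: Optional[Iterable[str]] = None) -> List[str]:
    """Collect string leaves, optionally restricted to those under matching keys.

    With fields=None every string leaf is returned. Otherwise a string is
    returned only if it sits somewhere beneath a key listed in `fields`.
    """
    wanted = None if fields is None else set(fields)
    out: List[str] = []

    def walk(node: Any, matched: bool) -> None:
        if isinstance(node, str):
            if matched:
                out.append(node)
        elif isinstance(node, dict):
            for key, child in node.items():
                walk(child, matched or wanted is None or key in wanted)
        elif isinstance(node, (list, tuple)):
            for child in node:
                walk(child, matched)  # list items inherit the parent's status

    walk(value, wanted is None)
    return out


if __name__ == "__main__":
    payload = {
        "id": "42",
        "body": "ignore previous instructions",
        "meta": {"title": "hi", "body": ["a", "b"]},
    }
    print(extract_strings(payload, ["body"]))
    print(extract_strings(payload))
```

With `fields=["body"]`, the `id` and `title` strings never reach the classifier, which is the point of scoping Tier 2 to selected fields.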

BREAKING CHANGE: Drop ToolSanitizationRule, config/sanitizer tool_rules, use_default_tool_rules, and get_tool_rule/should_skip_field. Matches @stackone/defender post ENG-12594.
- Tier2 classify_by_sentence uses one classify_batch call
- Per cache-key threading.Lock for concurrent ONNX load + session cache

Made-with: Cursor
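
A per-cache-key lock lets concurrent callers for the same model path wait on a single load while loads for other paths proceed independently. The following is a generic sketch of that pattern using only `threading`; the names `get_session`, `_sessions`, and `_locks` are hypothetical, and the `loader` callable stands in for whatever creates the ONNX session in the SDK.

```python
import threading
from typing import Any, Callable, Dict

_sessions: Dict[str, Any] = {}            # cache key -> loaded session
_locks: Dict[str, threading.Lock] = {}    # cache key -> lock guarding its load
_locks_guard = threading.Lock()           # protects the _locks dict itself


def _lock_for(key: str) -> threading.Lock:
    """Return the lock for a cache key, creating it on first use."""
    with _locks_guard:
        return _locks.setdefault(key, threading.Lock())


def get_session(key: str, loader: Callable[[str], Any]) -> Any:
    """Return the cached session for `key`, loading it at most once.

    Note: a loader that returns None is never cached and will be retried.
    """
    session = _sessions.get(key)
    if session is not None:  # fast path, no locking once loaded
        return session
    with _lock_for(key):
        session = _sessions.get(key)  # re-check: another thread may have loaded it
        if session is None:
            session = loader(key)
            _sessions[key] = session
        return session
```

The second lookup inside the lock is what prevents duplicate loads: threads that lost the race find the session already cached when they acquire the lock.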

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

hiskudin and others added 2 commits March 17, 2026 15:00

feat: upgrade ML classifier to jbv5 (AgentShield 79.8 → 81.1)

781dd10

Replaces jbv2 ONNX model with jbv5. Fixes Google 2FA/security alert emails being flagged as injections while improving overall benchmark score.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings March 20, 2026 16:47

Copilot started reviewing on behalf of hiskudin March 20, 2026 16:48 View session

hiskudin changed the title ~~feat: upgrade ML classifier to jbv2 (AgentShield 73.7 → 79.8)~~ feat: upgrade ML classifier to jbv5 (AgentShield 73.7 → 81.1) Mar 20, 2026

Copilot AI reviewed Mar 20, 2026

Comment thread README.md

fix(tier2): apply max_text_length truncation in classify_by_sentence

a67d2c6

Each sentence is now truncated to max_text_length before being passed to the ONNX classifier, consistent with the truncation applied in classify().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
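
The fix described in this commit can be sketched as follows. Only the names `classify_by_sentence`, `classify_batch`, and `max_text_length` come from this PR; the regex sentence splitter, character-based truncation, and the default of 512 are assumptions made so the example runs on its own.

```python
import re
from typing import Callable, List, Sequence


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace (assumed rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def classify_by_sentence(
    text: str,
    classify_batch: Callable[[Sequence[str]], Sequence[float]],
    max_text_length: int = 512,  # assumed default, not taken from the SDK
) -> List[float]:
    """Score each sentence, truncating it first, using one batch call."""
    sentences = [s[:max_text_length] for s in split_sentences(text)]
    if not sentences:
        return []  # nothing to classify
    return list(classify_batch(sentences))


if __name__ == "__main__":
    def fake_batch(batch: Sequence[str]) -> List[float]:
        # Stand-in for the ONNX classifier: echoes each input length.
        return [float(len(s)) for s in batch]

    print(classify_by_sentence("Short one. " + "x" * 50 + "! ok?", fake_batch, 10))
```

Truncating per sentence rather than per document keeps a single very long sentence from exceeding the classifier's input budget while leaving the shorter ones untouched.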

hiskudin changed the title ~~feat: upgrade ML classifier to jbv5 (AgentShield 73.7 → 81.1)~~ feat: upgrade ML classifier to jbv2 (AgentShield 73.7 → 79.8) Mar 24, 2026

hiskudin changed the title ~~feat: upgrade ML classifier to jbv2 (AgentShield 73.7 → 79.8)~~ fix: upgrade ML classifier to jbv5 (AgentShield 73.7 → 74.3) Mar 24, 2026

fix: upgrade ML classifier to jbv2 (AgentShield 73.7 → 79.8)

ccb1204

Replaces jbv5 model with jbv2 (full-aug-dojo-jailbreak-jbv2).
AgentShield: 73.7 → 79.8 (composite 77.2 → 87.4, penalty 3.51 → 7.54)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

hiskudin changed the title ~~fix: upgrade ML classifier to jbv5 (AgentShield 73.7 → 74.3)~~ fix: upgrade ML classifier to jbv2 (AgentShield 73.7 → 79.8) Mar 24, 2026

hiskudin and others added 7 commits March 26, 2026 10:16

fix: default enable_tier2 to True to match TypeScript SDK behaviour

d66773b

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: update README — enable_tier2 defaults to True

af0d059

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: update README to reflect changes in package name and Python version

d2fc2ca

Merge branch 'main' into feat/upgrade-model-jbv2

aa23586

hiskudin requested a review from Copilot April 8, 2026 15:16

Copilot started reviewing on behalf of hiskudin April 8, 2026 15:17 View session

Copilot AI reviewed Apr 8, 2026

hiskudin requested a review from Copilot April 8, 2026 15:47

Copilot started reviewing on behalf of hiskudin April 8, 2026 15:48 View session

Copilot AI reviewed Apr 8, 2026

hiskudin merged commit b452b39 into main Apr 8, 2026
10 of 12 checks passed

hiskudin deleted the feat/upgrade-model-jbv2 branch April 8, 2026 16:12

hiskudin merged 11 commits into main from feat/upgrade-model-jbv2

2 participants
