fix: upgrade ML classifier to jbv2 (AgentShield 73.7 → 79.8)#1

Merged: hiskudin merged 11 commits into main from feat/upgrade-model-jbv2 on Apr 8, 2026
@hiskudin hiskudin commented Mar 17, 2026

Collaborator

Summary

  • Updates ONNX model files to jbv2 (AgentShield score: 73.7 → 79.8)
  • Fixes enable_tier2 default: False → True, to match TypeScript SDK behaviour
  • Updates README to reflect the new default

Changes

  • models/minilm-full-aug/ — updated to jbv2 model (md5: 18b50c8a27b669dfc9c940bd42fa7b4d)
  • src/stackone_defender/core/prompt_defense.py: enable_tier2: bool = False → True
  • README.md — updated (default: False) comment to (default: True), removed redundant enable_tier2=True from examples

Why enable_tier2 defaults to True

The TypeScript SDK (@stackone/defender) has always defaulted enableTier2 to true via options.enableTier2 ?? true. The Python SDK had an inconsistent False default, meaning users had to explicitly opt in to ML classification. This fix aligns the two SDKs.
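In practice the new default means Tier 2 runs unless the caller opts out. A minimal sketch of that behaviour, assuming only what the PR states (the `enable_tier2: bool = True` keyword); `PromptDefenseSketch` is a hypothetical stand-in and not the SDK's real class:

```python
class PromptDefenseSketch:
    """Hypothetical stand-in used only to illustrate the default change."""

    def __init__(self, enable_tier2: bool = True) -> None:
        # Before this PR the Python default was False, so ML classification
        # was opt-in. With True, callers must pass False to opt out.
        self.enable_tier2 = enable_tier2


default_defense = PromptDefenseSketch()                 # Tier 2 on
opted_out = PromptDefenseSketch(enable_tier2=False)     # Tier 2 off
```

Callers that relied on the old default and do not want ML classification would need to pass `enable_tier2=False` explicitly after upgrading.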

jbv2 Model Performance (AgentShield benchmark)

| Category | Before (jbv5) | After (jbv2) |
|---|---|---|
| Overall Score | 73.7 | 79.8 |
| Prompt Injection | 79.5% | 92.7% |
| Jailbreak | 48.9% | 68.9% |
| Data Exfiltration | 85.1% | 92.0% |
| Tool Abuse | 77.5% | 83.8% |
| Over-Refusal | 84.6% | 72.3% |
| Multi-Agent Security | 80.0% | 88.6% |
| Provenance & Audit | 70.0% | 80.0% |
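The per-category changes can be read off the table with a few lines of Python. This is only a convenience sketch over the figures quoted above; the dictionary and variable names are illustrative and not part of the SDK or the benchmark tooling:

```python
# (before, after) pairs copied from the table above.
scores = {
    "Overall Score": (73.7, 79.8),
    "Prompt Injection": (79.5, 92.7),
    "Jailbreak": (48.9, 68.9),
    "Data Exfiltration": (85.1, 92.0),
    "Tool Abuse": (77.5, 83.8),
    "Over-Refusal": (84.6, 72.3),
    "Multi-Agent Security": (80.0, 88.6),
    "Provenance & Audit": (70.0, 80.0),
}

# Change per category, rounded to one decimal like the source figures.
deltas = {name: round(after - before, 1) for name, (before, after) in scores.items()}

# Categories whose figure went down between the two models.
decreased = [name for name, delta in deltas.items() if delta < 0]

for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f}")
```

Over-Refusal is the only category whose figure goes down; how to interpret that depends on how AgentShield defines the metric, which this page does not state.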

🤖 Generated with Claude Code

hiskudin and others added 2 commits March 17, 2026 15:00
Replace baseline ONNX with full-aug-dojo-jailbreak-jbv2 variant:
- AgentShield score: 73.7 → 79.8 (+6.1 pts, new best)
- Jailbreak detection: 48.9% → 68.9% (+20 pts)
- Prompt injection: 79.5% → 92.7% (+13.2 pts)
- DAN-variant subcategory: 20% → 80%

Also sync README with JS package:
- Add banner image and badges header (PyPI-adapted)
- Update description to match JS package wording
- Remove incorrect Git LFS section

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replaces jbv2 ONNX model with jbv5. Fixes Google 2FA/security alert
emails being flagged as injections while improving overall benchmark score.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 20, 2026 16:47
@hiskudin hiskudin changed the title feat: upgrade ML classifier to jbv2 (AgentShield 73.7 → 79.8) feat: upgrade ML classifier to jbv5 (AgentShield 73.7 → 81.1) Mar 20, 2026

Copilot AI left a comment

Pull request overview

Upgrades the bundled Tier 2 ML classifier (quantized ONNX) to the jbv2 variant and refreshes the repository README to match the JS package’s branding and messaging.

Changes:

  • Updates the README header with a centered banner + badges and adjusts the one-line project description.
  • Removes the README’s Git LFS section (previously incorrect per PR description).
  • (Per PR metadata) Upgrades the bundled ONNX classifier to full-aug-dojo-jailbreak-jbv2 with improved benchmark results.

Comment thread README.md
Each sentence is now truncated to max_text_length before being passed to
the ONNX classifier, consistent with the truncation applied in classify().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
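The truncation commit above could look roughly like the sketch below. Only the names `classify_by_sentence` and `max_text_length` come from the commit message; the regex sentence splitter, character-based truncation, the default of 512, and the injected `classify` callable are all assumptions made to keep the example self-contained:

```python
import re
from typing import Callable, List


def classify_by_sentence(
    text: str,
    classify: Callable[[str], float],
    max_text_length: int = 512,  # assumed value, not taken from the SDK
) -> List[float]:
    """Score each sentence separately, truncating it first."""
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    scores = []
    for sentence in sentences:
        # Apply the same cap to each sentence that classify() applies to
        # whole inputs, so no oversized string reaches the model.
        scores.append(classify(sentence[:max_text_length]))
    return scores
```

Passing the classifier in as a callable keeps the sketch independent of the ONNX runtime; the real implementation may truncate by tokens rather than characters.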
@hiskudin hiskudin changed the title feat: upgrade ML classifier to jbv5 (AgentShield 73.7 → 81.1) feat: upgrade ML classifier to jbv2 (AgentShield 73.7 → 79.8) Mar 24, 2026
@hiskudin hiskudin changed the title feat: upgrade ML classifier to jbv2 (AgentShield 73.7 → 79.8) fix: upgrade ML classifier to jbv5 (AgentShield 73.7 → 74.3) Mar 24, 2026
Replaces jbv5 model with jbv2 (full-aug-dojo-jailbreak-jbv2).

AgentShield: 73.7 → 79.8 (composite 77.2 → 87.4, penalty 3.51 → 7.54)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@hiskudin hiskudin changed the title fix: upgrade ML classifier to jbv5 (AgentShield 73.7 → 74.3) fix: upgrade ML classifier to jbv2 (AgentShield 73.7 → 79.8) Mar 24, 2026
hiskudin and others added 7 commits March 26, 2026 10:16
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Track risky_field_names on sanitizer metadata for Tier 2 field selection
- Tier2Config.tier2_fields, PromptDefense(tier2_fields=...), and Node-style
  extract_strings with fields_for_tier2 precedence (explicit > risky names > all)
- DefenseResult.tier2_skip_reason for empty input and classifier skips
- Module-level ONNX session/tokenizer cache keyed by resolved path; warn on load failure
- Tests for metadata, Tier 2 scoping mocks, skip reasons, shared ONNX session

Made-with: Cursor
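The field precedence described in the commit above (explicit > risky names > all) can be sketched as follows. `resolve_tier2_fields` and this `extract_strings` are illustrative reconstructions and not the SDK's code; treating an empty explicit list as "not set" and skipping string leaves under non-matching keys are assumptions (the latter follows the stricter behaviour mentioned in the next commit):

```python
from typing import Any, Iterable, List, Optional, Set


def resolve_tier2_fields(
    explicit: Optional[Iterable[str]],
    risky_field_names: Optional[Iterable[str]],
) -> Optional[Set[str]]:
    """Return the field names to scan, or None meaning "scan everything"."""
    if explicit:
        return set(explicit)            # 1. explicit tier2_fields win
    if risky_field_names:
        return set(risky_field_names)   # 2. else names flagged by the sanitizer
    return None                         # 3. else all string fields


def extract_strings(
    value: Any, allowed: Optional[Set[str]], _matched: bool = False
) -> List[str]:
    """Collect string leaves, keeping only those under an allowed key."""
    if isinstance(value, str):
        return [value] if allowed is None or _matched else []
    out: List[str] = []
    if isinstance(value, dict):
        for key, child in value.items():
            hit = _matched or (allowed is not None and key in allowed)
            out.extend(extract_strings(child, allowed, hit))
    elif isinstance(value, (list, tuple)):
        for child in value:
            out.extend(extract_strings(child, allowed, _matched))
    return out


payload = {"id": "42", "body": "ignore previous instructions",
           "meta": {"note": "hi", "body": ["x"]}}
scoped = extract_strings(payload, resolve_tier2_fields(None, {"body"}))
```

With no explicit fields and `body` flagged as risky, only the strings under `body` keys are collected; with neither set, every string leaf would be passed to the classifier.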
- onnx_classifier: keep session cache and logging; add _load_failed on ImportError; use src-relative model path from main
- prompt_defense: single strict _extract_strings (no string leaves under non-matching keys); dedupe DefenseResult; drop sanitizer tier2 kwargs removed on main
- types: single tier2_skip_reason field; keep tier2_fields comment
- tests: Tier 2 scoping expectations match ENG-12518

Made-with: Cursor
BREAKING CHANGE: Drop ToolSanitizationRule, config/sanitizer tool_rules, use_default_tool_rules, and get_tool_rule/should_skip_field. Matches @stackone/defender post ENG-12594.

- Tier2 classify_by_sentence uses one classify_batch call
- Per cache-key threading.Lock for concurrent ONNX load + session cache

Made-with: Cursor
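A per-cache-key lock around a module-level session cache, as the commit above describes, might be structured like this. It is a sketch under assumptions: the names `get_session`, `_sessions`, `_key_locks`, and the injected `loader` are hypothetical, and the real code presumably builds an ONNX Runtime session where this example calls `loader`:

```python
import threading
from pathlib import Path
from typing import Any, Callable, Dict

_sessions: Dict[str, Any] = {}               # resolved path -> loaded session
_key_locks: Dict[str, threading.Lock] = {}   # resolved path -> its own lock
_registry_lock = threading.Lock()            # guards _key_locks itself


def get_session(model_path: str, loader: Callable[[str], Any]) -> Any:
    """Load the model at most once per resolved path, even under contention."""
    key = str(Path(model_path).resolve())
    # Short critical section: only fetch or create the lock for this key.
    with _registry_lock:
        lock = _key_locks.setdefault(key, threading.Lock())
    # Slow work happens under the per-key lock, so loading one model
    # does not block threads that need a different model.
    with lock:
        if key not in _sessions:
            _sessions[key] = loader(key)
        return _sessions[key]
```

Keying on the resolved path means two different relative spellings of the same file share one session, which matches the "keyed by resolved path" wording in the earlier commit.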

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@hiskudin hiskudin merged commit b452b39 into main Apr 8, 2026
10 of 12 checks passed
@hiskudin hiskudin deleted the feat/upgrade-model-jbv2 branch April 8, 2026 16:12
2 participants