feat(ENG-12699): TypeScript parity and synced ONNX bundle by hiskudin · Pull Request #8 · StackOneHQ/stackone-defender

hiskudin · 2026-04-21T12:47:58Z

Summary

This release aligns the stackone-defender Python package with the current @stackone/defender TypeScript behavior and refreshes the bundled MiniLM ONNX assets so that their hashes match the TS repo byte for byte.

What changed

- Config and sanitizer: Dangerous-key filtering, fractional cumulative risk thresholds, and traversal hardening (__proto__, constructor, prototype).
- Tier 2 and ONNX: Packed-chunk Tier 2 flow, density-adjusted scoring, and memory-bounded ONNX batch chunking (max batch size 32).
- Optional SFE: use_sfe, bundled sfe/model.ftz, and fasttext-wheel optional extra.
- Types / API: DefenseResult.fields_dropped, truncated_at_depth, and related create_config merges.
- Models: minilm-full-aug/ (model_quantized.onnx, config.json, tokenizer.json, tokenizer_config.json) copied from defender so Python matches TS.
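The traversal hardening above can be illustrated with a minimal sketch. It assumes dangerous keys are simply dropped from dicts during a recursive walk and that any container at or beyond a depth cap is replaced with `None`. `strip_dangerous_keys` is a hypothetical name, and the `MAX_TRAVERSAL_DEPTH` value is an assumption; the PR introduces the constant but does not state its value.

```python
from typing import Any

# The three keys named in this PR; the full DANGEROUS_KEYS set in config.py may differ.
DANGEROUS_KEYS = frozenset({"__proto__", "constructor", "prototype"})
# Hypothetical value: the PR introduces MAX_TRAVERSAL_DEPTH but does not state it.
MAX_TRAVERSAL_DEPTH = 10


def strip_dangerous_keys(value: Any, removed: list[str], depth: int = 0) -> Any:
    """Return a copy of `value` with dangerous dict keys dropped.

    Dropped key names are appended to `removed`. Any dict or list reached at
    MAX_TRAVERSAL_DEPTH or deeper is replaced with None instead of being walked.
    """
    if isinstance(value, (dict, list)) and depth >= MAX_TRAVERSAL_DEPTH:
        return None  # truncate rather than recurse without bound
    if isinstance(value, dict):
        cleaned = {}
        for key, item in value.items():
            if key in DANGEROUS_KEYS:
                removed.append(key)  # record it, e.g. for dangerous_keys_removed metadata
                continue
            cleaned[key] = strip_dangerous_keys(item, removed, depth + 1)
        return cleaned
    if isinstance(value, list):
        # Lists count as a level too, so every walker agrees on depth.
        return [strip_dangerous_keys(item, removed, depth + 1) for item in value]
    return value  # scalars pass through unchanged


removed: list[str] = []
payload = {"name": "a", "__proto__": {"x": 1}, "items": [{"constructor": 1, "ok": 2}]}
print(strip_dangerous_keys(payload, removed))  # {'name': 'a', 'items': [{'ok': 2}]}
print(removed)  # ['__proto__', 'constructor']
```

Counting list levels in the same way as dict levels is a deliberate choice here; the later review thread shows how a mismatch between two walkers can let fields slip through.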

Testing

uv run pytest — 188 passed.

Made with Cursor

Summary by cubic

Aligns stackone-defender Python with @stackone/defender 0.6.1 and syncs the MiniLM ONNX bundle. Meets the ENG-12699 parity goal through packed-chunk Tier 2, optional SFE preprocessing, and traversal hardening for safer, more accurate detection.

New Features
- Tier 2: sentence packing with token-bounded chunks, memory-safe batch chunking (32 max), and density-adjusted scoring.
- Optional SFE: use_sfe flag with bundled sfe/model.ftz; install via stackone-defender[sfe]; uses fasttext-ng; fails open if unavailable.
- Sanitizer/traversal: drops dangerous keys (__proto__, constructor, prototype), and adds fractional cumulative-risk thresholds and a stack-depth cap.
- API/Types: DefenseResult.fields_dropped, DefenseResult.truncated_at_depth, and SanitizationMetadata.dangerous_keys_removed; improved create_config merges; MiniLM artifacts synced to match TS.
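The "sentence packing with token-bounded chunks" idea can be sketched as a greedy packer: whole sentences are appended to the current chunk until the next one would exceed the token budget. This is an illustrative sketch, not the library's `_pack_sentences`. The regex sentence splitter and the whitespace token counter are stand-ins for whatever the real code uses, and `pack_sentences` is a hypothetical name.

```python
import re
from typing import Callable


def whitespace_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer's token count."""
    return len(text.split())


def pack_sentences(
    text: str,
    max_tokens: int,
    count_tokens: Callable[[str], int] = whitespace_tokens,
) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens.

    A single sentence longer than max_tokens still becomes its own chunk;
    a real implementation would need to split it further.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Close the current chunk if adding this sentence would exceed the budget.
        if current and current_tokens + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(sentence)
        current_tokens += n
    if current:
        chunks.append(" ".join(current))
    return chunks


print(pack_sentences("One two three. Four five. Six seven eight nine.", max_tokens=5))
# ['One two three. Four five.', 'Six seven eight nine.']
```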
Bug Fixes
- Tier 2 scoping mirrors TS: when tier2_fields is None, scope to Tier 1 risky_field_names if any are present; otherwise scan all strings.
- ONNX token counting excludes padding to keep chunk splitting accurate.
- Cumulative risk thresholds now merge with defaults and support partial custom dicts; SFE predictor loading is thread-safe.
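The threshold fix above combines two defensive steps: overlay a partial custom dict on the defaults, and read each key with `.get` plus a default when deciding whether to escalate. The sketch below assumes escalation fires when the share of flagged fields reaches either fraction. Only the `medium_fraction` and `patterns_fraction` key names come from the review; the default values, function names, and escalation rule are hypothetical.

```python
from collections.abc import Mapping
from typing import Optional

# Hypothetical defaults: only the key names come from the review comments.
DEFAULT_CUMULATIVE_THRESHOLDS = {"medium_fraction": 0.5, "patterns_fraction": 0.3}


def merge_thresholds(custom: Optional[Mapping[str, float]]) -> dict[str, float]:
    """Overlay a (possibly partial) custom dict on top of the defaults."""
    merged = dict(DEFAULT_CUMULATIVE_THRESHOLDS)
    if custom:
        merged.update(custom)
    return merged


def should_escalate(
    medium_hits: int, pattern_hits: int, total_fields: int, thresholds: Mapping[str, float]
) -> bool:
    """Escalate when the share of flagged fields reaches either fractional threshold."""
    if total_fields <= 0:
        return False
    # .get with defaults means a partial dict can never raise KeyError here.
    medium = thresholds.get("medium_fraction", DEFAULT_CUMULATIVE_THRESHOLDS["medium_fraction"])
    patterns = thresholds.get("patterns_fraction", DEFAULT_CUMULATIVE_THRESHOLDS["patterns_fraction"])
    return medium_hits / total_fields >= medium or pattern_hits / total_fields >= patterns


print(merge_thresholds({"medium_fraction": 0.8}))  # {'medium_fraction': 0.8, 'patterns_fraction': 0.3}
print(should_escalate(0, 2, 4, {"medium_fraction": 0.9}))  # True
```

Either step alone would avoid the `KeyError`. Doing both keeps `should_escalate` safe even if a caller skips the merge.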

Written for commit bf173ac. Summary will update on new commits.

…ndle
- Port dangerous-key filtering, fractional cumulative risk, and traversal config.
- Add packed-chunk Tier 2 flow, density adjustment, and ONNX batch chunking.
- Add optional SFE (fasttext) with bundled model and extras.
- Sync minilm-full-aug artifacts (quantized ONNX, tokenizer, config) with @stackone/defender.
- Bump version and release metadata; update changelog and README.
Made-with: Cursor

fasttext-wheel 0.9.2 has no cp313 wheels; resolving it in the dev group forced a broken sdist build on GitHub Actions. Remove it from dev deps (SFE tests use mocks). Gate the [sfe] extra with a Python version marker and document 3.13 behavior in the README. Made-with: Cursor

Copilot

Pull request overview

Release 0.6.1 updates the Python stackone-defender package to match the current TypeScript @stackone/defender behavior, including refreshed bundled MiniLM ONNX assets and new preprocessing/scoring behavior.

Changes:

- Adds optional SFE preprocessing (use_sfe) with bundled FastText model support (fail-open when unavailable).
- Updates Tier 2 flow to packed-chunk batching, density-adjusted scoring, and ONNX batch chunking to bound memory.
- Hardens traversal/sanitization: dangerous key filtering (__proto__, constructor, prototype) and fractional cumulative-risk thresholds.
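The memory-bounding idea behind ONNX batch chunking is to feed the model fixed-size slices (at most 32 inputs per the PR) and concatenate the outputs in order. Below is a generic sketch of that pattern. `run_in_batches` and the `infer` callback are hypothetical names; in the real classifier the callback would presumably wrap an ONNX Runtime `InferenceSession.run` call.

```python
from collections.abc import Callable, Sequence
from typing import TypeVar

T = TypeVar("T")
R = TypeVar("R")

MAX_BATCH_SIZE = 32  # cap stated in the PR


def run_in_batches(
    items: Sequence[T],
    infer: Callable[[Sequence[T]], list[R]],
    max_batch_size: int = MAX_BATCH_SIZE,
) -> list[R]:
    """Call `infer` on consecutive slices of at most max_batch_size items.

    Results are concatenated in input order, so callers see one flat list
    while each model call is bounded by the batch size, not the input size.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be >= 1")
    results: list[R] = []
    for start in range(0, len(items), max_batch_size):
        results.extend(infer(items[start:start + max_batch_size]))
    return results
```

With 70 inputs, `infer` would be called three times, with 32, 32, and 6 items.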

Reviewed changes

Copilot reviewed 20 out of 23 changed files in this pull request and generated 3 comments.


File	Description
uv.lock	Adds `fasttext-wheel` (and transitive deps) and bumps project version to `0.6.1` with new extras metadata.
pyproject.toml	Bumps version to `0.6.1`; adds `sfe` extra and dev dependency for `fasttext-wheel`.
README.md	Documents SFE extra/usage; updates Tier 2 description; documents new `DefenseResult` fields.
CHANGELOG.md	Adds `0.6.1` release notes and breaking-change callouts.
.release-please-manifest.json	Updates manifest version to `0.6.1`.
src/stackone_defender/config.py	Introduces `DANGEROUS_KEYS` + `MAX_TRAVERSAL_DEPTH`; deep-copies defaults; adds fractional cumulative-risk thresholds.
src/stackone_defender/types.py	Extends config/metadata/result types for fractional thresholds, dangerous-key reporting, and new result fields.
src/stackone_defender/core/tool_result_sanitizer.py	Filters dangerous keys during traversal; adjusts cumulative risk accounting to support fractional thresholds.
src/stackone_defender/core/prompt_defense.py	Adds `use_sfe`; switches Tier 2 to chunk prep + batched chunk scoring; reports `fields_dropped`/`truncated_at_depth`.
src/stackone_defender/sfe/preprocess.py	New SFE preprocessing implementation with predictor caching and depth-bounded traversal.
src/stackone_defender/sfe/__init__.py	Exports SFE public API.
src/stackone_defender/classifiers/onnx_classifier.py	Adds bounded batch chunking and token counting/max-length helpers.
src/stackone_defender/classifiers/tier2_classifier.py	Adds chunk preparation + packed-sentence chunking path; batch chunk passthrough API.
src/stackone_defender/__init__.py	Exposes SFE symbols at package top-level.
src/stackone_defender/models/minilm-full-aug/config.json	Syncs bundled model metadata with TS assets.
src/stackone_defender/models/minilm-full-aug/tokenizer_config.json	Syncs tokenizer config with TS assets.
tests/test_tier2_classifier.py	Adds tests for `prepare_chunks` skipping and chunk-batch passthrough.
tests/test_onnx_classifier.py	Adds test coverage for ONNX batch chunking behavior.
tests/test_sfe.py	New tests for SFE preprocessing and `PromptDefense` integration (`fields_dropped`).
tests/test_integration.py	Adds dangerous-key removal test; updates Tier 2 scoping tests to new chunk-based flow.
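For the new `DefenseResult` fields, a list-valued dataclass field needs `field(default_factory=list)`, which is also what the README fix below adopts. The sketch shows a hypothetical `DefenseResultSketch` with only the two new fields. Their types (`list[str]`, `Optional[int]`) are assumptions, because the PR names the fields but not their types.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DefenseResultSketch:
    """Hypothetical subset of DefenseResult showing only the two new fields."""

    # A bare `= []` default raises ValueError when the class is created;
    # default_factory builds a fresh list for every instance instead.
    fields_dropped: list[str] = field(default_factory=list)
    # Assumed meaning: depth at which traversal was cut off, or None if it was not.
    truncated_at_depth: Optional[int] = None


first = DefenseResultSketch()
second = DefenseResultSketch()
first.fields_dropped.append("notes")
print(second.fields_dropped)  # []
```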


cubic-dev-ai

7 issues found across 23 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="README.md">

<violation number="1" location="README.md:168">
P2: Use `field(default_factory=list)` instead of `[]` for the dataclass list default in the README snippet.</violation>
</file>

<file name="src/stackone_defender/core/tool_result_sanitizer.py">

<violation number="1" location="src/stackone_defender/core/tool_result_sanitizer.py:277">
P2: Accessing `medium_fraction`/`patterns_fraction` without defaults can raise `KeyError` for valid custom threshold dicts that omit the new keys.</violation>
</file>

<file name="src/stackone_defender/config.py">

<violation number="1" location="src/stackone_defender/config.py:90">
P2: `tool_overrides` is shallow-copied, so nested lists are shared with global defaults and can be mutated across configs.</violation>
</file>

<file name="src/stackone_defender/classifiers/onnx_classifier.py">

<violation number="1" location="src/stackone_defender/classifiers/onnx_classifier.py:134">
P2: `count_tokens` returns padded sequence length, not the actual token count, because tokenizer padding is enabled globally.</violation>
</file>

<file name="src/stackone_defender/sfe/preprocess.py">

<violation number="1" location="src/stackone_defender/sfe/preprocess.py:66">
P2: TOCTOU race: the lock is released between the cache-miss check and the model load, so concurrent threads can each load the model redundantly. Hold the lock across the full check-and-populate block to prevent duplicate expensive loads.</violation>

<violation number="2" location="src/stackone_defender/sfe/preprocess.py:200">
P2: Depth tracking is inconsistent between `_extract_fields` (arrays don't increment `depth`) and `_filter_by_paths` (arrays do increment `depth`). For deeply nested array structures, fields extracted for drop-classification may not be reachable by the filter, so they silently survive. Either both functions should count array levels the same way, or `_filter_by_paths` should mirror `_extract_fields` by using a separate `stack_depth` parameter.</violation>
</file>

<file name="src/stackone_defender/classifiers/tier2_classifier.py">

<violation number="1" location="src/stackone_defender/classifiers/tier2_classifier.py:133">
P1: `count_tokens` always returns 256 because the tokenizer has `enable_padding(length=256)` set, so `len(encoding.ids)` includes padding tokens. Since `get_max_length()` also returns 256, the condition `total_tokens <= model_max_len` is always true and the entire chunk-splitting branch below is dead code. The same applies to `prepare_chunks` and `_pack_sentences`.

`count_tokens` should strip padding tokens before returning, e.g. by counting non-pad ids or using `len(encoding.tokens)` without padding, or by temporarily disabling padding for the count.</violation>
</file>
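The `tool_overrides` finding in config.py comes down to shallow versus per-value copying. The sketch below shows the leak and the fix, assuming `tool_overrides` maps tool names to lists of field names. `DEFAULT_TOOL_OVERRIDES`, `overrides_shallow`, and `overrides_fixed` are hypothetical names. `copy.deepcopy` would also work, but copying each list is enough for this shape.

```python
# Hypothetical global defaults: tool name -> list of field names.
DEFAULT_TOOL_OVERRIDES: dict[str, list[str]] = {"search": ["title", "body"]}


def overrides_shallow() -> dict[str, list[str]]:
    # dict() copies the outer mapping only; each list is still the default object.
    return dict(DEFAULT_TOOL_OVERRIDES)


def overrides_fixed() -> dict[str, list[str]]:
    # Copy every list value so per-config edits cannot reach the defaults.
    return {tool: list(fields) for tool, fields in DEFAULT_TOOL_OVERRIDES.items()}


cfg = overrides_fixed()
cfg["search"].append("snippet")
print(DEFAULT_TOOL_OVERRIDES["search"])  # ['title', 'body']
```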


cubic-dev-ai

0 issues found across 4 files (changes from recent commits).

Requires human review: Auto-approval blocked by 7 unresolved issues from previous reviews.

fasttext-wheel lacks reliable cp313 wheels and fails sdist builds on CI. fasttext-ng provides the same fasttext import namespace, supports Python 3.11+, and declares numpy>=2.3. Add it to dev so SFE-related tests run with the real module when available; refresh lockfile and docs. Made-with: Cursor
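Because fasttext-ng and fasttext-wheel both expose the `fasttext` module, the SFE loader can import it lazily and fail open. The sketch below shows one way to combine that with the "hold predictor lock across import/load" fix: the lock covers both the cache check and the load, so concurrent first calls trigger only one load. The globals, `get_predictor`, and `load_fasttext` are hypothetical names; `fasttext.load_model` is the real fastText Python entry point.

```python
import threading
from typing import Any, Callable, Optional

_predictor: Optional[Any] = None
_load_attempted = False
_predictor_lock = threading.Lock()


def get_predictor(load: Callable[[], Any]) -> Optional[Any]:
    """Load the predictor at most once, even under concurrent first calls.

    The lock is held across the check and the load, so a second thread
    waits for the first instead of starting its own expensive load.
    Any failure (including ImportError) yields None: the caller fails open.
    """
    global _predictor, _load_attempted
    with _predictor_lock:
        if not _load_attempted:
            _load_attempted = True
            try:
                _predictor = load()
            except Exception:
                _predictor = None
        return _predictor


def load_fasttext(model_path: str) -> Any:
    # Both fasttext-ng and fasttext-wheel install the `fasttext` module.
    import fasttext

    return fasttext.load_model(model_path)
```

A caller would do something like `get_predictor(lambda: load_fasttext(path_to_model_ftz))`. In this sketch a failed load is never retried, which keeps hot paths cheap when the extra is not installed.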

- Tier 2 string extraction: when tier2_fields is None, scope to Tier 1 risky_field_names when present; else all strings. Align integration test.
- ONNX count_tokens: sum attention_mask so padded length does not disable chunk splitting; add regression test.
- Cumulative escalation: merge defaults into sanitizer thresholds; use .get with defaults in _should_escalate for partial custom dicts.
- create_config: deep-copy tool_overrides list values.
- SFE: hold predictor lock across import/load; align list depth in filter/compact.
- README DefenseResult snippet: field(default_factory=list).
- Tier2Config docstring: clarify None vs empty list semantics.
Made-with: Cursor
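The `count_tokens` fix works because a padded encoding's attention mask is 1 for real tokens and 0 for padding. In the Hugging Face `tokenizers` library, `Encoding.attention_mask` carries exactly this. The sketch below uses a fake padded encoder so it runs without model files. `FakeEncoding`, `fake_encode`, and the "+2 special tokens" rule are hypothetical; `PAD_LENGTH = 256` follows the review comment.

```python
from dataclasses import dataclass

PAD_LENGTH = 256  # the review notes enable_padding(length=256)


@dataclass
class FakeEncoding:
    """Stand-in for a tokenizers Encoding padded to PAD_LENGTH."""

    ids: list[int]
    attention_mask: list[int]  # 1 = real token, 0 = padding


def fake_encode(text: str) -> FakeEncoding:
    # Hypothetical tokenizer: one id per word plus two special tokens, then padding.
    n = min(len(text.split()) + 2, PAD_LENGTH)
    ids = list(range(1, n + 1)) + [0] * (PAD_LENGTH - n)
    mask = [1] * n + [0] * (PAD_LENGTH - n)
    return FakeEncoding(ids=ids, attention_mask=mask)


def count_tokens_padded(text: str) -> int:
    # Buggy: always PAD_LENGTH, so "fits in one chunk" is always true.
    return len(fake_encode(text).ids)


def count_tokens(text: str) -> int:
    # Fixed: count only positions the attention mask marks as real.
    return sum(fake_encode(text).attention_mask)


print(count_tokens_padded("hello world"), count_tokens("hello world"))  # 256 4
```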

cubic-dev-ai

0 issues found across 5 files (changes from recent commits).

Requires human review: Auto-approval blocked by 7 unresolved issues from previous reviews.

cubic-dev-ai

0 issues found across 9 files (changes from recent commits).

Requires human review: Significant alignment PR with core logic changes, new dependencies, and API shape modifications. Not a low-risk or trivial update.

Copilot AI review requested due to automatic review settings April 21, 2026 12:47

Copilot started reviewing on behalf of hiskudin April 21, 2026 12:48

hiskudin changed the title ~~Release 0.6.1: TypeScript parity and synced ONNX bundle~~ feat(ENG-12699): TypeScript parity and synced ONNX bundle Apr 21, 2026

Copilot AI reviewed Apr 21, 2026

Comment thread src/stackone_defender/core/prompt_defense.py Outdated

Comment thread src/stackone_defender/core/tool_result_sanitizer.py

Comment thread tests/test_integration.py Outdated

cubic-dev-ai Bot reviewed Apr 21, 2026

hiskudin added 2 commits April 21, 2026 14:41

cubic-dev-ai Bot reviewed Apr 21, 2026

glebedel approved these changes Apr 22, 2026

hiskudin merged commit 0449800 into main Apr 22, 2026
8 checks passed

hiskudin deleted the feat/python-0.6.1-ts-parity-onnx branch April 22, 2026 15:12

hiskudin merged 4 commits into main from feat/python-0.6.1-ts-parity-onnx

3 participants

Conversation

hiskudin commented Apr 21, 2026 • edited by cubic-dev-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changed

Testing

Summary by cubic

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

hiskudin commented Apr 21, 2026 •

edited by cubic-dev-ai Bot

Loading