fix(ENG-129): add obfuscation normalisation chain to Tier 1 detection#39
Conversation
Adds a pre-processing normalisation chain to PatternDetector.analyze so existing injection patterns catch obfuscated variants without requiring new regex entries for each substitution style. Changes: - leet-normalizer.ts (new): normalizeLeetSpeak() reverses digit/symbol substitutions (4→a, 3→e, 1→i, 0→o, 5→s, 7→t); protects hex escapes, base64 blobs and $( from corruption - normalizer.ts: adds normalizeWhitespace() — collapses letter-by-letter spacing (S Y S T E M → SYSTEM) and embedded newlines inside words - pattern-detector.ts: runs normalisation chain (whitespace → unicode → leet) before Tier 1 matching; two-pass pattern run on raw+normalised text with dedup so obfuscation patterns still fire on raw text - encoding-detector.ts: adds decodeAllLevels() for chained encoding (base64 of hex etc.) and containsSuspiciousEncodingDeep() - sanitizer.ts: uses containsSuspiciousEncodingDeep in high-risk path - patterns.ts: removes stale leet entries from FAST_FILTER_KEYWORDS Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
4 issues found across 7 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/classifiers/pattern-detector.ts">
<violation number="1" location="src/classifiers/pattern-detector.ts:107">
P2: Matches from the normalised pass carry `position` and `matched` values from the transformed text, not the caller's original input. The `PatternMatch.position` contract ("Position in the original string") is violated, and `matched` will contain text like `"ignore previous"` when the input was `"1gn0r3 pr3v10us"`. This makes the diagnostic output of `analyze()` misleading and could break any consumer that correlates matches back to the source string.
Consider either (a) mapping normalised-match positions back to the raw text, or (b) adding a flag/field to distinguish normalised-origin matches so consumers know the coordinates are approximate.</violation>
</file>
<file name="src/sanitizers/encoding-detector.ts">
<violation number="1" location="src/sanitizers/encoding-detector.ts:372">
P1: After partial multi-level decoding, the deep check only scans plaintext keywords and can miss suspicious content that remains encoded.</violation>
</file>
<file name="src/sanitizers/normalizer.ts">
<violation number="1" location="src/sanitizers/normalizer.ts:134">
P2: The newline regex is too permissive and also joins separate words across line breaks, not just obfuscated intra-word splits.</violation>
</file>
<file name="specs/pattern-detector.spec.ts">
<violation number="1" location="specs/pattern-detector.spec.ts:315">
P2: The new assertion is too permissive and can miss regressions in leet normalization because it passes without `ignore_previous` being detected.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
There was a problem hiding this comment.
Pull request overview
This PR strengthens Tier 1 injection detection by normalizing common obfuscation techniques before pattern matching and by enhancing encoding detection to unwrap chained encodings prior to suspicious-content checks.
Changes:
- Add a Tier 1 pre-processing normalization chain (whitespace → Unicode → leet-speak) and run pattern detection in two passes (raw + normalized) with merged results.
- Introduce multi-level encoding decoding (
decodeAllLevels) and a deep suspicious-encoding check used by the high-risk sanitizer path. - Update fast-filter keywords and adjust/relax a leet-related Tier 1 test expectation.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| src/sanitizers/sanitizer.ts | Switch high-risk encoding detection to use deep multi-level suspicious-encoding checks. |
| src/sanitizers/normalizer.ts | Add normalizeWhitespace() to collapse spaced-letter and embedded-newline obfuscation for Tier 1 analysis. |
| src/sanitizers/leet-normalizer.ts | Add normalizeLeetSpeak() with protected-sequence handling to avoid corrupting encodings / $(. |
| src/sanitizers/encoding-detector.ts | Add iterative decoding (decodeAllLevels) and containsSuspiciousEncodingDeep() for chained encodings. |
| src/classifiers/patterns.ts | Remove raw leet tokens from FAST_FILTER_KEYWORDS (relying on normalized fast-filter hits). |
| src/classifiers/pattern-detector.ts | Wire normalization chain into analyze(), run raw + normalized passes, and merge/dedup matches. |
| specs/pattern-detector.spec.ts | Update leet injection test to accept either normalized or raw leet pattern detection. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- encoding-detector: containsSuspiciousEncodingDeep now also runs containsSuspiciousEncoding on the decoded result, catching payloads that remain partially encoded when maxIterations is reached (P1) - types: add normalised?: boolean to PatternMatch to signal when position/matched values reference the normalised form, not the original input string (P2) - pattern-detector: tag normalised-pass matches with normalised:true so consumers can distinguish them from raw-text matches (P2) - normalizer: remove \s* from newline regex to avoid silently consuming word-separator spaces (e.g. "ignore\n previous" → "ignoreprevious" would break multi-word pattern matching) (P2) - test: assert specifically on ignore_previous rather than using || with leetspeak_injection, so regressions in leet normalisation are caught rather than masked by the raw-text pass (P2) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
0 issues found across 5 files (changes from recent commits).
Requires human review: Significant logic changes to the core pattern detection pipeline, including multi-pass normalization and iterative decoding. Requires human review for security heuristics and performance.
…lain text Two bugs found during PR review: 1. Normalisation order was wrong: running normalizeWhitespace before normalizeUnicode means Cyrillic/fullwidth homoglyph spacing attacks (e.g. "s у s t e m" with Cyrillic у) are never collapsed — whitespace normalisation uses [a-zA-Z] and exits before unicode normalisation resolves the homoglyphs to ASCII. Fix: run normalizeUnicode first so all characters are ASCII before whitespace collapse runs. Order is now: normalizeUnicode → normalizeWhitespace → normalizeLeetSpeak 2. Performance: when normalisation produces no change (plain text with keywords, the common case), detectPatterns was called twice on identical text. Added a rawText === analysisText guard to short-circuit to a single pass, restoring the original single-pass performance for unobfuscated input. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
0 issues found across 2 files (changes from recent commits).
Requires human review: This PR introduces significant logic changes to the core security detection pipeline, including a normalization chain and deep encoding detection, which require human validation.
Raw pass was skipped for pure leet-speak inputs (e.g. "1gn0r3 pr3v10us") because leet keywords were removed from FAST_FILTER_KEYWORDS, making rawHasKeywords false. The comment claimed the raw pass "catches obfuscation patterns like leetspeak_injection" but it was never actually running for those inputs. Fix: run the raw pass whenever rawHasKeywords OR normHasKeywords is true, so raw obfuscation patterns fire even when only the normalised text triggered the fast filter. Adds unit tests for: - normalizeWhitespace: letter spacing, embedded newlines, edge cases - normalizeLeetSpeak: substitution map, all protected sequence types ($(/ hex/unicode/base64), ! boundary rule, plain text passthrough - decodeAllLevels: single-layer, double-layer (chained), maxIterations cap, amplification guard - containsSuspiciousEncodingDeep: single/double encoded payloads, benign cases Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
0 issues found across 2 files (changes from recent commits).
Requires human review: Significant changes to core pattern detection logic and the introduction of a new normalization pipeline require human review for potential performance and accuracy impacts.
…lgo detection New encoding detectors (all integrated into detectEncoding/decodeAllLevels): - HTML entities: decodes &#NNN;, &#xHH;, and 30 named entities; gate: 3+ grouped entity tokens; suspicious when decoded content has injection keyword - ROT13: gate 70%+ letter density; only emits when decoded text contains an injection keyword — prevents false positives on arbitrary high-letter text - ROT47: printable ASCII rotation; conservative — only emits on suspicious decoded content - Binary strings: gate 3+ space-separated 8-bit groups of [01]; decodes via parseInt(group, 2) - Morse code: gate 5+ dot/dash groups; 36-entry table (A–Z, 0–9); rejects if >20% unknown symbols Zalgo / combining marks (normalizer.ts): - stripCombiningMarks() strips U+0300–U+036F and 4 other combining ranges - normalizeUnicode() now runs NFD → stripCombiningMarks → NFKC so marks are separated before being stripped (NFKC alone would compose them into precomposed chars that the regex cannot see) - containsSuspiciousUnicode() flags 3+ combining marks Leet-speak improvements (leet-normalizer.ts): - Token-aware normalization: only substitutes within alphanumeric tokens that contain at least one letter — "price: 100" stays "price: 100" - Added @→a and 8→b to LEET_MAP - Includes !, @, $ in token regex so "adm!n", "@dm1n", "$y$tem" normalize correctly - PROTECTED_SEQUENCE continues to guard $( before token processing runs Tier 1 patterns (patterns.ts): - Added binary_string_encoding (medium) and morse_code_encoding (low) patterns - Upgraded rot13_mention severity from low → medium All new behaviour covered by 28 new test cases (240 total, up from 212). Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
0 issues found across 5 files (changes from recent commits).
Requires human review: Significant changes to core pattern detection logic, normalization chains, and multi-level decoding require human review to assess security impact and performance.
…ontent ROT13 and ROT47 detections span position=0, length=text.length. When both fired on the same text alongside positional detections (hex/base64), the reverse-position splice loop would apply positional replacements first, then the full-text detection would overwrite them using the original text.length — corrupting previous replacements. In decodeAllLevels (action:"decode"), a text triggering both ROT13 and ROT47 would oscillate across all 5 iterations without converging. Fix: processEncodedContent now separates positional from full-text detections. Positional detections are applied first. Full-text is only applied when there are no positional detections; only the first full-text detection is used when multiple exist. decodeAllLevels naturally converges because after positional content is decoded, the next iteration re-evaluates the full-text transforms. Also adds three tests flagged during review: - normalizeWhitespace: letter-adjacent newline (no surrounding spaces) - normalizeUnicode: precomposed accent (café → cafe) via NFD decomposition - normalizeLeetSpeak: mixed alphanumeric tokens (v3rs10n → version) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
0 issues found across 2 files (changes from recent commits).
Requires human review: Significant additions to core detection logic (normalization chains, deep encoding detection) and 1000+ line diff across security-critical files require human review.
Adds two tests specifically for the full-text detection overlap fix: - processedText is a coherent string (not a corrupted splice) when both ROT13 and ROT47 fire on the same input - decodeAllLevels converges (levels <= 2, not oscillating to maxIterations=5) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
0 issues found across 1 file (changes from recent commits).
Requires human review: Significant feature adding complex normalization and deep encoding detection logic to core security paths. Requires human review for architectural impact and potential edge cases.
…ield Encoded payloads (ROT13, binary, Morse, etc.) don't trigger Tier 1 patterns because their content has no fast-filter keywords — so risk stays at the default "medium" and encoding detection in the sanitizer (Step 4, gated behind riskLevel === "high") never runs. Injections encoded in these formats pass through undetected. Fix: after Tier 1 classification in sanitizeStringField, run containsSuspiciousEncoding (shallow, single pass, ~0.05ms per field) as an additional risk escalation check. If suspicious encoding is found, risk escalates to "high" and the existing deep multi-level decoder in the sanitizer's Step 4 handles decoding and redaction. Before: ROT13/binary/Morse payloads → risk stays medium → encoding detection skipped → allowed: true (injection passes through) After: ROT13/binary/Morse payloads → encoding escalation → risk=high → deep decode + redaction → blocked Quality test results (Tier 1 + sanitizer, no ML): - Precision: 100% (0 false positives on 10 benign inputs) - Recall: 88% (7/8 encoding types caught; ROT47 miss due to short input) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
0 issues found across 2 files (changes from recent commits).
Requires human review: Large change (1200+ lines) adding complex normalization and deep decoding logic to core detection paths. Requires human review to ensure no regressions in detection or performance.
…alisation # Conflicts: # src/core/tool-result-sanitizer.ts
P1: Accent stripping leaked into returned output (data loss bug)
- normalizeUnicode previously did NFD→stripCombiningMarks→NFKC, which
rewrote 'café' → 'cafe' for benign user content. Sanitizer.applyRiskBasedMethods
returns this output to callers, breaking the analysis-only contract.
- Fix: removed stripCombiningMarks from normalizeUnicode (kept NFKC only).
Combining-mark stripping now lives in the analysis-only path inside
PatternDetector.analyze: stripCombiningMarks(rawText.normalize('NFD'))
runs before normalizeUnicode in the chain.
P1: Long leet payloads bypassed the fast filter
- The leet normaliser skips 20+ char alphanumeric tokens (treated as
base64-like blobs), so '1gn0r3pr3v10us1nstruct10ns' was left unchanged
by both the leet normaliser and unicode normaliser. With the leet
keywords removed from FAST_FILTER_KEYWORDS, the fast filter
short-circuited and leetspeak_injection was never evaluated.
- Fix: restore '1gn0r3', 'f0rg3t', 'byp4ss' to FAST_FILTER_KEYWORDS so
long leet payloads still pass the filter and reach the regex.
P2: Chained encoding payloads (e.g. btoa(btoa(...))) slipped through
- sanitizeStringField escalated using shallow containsSuspiciousEncoding,
which only catches single-layer encodings. Doubly-encoded payloads —
where the outer layer decodes to another encoded blob with no visible
keywords — stayed at medium risk and never reached the deep check in
the sanitizer (gated at high).
- Fix: sanitizeStringField now uses containsSuspiciousEncodingDeep which
loops through layers (max 5, with amplification guard). Also tracks
whether escalation came from encoding so the early-block path records
'encoding_detection' in methodsByField — without this, blockHighRisk
would set sanitized to '[CONTENT BLOCKED]' but allowed would stay true
because hasThreats only counts active sanitization methods.
Test updates:
- normalizeUnicode tests now assert accent preservation (the new contract)
- new stripCombiningMarks tests cover the analysis-only Zalgo path
268/269 tests passing (1 pre-existing flaky test on main, not introduced
by this PR — onnx-classifier batch test scoring borderline injection
sample at 0.467 vs >0.5 threshold).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ty redaction Two issues from Codex review: P1: PatternDetector classified obfuscated payloads (1gn0r3 pr3v10us, S Y S T E M:, Zalgo) by running Tier 1 against normalized text, but Sanitizer.applyRiskBasedMethods ran containsRoleMarkers/removePatterns against the un-normalized original. Result: detection escalated risk to medium/high but the actual obfuscated content survived in the sanitized output and could flow into downstream prompts. Fix: added Step 1.5 in the sanitizer that runs at HIGH risk only — NFD decompose + stripCombiningMarks + normalizeWhitespace + normalizeLeetSpeak — before role stripping and pattern removal. At high risk Tier 1 already has high confidence of an attack, so the trade-off of stripping benign accents from the redacted output is acceptable. Medium-risk benign content is unaffected. P2: detectHtmlEntities pushed every 3+ entity run into detections, causing redactAllEncoding to wipe benign escaped content like 10% ("10%") whenever the field was escalated to high risk for an unrelated reason. The decoder doc claimed "Only emits suspicious" but the code emitted all. Fix: kept the detector pushing all detections (so decodeAllLevels can chain through HTML→base64→plaintext correctly), but processEncodedContent in REDACT mode now filters out non-suspicious HTML entities. Decode mode is unaffected. Verified end-to-end: - Leet, whitespace-spaced, and Zalgo payloads are now redacted in output - Benign HTML entities (10% encoded) survive sanitization - Accented text (café, niño) preserved at low/medium risk Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
whitespace → unicode → leet-speak) toPatternDetector.analyzeso existing injection patterns catch obfuscated variants without new regex entries for each substitution styleinvisible_unicode) on the original textChanges
src/sanitizers/leet-normalizer.ts(new)normalizeLeetSpeak()— reverses4→a 3→e 1→i 0→o 5→s 7→t; protects hex escapes, base64 blobs,$(src/sanitizers/normalizer.tsnormalizeWhitespace()— collapsesS Y S T E M→SYSTEM, removes embedded newlines inside wordssrc/classifiers/pattern-detector.tssrc/sanitizers/encoding-detector.tsdecodeAllLevels()+containsSuspiciousEncodingDeep()for chained encodingsrc/sanitizers/sanitizer.tscontainsSuspiciousEncodingDeepin high-risk encoding stepsrc/classifiers/patterns.tsFAST_FILTER_KEYWORDS(now redundant after normalisation)Test plan
npm test— 186/186 passing1gn0r3 pr3v10us 1nstruct10ns→ Tier 1 firesignore_previous(was: missed entirely)S Y S T E M : override→ detected via whitespace normalisationign\nore previous instructions→ detected via newline normalisation$(echo hello)→ not corrupted, correctly passesdecodeAllLevels🤖 Generated with Claude Code
Summary by cubic
Adds a unicode → whitespace → leet normalization chain to Tier 1 and deep multi‑level encoding detection to catch obfuscated and chained injections without new regex; also escalates risk on encoded payloads so the sanitizer fully decodes and blocks them. Implements ENG‑12549 and keeps plain‑text scans fast.
New Features
@→a,8→b,!between letters →i,$→sunless$(); protects\x..,\u...., base64 blobs, and$(; order is unicode → whitespace → leet.raw === normalised; run raw pass when either side has keywords; merge results with de‑dup and tag normalised matches vianormalised: true; structural checks on raw text.decodeAllLevels()unwraps chained encodings with iteration and amplification guards; detectors for HTML entities, ROT13, ROT47, binary, and Morse;containsSuspiciousEncodingDeep()checks fully decoded text and is used in high‑risk sanitization.binary_string_encodingandmorse_code_encoding; raisedrot13_mentionseverity; removed leet entries fromFAST_FILTER_KEYWORDS.containsSuspiciousEncodingcheck in the tool result sanitizer upgrades risk to high for ROT13/binary/Morse/etc., triggering deep decode + redaction.Bug Fixes
decodeAllLevels()now converges and is covered by ROT13+ROT47 overlap tests.normalizeWhitespaceno longer consumes spaces around newlines, preserving word boundaries.Written for commit a9bdb57. Summary will update on new commits.