From 73589fbac0c05970161d2f7c8b1181769ad48276 Mon Sep 17 00:00:00 2001
From: bishnubista <bista.developer@gmail.com>
Date: Sat, 2 May 2026 13:47:48 -0700
Subject: [PATCH 1/2] feat(SAFE-M-5): expand Content Sanitization stub to
 template parity
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

SAFE-M-5 was a 30-line stub with [TO BE COMPLETED] under Technical
Implementation despite being cited by 14 techniques. Same defect class
as SAFE-M-11 / SAFE-M-12.

This expansion authors all missing template sections plus an explicit
Out of scope section, defines M-5 as a deterministic rule-based filter
at the inbound stage of an M-5 → M-3 escalation pipeline (tool-
description load, retrieved-memory ingest), preserves and strengthens
the honest "insufficient alone" framing the prior version already
acknowledged, and sharply distinguishes M-5 from related mitigations
(M-4 Unicode-specific specialization; M-3 ML-based escalation target;
M-22 output-side complement; M-1 ambient architectural control above
which M-5 runs).

Section additions:
- Technical Implementation: 5 Core Principles (inbound-only multi-stage
  filtering, defense-in-depth not standalone, fail-loud-not-silent
  emit to M-12 with confidence_score and match_count fields, operator-
  tunable patterns with hard-floor vs tunable distinction and explicit
  suppression governance, regex-engine choice with ReDoS-safety
  guidance via RE2-style linear-time engines). Architecture diagram
  showing inbound emit points only (output is M-22's surface).
- Benefits, Limitations (with explicit "insufficient alone" honesty,
  ReDoS risk for operator-extended patterns, scope-boundary statements
  for output / parameters / memory-write / prompts).
- Out of scope: per-citer detail for 3 mislabels (T1202, T1704, T2103)
  and 4 partial-fits (T1302, T1702, T1801, T1911) with named redirect
  candidates per case (T1801 plausibly M-22 if extended; T1302/T1911
  likely need a new "Parameter Validation" mitigation; T1702 likely
  needs a new "Memory-Write Hygiene" mitigation).
- Implementation Examples: multi-stage Python sanitizer that consumes
  the YAML schema (honors applies_to, weight, weighted_sum aggregator,
  active suppressions) with real STRIP semantics tied to M-3's
  "suspicious but legitimate" verdict; pattern-ruleset YAML schema
  with operator-tunable + suppression governance; M-3 escalation hook
  with configurable suspicion-score threshold.
- Testing and Validation: synthetic injection-corpus replay including
  ReDoS-suspect input handling; M-3 escalation handoff under varying
  thresholds; suppression-policy version-skew alarm test.
- Deployment Considerations: resource and performance (no uncited
  timing claims), monitoring (with alarms for rule-firing-rate
  drops/spikes, policy version skew, M-3 escalation queue depth).
- Current Status (2026): source-backed only — OWASP Input Validation
  Cheat Sheet, NIST SP 800-53 SI-10 (Information Input Validation),
  OWASP LLM01.

Mitigates list curated to 7 directly-mapped citers with technique-
specific rationale per entry. Excluded with reason codes:
- 3 mislabels (T1202 wants token-storage; T1704 wants isolation
  boundary; T2103 wants least-privilege controls) tracked as
  safe-m-5-mislabel-cluster follow-up.
- 4 partial-fit citers (T1302, T1702, T1801, T1911) cite M-5 with
  matching concept but each technique's primary ask falls outside
  M-5's content-sanitization scope. Tracked as
  safe-m-5-partial-fit-cluster follow-up; the corpus may need new
  mitigations for parameter validation, memory-write hygiene, and
  prompt-path sanitization.

Related Mitigations: kept M-3 and M-4 with sharper boundary
descriptions. Added M-22 (output-side complement — non-overlapping;
tool-response receipt removed from M-5's emit points), M-1 (ambient
architectural control — M-5 does NOT escalate to M-1), M-12 (audit
substrate for sanitization decisions).

References: corrected NIST SP 800-53 SI-15 (output filtering, the
prior incorrect citation) to SI-10 (Information Input Validation);
added OWASP LLM01, MCP Specification, Russ Cox's foundational
"Regular Expression Matching Can Be Simple And Fast" article that
motivates the RE2-style linear-time engine recommendation.

Preserves the v0.1 and v0.2 Frederick Kautz Version History rows per
the corpus convention.

Signed-off-by: bishnubista <bista.developer@gmail.com>
---
 mitigations/SAFE-M-5/README.md | 367 ++++++++++++++++++++++++++++++++-
 1 file changed, 360 insertions(+), 7 deletions(-)

diff --git a/mitigations/SAFE-M-5/README.md b/mitigations/SAFE-M-5/README.md
index cd11b765..0e16954f 100644
--- a/mitigations/SAFE-M-5/README.md
+++ b/mitigations/SAFE-M-5/README.md
@@ -8,24 +8,377 @@
 **First Published**: 2025-01-03
 
 ## Description
-Content Sanitization filters MCP-related content (including tool descriptions, tool outputs, error messages, and other data) to remove hidden content and instruction patterns using pattern-based detection combined with structural analysis. This mitigation applies sanitization at multiple points in the MCP pipeline to prevent prompt injection from various sources. Note that pattern-based filtering alone is insufficient and should be combined with other mitigations.
+Content Sanitization is a rule-based filter applied at *inbound* MCP pipeline emit points — tool-description load and retrieved-memory ingest — to strip hidden content patterns and instruction markers before that content enters the model context. It is the deterministic stage of an M-5 → M-3 escalation pipeline: rule hits with low confidence escalate to M-3 *AI-Powered Content Analysis* for model-based classification. It uses pattern matching and structural analysis against an operator-tunable ruleset, with hard-floor rules (always-block) distinguished from tunable rules (operator-configurable threshold). Each sanitization decision (pass / strip / block) emits a structured event to SAFE-M-12 *Audit Logging* so SAFE-M-11 *Behavioral Monitoring* can baseline filter-firing rates over time.
+
+This mitigation is intentionally a layer in defense-in-depth, not a standalone defense. Pattern-based filtering alone is necessarily incomplete: novel attack patterns evade rule sets until added, and adversaries iterate evasion variants. Pair M-5 with [SAFE-M-1](../SAFE-M-1/README.md) *Architectural Defense - Control/Data Flow Separation* (the ambient architectural control that M-5 runs inside), [SAFE-M-3](../SAFE-M-3/README.md) *AI-Powered Content Analysis* (the second-stage classifier for low-confidence M-5 hits, escalation-gated by an operator-tunable suspicion-score threshold), [SAFE-M-4](../SAFE-M-4/README.md) *Unicode Sanitization and Filtering* (the narrow Unicode-specific specialization run as a deterministic pre-pass), and [SAFE-M-22](../SAFE-M-22/README.md) *Semantic Output Validation* (the symmetric output-side gate; M-22 owns outbound surfaces that are intentionally not part of M-5).
 
 ## Mitigates
-- [SAFE-T1001](../../techniques/SAFE-T1001/README.md): Tool Poisoning Attack (TPA)
-- [SAFE-T1102](../../techniques/SAFE-T1102/README.md): Prompt Injection (Multiple Vectors)
+
+The mitigation directly addresses the following techniques (curated against the actual citation graph; 3 mislabels and 4 partial-fit citers excluded — see Out of scope):
+
+- [SAFE-T1001](../../techniques/SAFE-T1001/README.md): Tool Poisoning Attack (TPA) — sanitization strips hidden instruction patterns and unusual control characters from tool descriptions before they reach the model context.
+- [SAFE-T1102](../../techniques/SAFE-T1102/README.md): Prompt Injection (Multiple Vectors) — pattern matching at retrieved-memory and tool-description ingest points removes the most common injection markers (role-prompt phrasing, instruction-override attempts) before they enter the context.
+- [SAFE-T1401](../../techniques/SAFE-T1401/README.md): Line Jumping — content arriving at inbound emit points is filtered before it can be ordered ahead of trusted prompt segments in the context window.
+- [SAFE-T1604](../../techniques/SAFE-T1604/README.md): Multi-Modal Cross-Channel Injection — text-channel content arriving via inbound emit points is sanitized; multi-modal channels with their own preprocessing add filter coverage at their respective emit points.
+- [SAFE-T1705](../../techniques/SAFE-T1705/README.md): Cross-Server Tool Description Conflict — tool-description sanitization at load time strips embedded instructions before any conflict resolution occurs in the host.
+- [SAFE-T1910](../../techniques/SAFE-T1910/README.md): Output Format Manipulation — inbound content is filtered before it can manipulate the model into emitting attacker-controlled output formats. (Output-side validation is M-22's responsibility.)
+- [SAFE-T2105](../../techniques/SAFE-T2105/README.md): Disinformation Output — sanitizing inbound content (retrieved memory, tool descriptions) reduces the corpus of attacker-controllable inputs that could shape model output toward disinformation. Pair with M-22 for output-side validation.
+
+Three citers reference M-5 with non-matching control concepts and four cite M-5 with matching concept but expect functionality outside M-5's content-sanitization scope. See Out of scope below for the per-citer detail.
 
 ## Technical Implementation
-[TO BE COMPLETED]
+
+### Core Principles
+
+1. **Multi-stage filtering at inbound emit points only** — M-5 emits at *inbound* pipeline points only: tool-description load (pre-context) and retrieved-memory ingest (pre-context). Output-side validation (LLM outputs, tool-call argument schema) is M-22's surface, not M-5's. Each emit point has its own pattern set tuned to that surface.
+2. **Defense-in-depth, not standalone defense** — M-5 is a deterministic rule-based filter. Suspicious content that exceeds an operator-tunable suspicion-score threshold escalates to M-3 *AI-Powered Content Analysis* for model-based classification. M-1 is the ambient architectural control providing separation guarantees that M-5 operates within — M-5 does not "escalate to M-1"; M-1 always applies.
+3. **Fail-loud-not-silent** — every M-5 sanitization decision (pass / strip / block) emits a structured event to M-12 *Audit Logging* including a `confidence_score` (or per-pattern `match_count`) field so operators can empirically tune the suspicion-score threshold during the burn-in period before enabling enforcement. M-11 *Behavioral Monitoring* baselines the resulting filter-firing-rate stream over time.
+4. **Pattern set is operator-tunable with hard-floor distinction** — provide a baseline ruleset but require operators to extend it per their own threat model. Hard-floor patterns (always-block, no operator override) and tunable patterns (operator-configurable threshold, suspicions counted toward escalation) are first-class distinctions in the ruleset schema. Suppression policy follows the same governance pattern as M-11: explicit owner, expiry timestamp (default 30 days, max 90 days), audit event on create and modify, mandatory review cadence.
+5. **Regex-engine choice matters** — operator-provided patterns can cause catastrophic backtracking under adversarial input (Regular Expression Denial of Service / ReDoS). Where supported, use a linear-time engine (e.g., RE2) rather than backtracking engines (e.g., PCRE) for operator-defined patterns. If PCRE is required, validate operator-supplied patterns against a backtracking-complexity check before installing them and run a fuzzer against representative adversarial inputs in CI.
+
+### Architecture Components
+
+```text
+                 ┌──────────────────────────────────────┐
+                 │             MCP Host                 │
+                 │   (running inside M-1's data plane)  │
+                 │                                      │
+  Tool desc ───► │  M-5 emit point 1 (pre-context)      │ ──► to model context
+                 │       │                              │
+                 │       ├──→ M-12 audit event          │
+                 │       │   (with confidence_score)    │
+                 │       │                              │
+                 │       └──→ (low-confidence) M-3      │
+                 │                                      │
+  Retrieved mem► │  M-5 emit point 2 (pre-context)      │ ──► to model context
+                 │       │                              │
+                 │       ├──→ M-12 audit event          │
+                 │       │                              │
+                 │       └──→ (low-confidence) M-3      │
+                 │                                      │
+                 │  (output side: M-22, not M-5)        │
+                 └──────────────────────────────────────┘
+```
+
+The two emit points are deliberately limited to inbound surfaces. Tool-call output validation, LLM-output validation, and parameter-schema validation are M-22's territory and intentionally not in scope for M-5. Parameter sanitization (allowlisting, unused-parameter stripping, value sanitization), pre-storage memory-write hygiene, and prompt-path sanitization are also out of scope; see Limitations and Out of scope for the rationale and the partial-fit follow-up cluster.
+
+### Prerequisites
+
+- A pattern ruleset maintained as code (versioned, review-gated changes).
+- A regex engine choice — RE2 (linear-time) preferred for operator-defined patterns; if PCRE, add a pre-install backtracking-complexity check and CI fuzzer step.
+- A suppression-policy store with owner / expiry / audit guarantees.
+- [SAFE-M-12](../SAFE-M-12/README.md) *Audit Logging* deployed (M-5's emit destinations).
+- (Recommended) [SAFE-M-3](../SAFE-M-3/README.md) deployed for second-stage escalation; [SAFE-M-1](../SAFE-M-1/README.md) deployed as the ambient architectural control; [SAFE-M-4](../SAFE-M-4/README.md) deployed as a Unicode-specific pre-pass.
+
+### Implementation Steps
+
+1. **Design Phase**:
+   - Define the pattern set per emit point — separate hard-floor (always-block) from tunable (operator-configurable threshold).
+   - Define the suppression-policy schema (owner, expiry, audit-event triggers, review cadence).
+   - Define the suspicion-score threshold for M-3 escalation; plan a burn-in period during which the threshold is calibrated against M-12 audit data before enforcement.
+   - Choose the regex engine and document the ReDoS-safety strategy.
+
+2. **Development Phase**:
+   - Implement the inbound emit-point hooks in the MCP host (tool-description load, retrieved-memory ingest).
+   - Implement the pattern matcher with `confidence_score` / `match_count` emission to M-12.
+   - Implement the suppression-policy API with audit-event emission.
+   - Implement the M-3 escalation hook gated by the suspicion-score threshold.
+
+3. **Deployment Phase**:
+   - Roll out in **shadow-rule mode** first — rules fire to M-12 audit but do not block; collect per-rule firing rates and confidence-score distributions for the burn-in period (~2-4 weeks).
+   - Calibrate the suspicion-score threshold empirically from the burn-in audit data.
+   - Enable enforcement for hard-floor rules first; tunable rules after calibration.
+   - Monitor M-3 escalation queue depth continuously after enforcement; an unexpected surge usually indicates that the suspicion-score threshold is too low or that a rule is mis-tuned.
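+
+The calibration step above can be sketched as a small helper. This is a minimal illustration, not part of the mitigation itself: the function name `escalation_rates` and the assumption that burn-in `confidence_score` values have already been exported from M-12 as a plain list are both hypothetical. It reports, for each candidate threshold, the fraction of burn-in events that would fall in the escalation band (`threshold <= score < cap`) used by the sanitizer in Example 1.
+
+```python
+def escalation_rates(scores: list[float], candidates: list[float],
+                     cap: float = 1.0) -> dict[float, float]:
+    """Fraction of burn-in events that would escalate to M-3 per candidate threshold."""
+    if not scores:
+        raise ValueError("no burn-in scores to calibrate against")
+    rates = {}
+    for threshold in candidates:
+        # Same band as Example 1: below threshold passes, at cap blocks outright.
+        escalated = sum(1 for s in scores if threshold <= s < cap)
+        rates[threshold] = escalated / len(scores)
+    return rates
+```
+
+An operator could compare the resulting rates against the M-3 capacity they are willing to spend and pick the lowest threshold that stays within it.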
+
+## Benefits
+- **Deterministic rule-based filtering** of common injection patterns at inbound emit points — bounded behavior under linear-time engines (RE2); debuggable per-rule.
+- **Defense-in-depth complement** to M-1 (architectural separation), M-3 (model-based classification), M-4 (Unicode-specific), and M-22 (output-side validation). Each layer has different cost, coverage, and confidence tradeoffs.
+- **Operational signal** — M-5's audit stream feeds M-11 *Behavioral Monitoring* baselining and M-70 anomaly detection. Operators see filter-firing-rate trends, suppression usage, and escalation volume.
+- **Operator-tunable** — hard-floor vs tunable distinction lets operators tighten coverage per environment without losing the always-on baseline.
+
+## Limitations
+- **Insufficient alone** — pattern-based filtering is necessarily incomplete; novel attack patterns evade rule sets until added. M-5 must be paired with M-1 / M-3 / M-22 for layered defense. The prior version of M-5 already acknowledged this limitation, and this expansion preserves that framing.
+- **False-positive cost** — overly aggressive rules block legitimate content (e.g., a tool description that legitimately discusses prompt-injection mitigations). Suppression governance addresses this but doesn't eliminate it; operators must own the tradeoff.
+- **Adversarial-evasion arms race** — attackers learn rule sets and craft evasion variants; baselines drift and need maintenance. Periodic ruleset review is mandatory operational hygiene, not optional.
+- **ReDoS risk for operator-extended patterns** — backtracking engines (PCRE-style) can be exploited via crafted input causing exponential matching time. Mitigate via RE2 (linear-time guarantee) or pre-install pattern-complexity validation plus CI fuzzing.
+- **Unicode-specific tricks are M-4's territory** — zero-width characters, RTL override, homoglyph substitution. M-5 should delegate, not duplicate. Maintaining Unicode rules in two places creates operational drift.
+- **Output sanitization is M-22's territory** — M-5 is inbound-only. Tool-call output validation, LLM-output validation, and parameter-schema validation are M-22's surface.
+- **Prompt-path filtering is out of scope** — user-prompt content is not an M-5 emit point. M-5 only fires at the two named inbound emit points (tool-description load and retrieved-memory ingest); prompt-text inspection requires a separate prompt-path sanitization control, which is tracked as a partial-fit follow-up (see Out of scope).
+- **Parameter sanitization is not in scope** — argument allowlisting, unused-parameter stripping, and parameter-value validation belong to a parameter-validation control (likely a new mitigation; see Out of scope).
+- **Memory-write/storage sanitization is not in scope** — M-5 covers retrieval-time, not write-time. Pre-storage hygiene belongs to a storage-side control (likely a new mitigation; see Out of scope).
+- **Pattern-policy version skew across MCP host instances** will silently degrade coverage; alarm on it via M-12.
+
+## Out of scope
+
+Seven of the 14 techniques that cite SAFE-M-5 are excluded from the curated Mitigates list above. Three are mislabels (cite M-5 with non-matching control concepts); four are partial-fit (cite M-5 with matching concept but expect functionality outside M-5's content-sanitization scope). Each is tracked for follow-up rather than papered over.
+
+### Mislabel cluster (3 citers — cite M-5 but want different controls)
+
+These citations name M-5 but the actual ask is for a different canonical mitigation. Tracked for redirect-in-followup; redirect targets to be chosen per-case after reading each technique's mitigation-section context.
+
+- `techniques/SAFE-T1202/README.md` cites M-5 as **"Secure Token Storage"** — wants token-storage controls (likely M-31 *Proof of Possession Tokens* or M-37 *Token Rotation and Invalidation*).
+- `techniques/SAFE-T1704/README.md` cites M-5 as **"Context Boundary Isolation"** — wants isolation/boundary controls (likely M-1 *Architectural Defense - Control/Data Flow Separation* or M-29 *Explicit Privilege Boundaries*).
+- `techniques/SAFE-T2103/README.md` cites M-5 as **"Least-Privilege Agents"** — wants privilege-boundary controls (likely M-29 *Explicit Privilege Boundaries*).
+
+### Partial-fit cluster (4 citers — cite M-5 with matching concept, but the primary ask falls outside M-5's scope)
+
+These citations correctly invoke M-5 as one of several mitigations, but each technique's primary defensive ask requires a control M-5 does not provide. The corpus likely needs new mitigations for these surfaces; a small subset may plausibly redirect to existing canonical mitigations as noted.
+
+- **`techniques/SAFE-T1302/README.md`** ("High-Privilege Tool Misuse") expects **argument allowlisting + shell-metacharacter rejection on tool parameters**. M-5 does not validate parameters. Redirect candidate: **likely a new "Parameter Validation" mitigation** — M-22 *Semantic Output Validation* covers schema validation in a related sense but does not address shell-metacharacter rejection specifically; the corpus does not currently have a parameter-validation mitigation.
+- **`techniques/SAFE-T1702/README.md`** ("Memory Retrieval Abuse") expects **pre-storage memory-write sanitization**. M-5 covers retrieval-time only. Redirect candidate: **likely a new "Memory-Write Hygiene" mitigation** — M-22 does not cover storage-side; no existing canonical maps cleanly.
+- **`techniques/SAFE-T1801/README.md`** ("Tool/Resource Exfiltration via Indirect Prompt Injection") expects **prompt-path sanitization against script-like instructions in user prompts**. M-5 has no prompt-path emit point. Redirect candidate: **plausibly M-22 if M-22 is extended to cover inbound prompt validation**, otherwise a new "Prompt-Path Sanitization" mitigation. Of the four partial-fits, this is the only one that *might* redirect cleanly to an existing canonical mitigation rather than requiring new authoring.
+- **`techniques/SAFE-T1911/README.md`** ("Parameter Exfiltration") expects **unused-parameter stripping + parameter-value sanitization**. Same gap as T1302 — M-5 does not validate parameters. Redirect candidate: **same new "Parameter Validation" mitigation** as T1302.
+
+The mislabel cluster and partial-fit cluster are tracked separately in `pr-ledger.yaml` as `safe-m-5-mislabel-cluster` and `safe-m-5-partial-fit-cluster` follow-up entries. The partial-fit cluster also signals a corpus-side gap: the canonical mitigation set may need new entries for parameter validation, memory-write hygiene, and prompt-path sanitization. That gap analysis is out of scope for this PR.
+
+## Implementation Examples
+
+### Example 1: Multi-stage Python sanitizer with M-12 audit emission
+
+This implementation is paired with the ruleset schema in Example 2 — it honors `applies_to`, the per-rule `weight` field, and the `weighted_sum` aggregator described there.
+
+```python
+import hashlib
+from dataclasses import dataclass
+from enum import Enum
+from typing import Callable
+
+class Decision(Enum):
+    PASS = "pass"     # content unchanged
+    STRIP = "strip"   # matched substrings removed; rest of content forwarded
+    BLOCK = "block"   # entire content rejected
+
+@dataclass
+class SanitizationResult:
+    decision: Decision
+    sanitized: str
+    confidence_score: float        # 0.0 (clean) to ruleset.scoring.cap (high-confidence)
+    match_count: int               # total matched substrings across all hit rules
+    matched_pattern_ids: list[str]
+
+class M5InboundSanitizer:
+    def __init__(self, ruleset, audit_emit: Callable, m3_escalate: Callable):
+        # `ruleset` is the operator-tunable schema from Example 2:
+        #   .hard_floor_patterns / .tunable_patterns (each rule has .id, .regex,
+        #   .weight, .applies_to: list[str])
+        #   .scoring (.aggregator, .cap, .suspicion_threshold)
+        #   .active_suppressions (list with .applies(rule_id, emit_point, source_id))
+        self._ruleset = ruleset
+        self._audit_emit = audit_emit
+        self._m3_escalate = m3_escalate
+
+    def _matching_rules(self, content: str, emit_point: str, patterns):
+        """Return [(rule, match_count)] for rules where applies_to includes emit_point and at least one match found."""
+        hits = []
+        for rule in patterns:
+            if emit_point not in rule.applies_to:
+                continue   # honor applies_to from Example 2 schema
+            matches = rule.regex.findall(content)
+            if matches:
+                hits.append((rule, len(matches)))
+        return hits
+
+    def _drop_suppressed(self, hits, emit_point: str, source_id: str):
+        """Filter out hits suppressed by an active operator suppression."""
+        return [(r, c) for (r, c) in hits
+                if not any(s.applies(r.id, emit_point, source_id)
+                           for s in self._ruleset.active_suppressions)]
+
+    def _score(self, hits) -> float:
+        """Aggregate per the YAML schema's `aggregator` field."""
+        agg = self._ruleset.scoring.aggregator
+        if agg == "weighted_sum":
+            raw = sum(rule.weight * count for rule, count in hits)
+        else:
+            # Other aggregators may be defined per ruleset; fail loud on unknown.
+            raise ValueError(f"unknown ruleset aggregator: {agg}")
+        return min(self._ruleset.scoring.cap, raw)
+
+    def _strip(self, content: str, hits) -> str:
+        """Remove matched substrings, preserving surrounding content."""
+        sanitized = content
+        for rule, _ in hits:
+            sanitized = rule.regex.sub('', sanitized)
+        return sanitized
+
+    def sanitize_tool_description(self, description: str, mcp_server: str, session_id: str):
+        return self._sanitize(description, "tool_description_load", mcp_server, session_id)
+
+    def sanitize_memory_chunk(self, chunk: str, source_id: str, session_id: str):
+        return self._sanitize(chunk, "memory_chunk", source_id, session_id)
+
+    def _sanitize(self, content: str, emit_point: str, source_id: str, session_id: str) -> SanitizationResult:
+        # 1. Hard-floor check — block immediately if any hard-floor rule applies-to AND matches.
+        hard_hits = self._matching_rules(content, emit_point, self._ruleset.hard_floor_patterns)
+        if hard_hits:
+            total = sum(c for _, c in hard_hits)
+            ids = [r.id for r, _ in hard_hits]
+            result = SanitizationResult(
+                decision=Decision.BLOCK, sanitized="", confidence_score=self._ruleset.scoring.cap,
+                match_count=total, matched_pattern_ids=ids)
+            self._audit_emit(emit_point=emit_point, source_id=source_id, session_id=session_id,
+                             content_sha256=_hash(content), result=result)
+            return result
+
+        # 2. Tunable scoring — apply weights from Example 2's schema, then drop suppressed hits.
+        tunable_hits_raw = self._matching_rules(content, emit_point, self._ruleset.tunable_patterns)
+        tunable_hits = self._drop_suppressed(tunable_hits_raw, emit_point, source_id)
+        score = self._score(tunable_hits)
+        match_count = sum(c for _, c in tunable_hits)
+        matched_ids = [r.id for r, _ in tunable_hits]
+        threshold = self._ruleset.scoring.suspicion_threshold
+
+        # 3. Decision branches:
+        #    - score < threshold              → PASS (unchanged content)
+        #    - threshold <= score < cap       → escalate to M-3 for second-stage classification
+        #    - score == cap                   → BLOCK (M-5 alone is confident enough)
+        if score < threshold:
+            decision, sanitized = Decision.PASS, content
+        elif score >= self._ruleset.scoring.cap:
+            decision, sanitized = Decision.BLOCK, ""
+        else:
+            m3 = self._m3_escalate(content=content, context=emit_point,
+                                   m5_score=score, m5_matched_ids=matched_ids)
+            if m3.malicious:
+                decision, sanitized = Decision.BLOCK, ""
+            elif m3.suspicious_but_legitimate:
+                # Content has injection-pattern features but M-3 says it's not adversarial
+                # (e.g., a security-research tool description that legitimately discusses
+                # prompt-injection markers). STRIP the matched substrings rather than
+                # blocking the whole content.
+                decision, sanitized = Decision.STRIP, self._strip(content, tunable_hits)
+            else:
+                decision, sanitized = Decision.PASS, content
+
+        result = SanitizationResult(
+            decision=decision, sanitized=sanitized, confidence_score=score,
+            match_count=match_count, matched_pattern_ids=matched_ids)
+        self._audit_emit(emit_point=emit_point, source_id=source_id, session_id=session_id,
+                         content_sha256=_hash(content), result=result)
+        return result
+
+def _hash(s: str) -> str:
+    return hashlib.sha256(s.encode()).hexdigest()
+```
+
+The `confidence_score` and `match_count` fields in every audit event let operators baseline the suspicion-score threshold empirically during the burn-in period before enabling enforcement. The `STRIP` decision branch is reserved for the case where M-3 classifies an M-5 hit as "suspicious but legitimate" — content with injection-pattern features that the model classifier judges non-adversarial; M-5 then removes the matched substrings rather than blocking the whole content.
+
+### Example 2: Pattern-ruleset YAML schema with operator-tunable + suppression governance
+
+```yaml
+ruleset_version: "2026.05.02"
+ruleset_owner: "platform-security@example"
+
+hard_floor_patterns:
+  - id: hard_001
+    description: "explicit instruction-override phrasing in tool descriptions"
+    regex: '(?i)(ignore|disregard) (the|all) (above|previous) (instructions?|prompts?)'
+    applies_to: [tool_description_load, memory_chunk]
+    rationale: "hardcoded prompt-injection marker; no legitimate use case in MCP content"
+
+tunable_patterns:
+  - id: tune_001
+    description: "role-prompt impersonation"
+    regex: '(?i)you are (now |an? )?(system|admin|root|superuser)'
+    weight: 0.4   # contributes to suspicion_score
+    applies_to: [tool_description_load, memory_chunk]
+  - id: tune_002
+    description: "tool-call request from data plane"
+    regex: '(?i)(call|invoke|execute) (the )?tool'
+    weight: 0.3
+    applies_to: [memory_chunk]    # less suspicious in tool descriptions
+
+scoring:
+  aggregator: "weighted_sum"      # operators may extend with custom aggregators
+  cap: 1.0
+  suspicion_threshold: 0.6        # tunable — calibrate during burn-in
+  m3_escalation: enabled          # escalate hits with score >= threshold to M-3
+
+suppressions:
+  - id: supp_2026_05_001
+    description: "github_mcp tool legitimately mentions 'system' in its description"
+    pattern_ids: [tune_001]
+    target: tool_description_load
+    target_filter:
+      mcp_server: "github-mcp-prod"
+    owner: "alice@example"
+    created_at: "2026-05-02T14:30:00Z"
+    expires_at: "2026-06-01T00:00:00Z"   # max 90 days
+    audit_required_for: [create, modify, delete, expire]
+```
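+
+Example 1 expects `ruleset.active_suppressions` to hold objects exposing `.applies(rule_id, emit_point, source_id)`, while the schema above stores raw `suppressions` entries. The sketch below shows one way to bridge the two. It is an assumption-laden illustration: the `Suppression` class, the `active_suppressions` function, and the choice to treat `mcp_server` as the only `target_filter` key are hypothetical, and the entries are assumed to be already parsed from YAML into dictionaries.
+
+```python
+from dataclasses import dataclass
+from datetime import datetime, timedelta, timezone
+
+MAX_LIFETIME = timedelta(days=90)  # the "max 90 days" bound from the schema comment
+
+def _ts(value: str) -> datetime:
+    # Parse a UTC timestamp such as "2026-05-02T14:30:00Z".
+    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
+
+@dataclass(frozen=True)
+class Suppression:
+    id: str
+    pattern_ids: frozenset
+    target: str        # emit point the suppression is scoped to
+    mcp_server: str    # assumed to be the only target_filter key
+    expires_at: datetime
+
+    def applies(self, rule_id: str, emit_point: str, source_id: str) -> bool:
+        return (rule_id in self.pattern_ids
+                and emit_point == self.target
+                and source_id == self.mcp_server)
+
+def active_suppressions(entries: list[dict], now: datetime) -> list[Suppression]:
+    """Keep only unexpired entries; reject lifetimes over the 90-day maximum."""
+    active = []
+    for e in entries:
+        created, expires = _ts(e["created_at"]), _ts(e["expires_at"])
+        if expires - created > MAX_LIFETIME:
+            raise ValueError(f"suppression {e['id']} exceeds max lifetime")
+        if now < expires:
+            active.append(Suppression(
+                id=e["id"], pattern_ids=frozenset(e["pattern_ids"]),
+                target=e["target"], mcp_server=e["target_filter"]["mcp_server"],
+                expires_at=expires))
+    return active
+```
+
+Recomputing the active list on every ruleset load means an expired suppression stops applying without any manual cleanup; the audit events listed under `audit_required_for` would be emitted separately.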
+
+### Example 3: M-3 escalation hook with configurable suspicion-score threshold
+
+```python
+def m3_escalation_hook(content: str, context: str, m5_score: float,
+                       m5_matched_ids: list[str]) -> "M3Result":
+    """When M-5 produces a low-confidence hit (multiple weak matches but no
+    hard-floor), defer the final block/pass decision to M-3's ML classifier.
+
+    Accepts the keyword arguments that Example 1 passes to `m3_escalate`;
+    `m5_score` and `m5_matched_ids` are unused in this sketch. The sanitizer
+    gates this call on `scoring.suspicion_threshold` (Example 2), which is
+    operator-tunable. Setting it too low (e.g., 0.2) will
+    flood M-3's queue with noise and degrade its latency for genuinely
+    suspicious cases; setting it too high (e.g., 0.9) loses defense-in-depth
+    benefit. Calibrate empirically against M-12 audit data during burn-in.
+
+    Production guidance: monitor M-3 escalation queue depth and the
+    M-5-fired/M-3-confirmed rate. If queue depth grows unboundedly, or M-3
+    clears most M-5 escalations as benign, the threshold is likely too low
+    and is wasting M-3 cycles; raise it."""
+    # `m3_classifier` is the deployment's M-3 client; it is not defined here.
+    return m3_classifier.classify(content=content, context=context)
+```
+
+## Testing and Validation
+
+1. **Security Testing**:
+   - Replay a known prompt-injection corpus (e.g., the AgentDojo evaluation set) at the sanitizer's input; verify hard-floor patterns fire deterministically.
+   - Replay corpus variants with adversarial encoding (Unicode tricks, leet substitution, character spacing) to measure false-negative rate; cross-check with M-4's Unicode pre-pass coverage.
+   - Replay legitimate content with prompt-injection-related text (e.g., a tool description for a security-research tool) to measure false-positive rate.
+   - Replay ReDoS-suspect inputs (catastrophic-backtracking strings run against operator-defined patterns) to confirm the engine handles them in bounded time.
+
+2. **Functional Testing**:
+   - Sanitization latency per emit point under realistic load.
+   - M-12 audit event correctness: every decision produces an event with `confidence_score`, `match_count`, `matched_pattern_ids`, and the standard correlation fields.
+   - Suppression-expiry workflow: confirm that expired suppressions auto-deactivate and that reactivation requires owner approval with a fresh audit event.
+
+3. **Integration Testing**:
+   - M-5 escalation → M-3 classification handoff under varying suspicion-score thresholds.
+   - M-5 audit → M-11 baselining alarm on filter-firing-rate spike (synthetic spike injected; alarm latency measured).
+   - Suppression-policy version skew alarm — deploy two M-5 instances with different policy versions, verify the skew alarm fires.
+
+## Deployment Considerations
+
+### Resource Requirements
+- Pattern-matching adds CPU per emit point; precise overhead depends on ruleset size and regex engine choice. Measure under expected load.
+- Ruleset memory scales with rule count; suppression-policy store scales with active suppression count plus audit-event retention.
+
+### Performance Impact
+- Latency overhead per emit point depends on ruleset depth and regex engine choice. RE2-style linear-time engines bound the worst case; PCRE-style backtracking engines may degrade unpredictably under adversarial input. No specific timing figure is claimed here; measure against your own workload.
+- M-3 escalation adds latency for low-confidence hits — calibrate the suspicion-score threshold so escalation is reserved for the genuinely-suspicious tail.
+
+### Monitoring and Alerting
+- Alarm on (a) a sudden drop in rule-firing rate (possible upstream issue or sanitizer bypass), (b) a sudden spike (possible attack or rule mis-tune), (c) suppression-policy version skew across hosts, (d) M-3 escalation-queue depth growth (suspicion-score threshold likely too low), (e) a shift in the ratio of M-3-confirmed-malicious to M-5-escalated content (calibrate the suspicion-score threshold against this ratio).
+- *Operational note*: the pattern ruleset is a high-value config. Version it as code; gate updates with M-69 *Out-of-Band Authorization* if the ruleset controls high-risk paths.
+
+## Current Status (2026)
+
+General input-validation guidance is well-established ([OWASP Input Validation Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Input_Validation_Cheat_Sheet.html), [NIST SP 800-53 SI-10 (Information Input Validation)](https://csrc.nist.gov/publications/detail/sp/800-53/rev-5/final)). LLM-specific prompt-injection guidance recognizes pattern filtering as a layer in defense-in-depth, not a standalone defense ([OWASP Top 10 for Large Language Model Applications, LLM01: Prompt Injection](https://owasp.org/www-project-top-10-for-large-language-model-applications/)).
 
 ## References
 - [OWASP Input Validation Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Input_Validation_Cheat_Sheet.html)
+- [NIST SP 800-53 Rev 5 — SI-10 Information Input Validation](https://csrc.nist.gov/publications/detail/sp/800-53/rev-5/final)
+- [OWASP Top 10 for Large Language Model Applications — LLM01: Prompt Injection](https://owasp.org/www-project-top-10-for-large-language-model-applications/)
+- [Model Context Protocol Specification](https://modelcontextprotocol.io/specification)
+- [Russ Cox — Regular Expression Matching Can Be Simple And Fast (2007)](https://swtch.com/~rsc/regexp/regexp1.html) — foundational article on linear-time regex matching that motivates the RE2 engine choice for operator-defined patterns.
 
 ## Related Mitigations
-- [SAFE-M-3](../SAFE-M-3/README.md): AI-Powered Content Analysis
-- [SAFE-M-4](../SAFE-M-4/README.md): Unicode Sanitization and Filtering
+- [SAFE-M-1](../SAFE-M-1/README.md): Architectural Defense - Control/Data Flow Separation — the ambient architectural control that M-5 runs inside. M-5 cannot replace M-1's separation guarantees; it reduces injection surface within them. M-5 does NOT escalate to M-1 — M-1 always applies as architecture, not as runtime decision.
+- [SAFE-M-3](../SAFE-M-3/README.md): AI-Powered Content Analysis — the ML-based second-stage classifier. M-5 produces a deterministic rule-based signal; suspicious M-5 hits (configurable suspicion-score threshold) escalate to M-3 for higher-cost model classification.
+- [SAFE-M-4](../SAFE-M-4/README.md): Unicode Sanitization and Filtering — the narrow Unicode-specific specialization. M-4 runs as a deterministic Unicode-specific pre-pass; M-5 then handles the residual general-pattern surface. Delegate, don't duplicate.
+- [SAFE-M-22](../SAFE-M-22/README.md): Semantic Output Validation — the output-side complement. M-5 = inbound lexical/pattern; M-22 = output-side semantic/schema. The two are cleanly separated and do not overlap.
+- [SAFE-M-12](../SAFE-M-12/README.md): Audit Logging — the audit substrate where M-5 sanitization decisions are recorded. M-11 *Behavioral Monitoring* baselines the resulting filter-firing-rate stream.
 
 ## Version History
 | Version | Date | Changes | Author |
 |---------|------|---------|--------|
 | 0.1 | 2025-01-03 | Initial stub | Frederick Kautz |
-| 0.2 | 2025-01-09 | Generalized from tool descriptions to all MCP content | Frederick Kautz |
\ No newline at end of file
+| 0.2 | 2025-01-09 | Generalized from tool descriptions to all MCP content | Frederick Kautz |
+| 1.0 | 2026-05-02 | Expanded stub to template parity per corpus mitigation quality audit; authored Technical Implementation (5 Core Principles including ReDoS engine-choice guidance), Architecture diagram with inbound-only emit points, Prerequisites, Implementation Steps with shadow-rule burn-in and suspicion-score calibration, Benefits, Limitations (with explicit "insufficient alone" honesty plus scope-boundary statements for output / parameters / memory-write / prompts), Implementation Examples (multi-stage Python sanitizer with M-12 confidence_score emission, pattern-ruleset YAML schema with operator-tunable + suppression governance, M-3 escalation hook with configurable threshold), Testing and Validation including ReDoS-suspect input replay, Deployment Considerations, Current Status (source-backed only); curated Mitigates list to 7 directly-mapped citers with technique-specific rationale (4 partial-fit citers and 3 mislabel citers excluded with reason notes — tracked as safe-m-5-partial-fit-cluster and safe-m-5-mislabel-cluster follow-ups); corrected NIST SP 800-53 reference to SI-10 (Information Input Validation); expanded Related Mitigations to include M-1 (ambient architectural control — not escalation target), M-12 (audit substrate), M-22 (output-side complement — non-overlapping); kept M-3 and M-4 with sharper boundary descriptions | bishnu bista |

From 1cb31c4941b33bf3efcb8e8e772aa91a82b708f5 Mon Sep 17 00:00:00 2001
From: bishnubista <bista.developer@gmail.com>
Date: Mon, 4 May 2026 13:17:40 -0700
Subject: [PATCH 2/2] fix(SAFE-M-5): correct three Mitigates labels; remove
 dangling skill-harness file reference

Cross-PR review of #201-205 surfaced two real defects in PR #204:

1. Three Mitigates entries had labels/rationales that did not match the
   citing technique's actual ask. Grep against each technique's README
   confirmed the mismatch:

   - T1604 cites M-5 to "filter error responses to prevent stack trace
     and version disclosure" - that is outbound error-response filtering,
     but M-5 is inbound-only (Limitations: "Output sanitization is M-22's
     territory"). Concept matches; surface is wrong. Moved to partial-fit
     cluster.
   - T1705 cites M-5 to "filter agent communication content for injection
     patterns" - inter-agent message filtering. M-5's documented inbound
     emit points are tool-description load and retrieved-memory ingest;
     agent-communication is neither. Moved to partial-fit cluster.
   - T1910 cites M-5 for "JSON schema enforcement, character-set regex,
     length limits" - parameter / argument schema validation, which M-5
     explicitly excludes (Limitations: "Parameter sanitization is not in
     scope"). Different control concept. Moved to mislabel cluster as the
     same parameter-validation gap surfaced by T1302/T1911.

   Curated Mitigates list shrinks 7 -> 4 (T1001, T1102, T1401, T2105).
   Out of scope grows 7 -> 10 (4 mislabels + 6 partial-fits). Counts
   updated consistently across the Mitigates intro, Mitigates body trail,
   Out of scope intro, and both cluster headings.

2. Replaced an in-text reference to an external skill-harness tracking
   file (which does not exist in the safe-mcp repository and was a
   dangling pointer for upstream readers) with prose that names the
   follow-up clusters without claiming an in-repo file.

Surfaces two new corpus-side candidate mitigations as side-effects of
the relocations: an output-side error-sanitization mitigation for T1604,
and an agent-communication-filtering mitigation for T1705.

Signed-off-by: bishnubista <bista.developer@gmail.com>
---
 mitigations/SAFE-M-5/README.md | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/mitigations/SAFE-M-5/README.md b/mitigations/SAFE-M-5/README.md
index 0e16954f..0f5d8927 100644
--- a/mitigations/SAFE-M-5/README.md
+++ b/mitigations/SAFE-M-5/README.md
@@ -14,17 +14,14 @@ This mitigation is intentionally a layer in defense-in-depth, not a standalone d
 
 ## Mitigates
 
-The mitigation directly addresses the following techniques (curated against the actual citation graph; 3 mislabels and 4 partial-fit citers excluded — see Out of scope):
+The mitigation directly addresses the following techniques (curated against the actual citation graph; 4 mislabels and 6 partial-fit citers excluded — see Out of scope):
 
 - [SAFE-T1001](../../techniques/SAFE-T1001/README.md): Tool Poisoning Attack (TPA) — sanitization strips hidden instruction patterns and unusual control characters from tool descriptions before they reach the model context.
 - [SAFE-T1102](../../techniques/SAFE-T1102/README.md): Prompt Injection (Multiple Vectors) — pattern matching at retrieved-memory and tool-description ingest points removes the most common injection markers (role-prompt phrasing, instruction-override attempts) before they enter the context.
 - [SAFE-T1401](../../techniques/SAFE-T1401/README.md): Line Jumping — content arriving at inbound emit points is filtered before it can be ordered ahead of trusted prompt segments in the context window.
-- [SAFE-T1604](../../techniques/SAFE-T1604/README.md): Multi-Modal Cross-Channel Injection — text-channel content arriving via inbound emit points is sanitized; multi-modal channels with their own preprocessing add filter coverage at their respective emit points.
-- [SAFE-T1705](../../techniques/SAFE-T1705/README.md): Cross-Server Tool Description Conflict — tool-description sanitization at load time strips embedded instructions before any conflict resolution occurs in the host.
-- [SAFE-T1910](../../techniques/SAFE-T1910/README.md): Output Format Manipulation — inbound content is filtered before it can manipulate the model into emitting attacker-controlled output formats. (Output-side validation is M-22's responsibility.)
 - [SAFE-T2105](../../techniques/SAFE-T2105/README.md): Disinformation Output — sanitizing inbound content (retrieved memory, tool descriptions) reduces the corpus of attacker-controllable inputs that could shape model output toward disinformation. Pair with M-22 for output-side validation.
 
-Three citers reference M-5 with non-matching control concepts and four cite M-5 with matching concept but expect functionality outside M-5's content-sanitization scope. See Out of scope below for the per-citer detail.
+Four citers reference M-5 with non-matching control concepts and six cite M-5 with matching concept but expect functionality outside M-5's content-sanitization scope. See Out of scope below for the per-citer detail.
 
 ## Technical Implementation
 
@@ -110,26 +107,29 @@ The two emit points are deliberately limited to inbound surfaces. Tool-call outp
 
 ## Out of scope
 
-Seven of the 14 techniques that cite SAFE-M-5 are excluded from the curated Mitigates list above. Three are mislabels (cite M-5 with non-matching control concepts); four are partial-fit (cite M-5 with matching concept but expect functionality outside M-5's content-sanitization scope). Each is tracked for follow-up rather than papered over.
+Ten of the 14 techniques that cite SAFE-M-5 are excluded from the curated Mitigates list above. Four are mislabels (cite M-5 with non-matching control concepts); six are partial-fit (cite M-5 with matching concept but expect functionality outside M-5's content-sanitization scope). Each is tracked for follow-up rather than papered over.
 
-### Mislabel cluster (3 citers — cite M-5 but want different controls)
+### Mislabel cluster (4 citers — cite M-5 but want different controls)
 
 These citations name M-5 but the actual ask is for a different canonical mitigation. Tracked for redirect-in-followup; redirect targets to be chosen per-case after reading each technique's mitigation-section context.
 
 - `techniques/SAFE-T1202/README.md` cites M-5 as **"Secure Token Storage"** — wants token-storage controls (likely M-31 *Proof of Possession Tokens* or M-37 *Token Rotation and Invalidation*).
 - `techniques/SAFE-T1704/README.md` cites M-5 as **"Context Boundary Isolation"** — wants isolation/boundary controls (likely M-1 *Architectural Defense - Control/Data Flow Separation* or M-29 *Explicit Privilege Boundaries*).
+- `techniques/SAFE-T1910/README.md` cites M-5 as "Content Sanitization" but the body asks for **strict JSON schemas, regex over allowed character sets, and length limits on tool-call inputs** — i.e., parameter / argument schema validation. M-5 is content-pattern sanitization at inbound emit points, not per-call schema enforcement; the surface is wrong and the control concept (schema validation) is different. Same corpus-side gap as T1302/T1911 below — redirect candidate is the same proposed new "Parameter Validation" canonical mitigation.
 - `techniques/SAFE-T2103/README.md` cites M-5 as **"Least-Privilege Agents"** — wants privilege-boundary controls (likely M-29 *Explicit Privilege Boundaries*).
 
-### Partial-fit cluster (4 citers — cite M-5 with matching concept, but the primary ask falls outside M-5's scope)
+### Partial-fit cluster (6 citers — cite M-5 with matching concept, but the primary ask falls outside M-5's scope)
 
 These citations correctly invoke M-5 as one of several mitigations, but each technique's primary defensive ask requires a control M-5 does not provide. The corpus likely needs new mitigations for these surfaces; a small subset may plausibly redirect to existing canonical mitigations as noted.
 
 - **`techniques/SAFE-T1302/README.md`** ("High-Privilege Tool Misuse") expects **argument allowlisting + shell-metacharacter rejection on tool parameters**. M-5 does not validate parameters. Redirect candidate: **likely a new "Parameter Validation" mitigation** — M-22 *Semantic Output Validation* covers schema validation in a related sense but does not address shell-metacharacter rejection specifically; the corpus does not currently have a parameter-validation mitigation.
+- **`techniques/SAFE-T1604/README.md`** ("Multi-Modal Cross-Channel Injection" — cites M-5 in its preventive controls list) expects **filtering of *outbound* error responses to suppress stack-trace and version disclosure**. M-5 is inbound-only (tool-description load + retrieved-memory ingest); outbound error-response filtering is M-22's territory or, more precisely, an output-side error-redaction control adjacent to M-22's surface. Redirect candidate: **M-22 *Semantic Output Validation*** if extended to cover error-response redaction; otherwise a narrower output-side error-sanitization mitigation.
 - **`techniques/SAFE-T1702/README.md`** ("Memory Retrieval Abuse") expects **pre-storage memory-write sanitization**. M-5 covers retrieval-time only. Redirect candidate: **likely a new "Memory-Write Hygiene" mitigation** — M-22 does not cover storage-side; no existing canonical maps cleanly.
-- **`techniques/SAFE-T1801/README.md`** ("Tool/Resource Exfiltration via Indirect Prompt Injection") expects **prompt-path sanitization against script-like instructions in user prompts**. M-5 has no prompt-path emit point. Redirect candidate: **plausibly M-22 if M-22 is extended to cover inbound prompt validation**, otherwise a new "Prompt-Path Sanitization" mitigation. Of the four partial-fits, this is the only one that *might* redirect cleanly to an existing canonical mitigation rather than requiring new authoring.
+- **`techniques/SAFE-T1705/README.md`** ("Cross-Server Tool Description Conflict" — cites M-5 to "filter agent communication content for injection patterns and suspicious instructions") expects **inter-agent message filtering**. M-5's two documented inbound emit points are tool-description load and retrieved-memory ingest; agent-to-agent communication content is neither and would require either a third emit point or a separate inter-agent-message-filtering mitigation. Concept matches; surface is undocumented in M-5. Redirect candidate: **a new agent-communication-filtering mitigation OR an extension of M-5 to add an `agent_communication` emit point** — corpus-design conversation needed before deciding.
+- **`techniques/SAFE-T1801/README.md`** ("Tool/Resource Exfiltration via Indirect Prompt Injection") expects **prompt-path sanitization against script-like instructions in user prompts**. M-5 has no prompt-path emit point. Redirect candidate: **plausibly M-22 if M-22 is extended to cover inbound prompt validation**, otherwise a new "Prompt-Path Sanitization" mitigation. Of the original four partial-fits, this is the only one that *might* redirect cleanly to an existing canonical mitigation rather than requiring new authoring.
 - **`techniques/SAFE-T1911/README.md`** ("Parameter Exfiltration") expects **unused-parameter stripping + parameter-value sanitization**. Same gap as T1302 — M-5 does not validate parameters. Redirect candidate: **same new "Parameter Validation" mitigation** as T1302.
 
-The mislabel cluster and partial-fit cluster are tracked separately in `pr-ledger.yaml` as `safe-m-5-mislabel-cluster` and `safe-m-5-partial-fit-cluster` follow-up entries. The partial-fit cluster also signals a corpus-side gap: the canonical mitigation set may need new entries for parameter validation, memory-write hygiene, and prompt-path sanitization. That gap analysis is out of scope for this PR.
+Each cluster is tracked as a follow-up audit task. The mislabel cluster needs per-citer redirect-target decisions (verified against the canonical mitigation set per case). The partial-fit cluster also signals a corpus-side gap: the canonical mitigation set may need new entries for parameter validation, memory-write hygiene, prompt-path sanitization, and possibly agent-communication filtering and an output-side error-sanitization mitigation. Both follow-ups are out of scope for this PR.
 
 ## Implementation Examples
 
@@ -382,3 +382,4 @@ General input-validation guidance is well-established ([OWASP Input Validation C
 | 0.1 | 2025-01-03 | Initial stub | Frederick Kautz |
 | 0.2 | 2025-01-09 | Generalized from tool descriptions to all MCP content | Frederick Kautz |
 | 1.0 | 2026-05-02 | Expanded stub to template parity per corpus mitigation quality audit; authored Technical Implementation (5 Core Principles including ReDoS engine-choice guidance), Architecture diagram with inbound-only emit points, Prerequisites, Implementation Steps with shadow-rule burn-in and suspicion-score calibration, Benefits, Limitations (with explicit "insufficient alone" honesty plus scope-boundary statements for output / parameters / memory-write / prompts), Implementation Examples (multi-stage Python sanitizer with M-12 confidence_score emission, pattern-ruleset YAML schema with operator-tunable + suppression governance, M-3 escalation hook with configurable threshold), Testing and Validation including ReDoS-suspect input replay, Deployment Considerations, Current Status (source-backed only); curated Mitigates list to 7 directly-mapped citers with technique-specific rationale (4 partial-fit citers and 3 mislabel citers excluded with reason notes — tracked as safe-m-5-partial-fit-cluster and safe-m-5-mislabel-cluster follow-ups); corrected NIST SP 800-53 reference to SI-10 (Information Input Validation); expanded Related Mitigations to include M-1 (ambient architectural control — not escalation target), M-12 (audit substrate), M-22 (output-side complement — non-overlapping); kept M-3 and M-4 with sharper boundary descriptions | bishnu bista |
+| 1.1 | 2026-05-04 | Corrected three Mitigates entries that did not match the citing technique's actual ask: T1604 (cites M-5 to "filter error responses" — outbound error-response filtering, but M-5 is inbound-only) moved to partial-fit cluster; T1705 (cites M-5 to "filter agent communication content" — agent-communication is not one of M-5's two documented inbound emit points) moved to partial-fit cluster; T1910 (cites M-5 for "JSON schema enforcement, character-set regex, length limits" — that is parameter / argument schema validation, not content-pattern sanitization) moved to mislabel cluster as the same parameter-validation gap surfaced by T1302/T1911. Curated Mitigates list shrinks 7 → 4 (T1001, T1102, T1401, T2105 retained); Out of scope grows 7 → 10 (4 mislabels + 6 partial-fits). Replaced an in-text reference to an external skill-harness tracking file (which does not exist in the safe-mcp repository and was a dangling pointer for upstream readers) with prose that names the follow-up clusters without claiming an in-repo file. Surfaces new corpus-side candidate mitigations (an output-side error-sanitization mitigation for T1604; an agent-communication-filtering mitigation for T1705) in addition to the v1.0-known parameter-validation, memory-write-hygiene, and prompt-path-sanitization gaps. | bishnu bista |