Commit 0bcd5e1

hiskudin and claude committed
fix: address review feedback on defender skill
- Align metadata.version to "2.0" (matches other skills)
- Fix PromptDefense config: tier2 → tier2Config (matches actual API)
- Fix ToolResultSanitizer example to match real config shape
- Add illustrative disclaimer to Important section
- Fix Tier 2 verification: check result.tier instead of score value
- Add missing import in Pattern 2 reference example
- Add try/catch to Express middleware example
- Use threshold param in batch evaluation example

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 6395586 commit 0bcd5e1

2 files changed

Lines changed: 35 additions & 29 deletions

skills/stackone-defender/SKILL.md

Lines changed: 12 additions & 14 deletions
@@ -5,7 +5,7 @@ license: MIT
 compatibility: Requires Node.js 18+. Optional peer dependencies @huggingface/transformers and onnxruntime-node for Tier 2 ML classification.
 metadata:
   author: stackone
-  version: "1.0"
+  version: "2.0"
 ---
 
 # StackOne Defender
@@ -18,7 +18,7 @@ StackOne Defender is a local-first prompt injection and jailbreak detection libr
 https://www.npmjs.com/package/@stackone/defender
 ```
 
-Do not guess configuration options. Verify against the published package.
+Code examples below are illustrative — verify class names and config keys against the published package README before use. Do not guess configuration options.
 
 ## Instructions
 
@@ -55,7 +55,7 @@ Without these, Defender falls back to Tier 1 pattern matching only.
 import { PromptDefense } from "@stackone/defender";
 
 const defense = new PromptDefense({
-  tier2: { mode: "onnx" }, // default — uses ONNX ML model
+  tier2Config: { mode: "onnx" }, // default — uses ONNX ML model
 });
 
 const result = await defense.scan("What is the capital of France?");
@@ -90,11 +90,9 @@ The `scan()` method returns:
 ```typescript
 const defense = new PromptDefense({
   // Tier 1: pattern matching
-  tier1: {
-    enabled: true, // default: true
-  },
+  enableTier1: true, // default: true
   // Tier 2: ML classification
-  tier2: {
+  tier2Config: {
     mode: "onnx", // "onnx" (default) or "mlp"
     threshold: 0.5, // score above this = blocked (default: 0.5)
   },
@@ -113,19 +111,19 @@ When building agents, tool results from external APIs can contain injected conte
 ```typescript
 import { ToolResultSanitizer } from "@stackone/defender";
 
-const sanitizer = new ToolResultSanitizer({
-  tier2Config: { mode: "onnx" },
-});
+const sanitizer = new ToolResultSanitizer();
 
 const toolOutput = await externalApi.getData();
-const sanitized = await sanitizer.scan(JSON.stringify(toolOutput));
+const sanitized = await sanitizer.sanitize(toolOutput, "tool-name");
 
-if (!sanitized.allowed) {
+if (sanitized.riskLevel === "high" || sanitized.riskLevel === "critical") {
   console.warn("Tool result contains suspicious content:", sanitized);
   // Handle: skip, flag, or redact the result
 }
 ```
 
+> **Note**: `ToolResultSanitizer` has its own configuration shape — fetch the npm README for full options. The examples above are illustrative.
+
 ## Examples
 
 ### Example 1: User wants to quickly test if a string is safe
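
The risk-level condition introduced in the `ToolResultSanitizer` hunk above can be exercised without installing the package. The following is a minimal sketch under stated assumptions: `RiskLevel`, `SanitizedResult`, and `shouldQuarantine` are hypothetical names, and the `"low"` and `"medium"` levels are assumed. Only `"high"` and `"critical"` appear in the diff, so the real result shape should be checked against the published README.

```typescript
// Hypothetical stand-in types, not the package's actual exports.
// Only "high" and "critical" come from the diff; "low" and "medium" are assumed.
type RiskLevel = "low" | "medium" | "high" | "critical";

interface SanitizedResult {
  riskLevel: RiskLevel;
}

// Mirrors the condition from the diff: flag only high and critical results.
function shouldQuarantine(result: SanitizedResult): boolean {
  return result.riskLevel === "high" || result.riskLevel === "critical";
}

// Made-up results standing in for sanitizer output.
const samples: SanitizedResult[] = [
  { riskLevel: "low" },
  { riskLevel: "high" },
  { riskLevel: "critical" },
];

// Collect the levels that would be skipped, flagged, or redacted.
const quarantined = samples.filter(shouldQuarantine).map((r) => r.riskLevel);
console.log(quarantined); // [ 'high', 'critical' ]
```

Keeping the condition in one small predicate means a later change to the accepted levels touches a single place.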
@@ -160,7 +158,7 @@ Actions:
 ```typescript
 import { PromptDefense } from "@stackone/defender";
 
-const defense = new PromptDefense({ tier2: { mode: "onnx" } });
+const defense = new PromptDefense({ tier2Config: { mode: "onnx" } });
 
 const dataset = [
   { text: "What is 2+2?", expected: true },
@@ -196,7 +194,7 @@ Result: Root cause identified with actionable fix (threshold adjustment or text
 ### Tier 2 not working / falling back to Tier 1 only
 **Cause**: Missing optional peer dependencies.
 - Install: `npm install @huggingface/transformers onnxruntime-node`
-- Verify: check that `result.score` returns a non-zero value (Tier 1 only returns 0 or 1)
+- Verify: run a scan on a benign string and confirm `result.tier` is `null` (not `"tier1"`) — this confirms the ML model loaded
 
 ### High false positive rate
 **Cause**: Threshold too low for the use case.

skills/stackone-defender/references/integration-patterns.md

Lines changed: 23 additions & 15 deletions
@@ -9,7 +9,7 @@ Scan all tool outputs before they enter the LLM context window. This is the most
 ```typescript
 import { PromptDefense } from "@stackone/defender";
 
-const defense = new PromptDefense({ tier2: { mode: "onnx" } });
+const defense = new PromptDefense({ tier2Config: { mode: "onnx" } });
 
 async function safeToolCall(toolName: string, args: any): Promise<string> {
   const rawResult = await executeTool(toolName, args);
@@ -28,6 +28,10 @@ async function safeToolCall(toolName: string, args: any): Promise<string> {
 Scan user messages before processing. Catches direct prompt injection attempts.
 
 ```typescript
+import { PromptDefense } from "@stackone/defender";
+
+const defense = new PromptDefense({ tier2Config: { mode: "onnx" } });
+
 async function handleUserMessage(message: string) {
   const scan = await defense.scan(message);
 
@@ -47,21 +51,25 @@ Add Defender as HTTP middleware to protect API endpoints that accept free-text i
 ```typescript
 import { PromptDefense } from "@stackone/defender";
 
-const defense = new PromptDefense({ tier2: { mode: "onnx" } });
+const defense = new PromptDefense({ tier2Config: { mode: "onnx" } });
 
 // Express middleware
 async function defenderMiddleware(req, res, next) {
-  const text = req.body?.message || req.body?.input || req.body?.prompt;
-  if (!text) return next();
-
-  const scan = await defense.scan(text);
-  if (!scan.allowed) {
-    return res.status(400).json({
-      error: "Input rejected",
-      reason: `Detected by ${scan.tier} (score: ${scan.score.toFixed(2)})`,
-    });
+  try {
+    const text = req.body?.message || req.body?.input || req.body?.prompt;
+    if (!text) return next();
+
+    const scan = await defense.scan(text);
+    if (!scan.allowed) {
+      return res.status(400).json({
+        error: "Input rejected",
+        reason: `Detected by ${scan.tier} (score: ${scan.score.toFixed(2)})`,
+      });
+    }
+    next();
+  } catch (err) {
+    next(err);
   }
-  next();
 }
 
 app.post("/api/chat", defenderMiddleware, chatHandler);
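
The try/catch added in the middleware hunk above matters because Express 4 does not forward a rejected promise from an `async` handler to its error handler on its own. The sketch below reproduces that behaviour with no dependencies. `Req`, `Res`, `Next`, `Scanner`, and `makeDefenderMiddleware` are hypothetical stand-ins for the Express and Defender types, and the failing scanner simulates an assumed model-load error.

```typescript
// Minimal stand-ins for the Express request/response types (hypothetical).
interface Req { body?: { message?: string } }
interface Res { status(code: number): Res; json(payload: unknown): Res }
type Next = (err?: unknown) => void;
// Stand-in for defense.scan(); only the `allowed` field is modelled here.
type Scanner = (text: string) => Promise<{ allowed: boolean }>;

// Wraps any scanner in middleware that forwards failures to next(err).
function makeDefenderMiddleware(scan: Scanner) {
  return async (req: Req, res: Res, next: Next): Promise<void> => {
    try {
      const text = req.body?.message;
      if (!text) return next();
      const result = await scan(text);
      if (!result.allowed) {
        res.status(400).json({ error: "Input rejected" });
        return;
      }
      next();
    } catch (err) {
      // Without this branch the rejection would never reach the error handler.
      next(err);
    }
  };
}

// A scanner that always rejects, simulating a failure inside scan().
const failingScan: Scanner = async () => {
  throw new Error("model failed to load");
};

// A response object that records nothing and simply chains.
const fakeRes: Res = {
  status() { return fakeRes; },
  json() { return fakeRes; },
};

// Capture whatever the middleware passes to next().
const forwarded: unknown[] = [];
const done = makeDefenderMiddleware(failingScan)(
  { body: { message: "hello" } },
  fakeRes,
  (err) => { forwarded.push(err); },
);
done.then(() => console.log("forwarded errors:", forwarded.length));
```

Injecting the scanner as a parameter is a design choice for testability in this sketch; the example in the diff closes over a module-level `defense` instance instead.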
@@ -74,7 +82,7 @@ Evaluate Defender against a labeled dataset to measure detection quality.
 ```typescript
 import { PromptDefense } from "@stackone/defender";
 
-const defense = new PromptDefense({ tier2: { mode: "onnx" } });
+const defense = new PromptDefense({ tier2Config: { mode: "onnx" } });
 
 interface Sample {
   text: string;
@@ -86,7 +94,7 @@ async function evaluate(samples: Sample[], threshold = 0.5) {
 
   for (const { text, label } of samples) {
     const result = await defense.scan(text);
-    const predicted = !result.allowed;
+    const predicted = result.score >= threshold;
     const actual = label === "malicious";
 
     if (predicted && actual) tp++;
@@ -111,7 +119,7 @@ The ONNX model loads on first inference. Pre-warm at startup to avoid cold-start
 ```typescript
 import { PromptDefense } from "@stackone/defender";
 
-const defense = new PromptDefense({ tier2: { mode: "onnx" } });
+const defense = new PromptDefense({ tier2Config: { mode: "onnx" } });
 
 // Pre-warm at application startup
 await defense.scan("warmup");
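
The batch-evaluation hunk in this file replaces `!result.allowed` with `result.score >= threshold`, so the `threshold` parameter now changes the outcome. The sketch below shows that effect on precomputed scores, which avoids needing the model. `ScoredSample`, `Metrics`, and `evaluateScores` are hypothetical names, and the scores are made-up illustration data, not Defender output.

```typescript
interface ScoredSample {
  score: number; // assumed to lie in [0, 1]; values below are invented
  label: "benign" | "malicious";
}

interface Metrics {
  tp: number;
  fp: number;
  tn: number;
  fn: number;
  precision: number;
  recall: number;
}

// Builds a confusion matrix using the same decision rule as the diff.
function evaluateScores(samples: ScoredSample[], threshold = 0.5): Metrics {
  let tp = 0, fp = 0, tn = 0, fn = 0;
  for (const { score, label } of samples) {
    const predicted = score >= threshold;
    const actual = label === "malicious";
    if (predicted && actual) tp++;
    else if (predicted && !actual) fp++;
    else if (!predicted && actual) fn++;
    else tn++;
  }
  // Guard against division by zero when a class is never predicted or present.
  const precision = tp + fp === 0 ? 0 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 0 : tp / (tp + fn);
  return { tp, fp, tn, fn, precision, recall };
}

const scored: ScoredSample[] = [
  { score: 0.1, label: "benign" },
  { score: 0.6, label: "benign" },
  { score: 0.7, label: "malicious" },
  { score: 0.95, label: "malicious" },
];

const atDefault = evaluateScores(scored, 0.5); // tp 2, fp 1, tn 1, fn 0
const atStrict = evaluateScores(scored, 0.8); // tp 1, fp 0, tn 2, fn 1
console.log(atDefault, atStrict);
```

On this toy data, raising the threshold from 0.5 to 0.8 removes the one false positive at the cost of one missed malicious sample, which is the trade-off a threshold sweep is meant to expose.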
