Declade · May 17, 2026
diff --git a/.gitignore b/.gitignore
@@ -15,7 +15,10 @@ datasets/*/raw/
 datasets/*/with-injected-pii/
 
 # Per-paper raw run artifacts (cert chains, intermediate JSONL — only summaries are checked in)
-papers/*/raw-results/
+papers/*/raw-results/*
+# But keep the scaffold so the directory exists in a fresh clone.
+!papers/*/raw-results/.gitignore
+!papers/*/raw-results/.gitkeep
 
 # Internal PRD/planning anchors (kept locally for fresh-context resumes)
 specs/
diff --git a/README.md b/README.md
@@ -12,9 +12,9 @@ Empirical methodology code for the Lucairn Research Program — a per-industry s
 ## What this repo is NOT
 
 - Not a Lucairn product. The Lucairn platform itself lives elsewhere (gateway, sanitizer, witness, certificate verifier).
-- Not a customer-deployment artifact. These are vendor-published methodology papers; the publisher and the methodology are named in full. No customer attribution. No testimonials. No interviewed users.
+- Not a customer-deployment artifact. These are vendor-published methodology papers; the publisher and the methodology are named in full. No customer attribution. No persona-driven narrative. No attributed endorsement quotes.
 - Not a CLI or a publishable npm package. It is a methodology codebase, run from a clone.
-- Not a "case study". The artifact frame is a vendor benchmark / methodology paper; the word "case study" does not appear in any paper title, route slug, social card, or meta description.
+- Not a customer-implementation report. The artifact frame is a vendor benchmark / methodology paper; persona-driven or implementation-report framing does not appear in any paper title, route slug, social card, or meta description.
 - Not legal advice. Regulatory references are factual citations to primary sources (EUR-Lex Regulation 2024/1689; HHS HIPAA Safe Harbor enumeration; published clinical-NLP de-identification literature); they are not interpretations.
 
 ## Regulatory context
@@ -56,6 +56,34 @@ Prerequisites:
 - pnpm 10.x
 - Kaggle CLI installed (`pipx install kaggle`) with a working `~/.kaggle/kaggle.json` API token
 
+### Slice 2 — Harness (mock-only)
+
+Slice 2 adds an in-process harness that calls the Lucairn gateway row-by-row via `POST /api/v1/proxy/messages` in `mode: "proving_ground"`, collects each row's signed cert URL, and computes per-HIPAA-category recall against the Measurement-B ground truth.
+
+**The harness is currently mock-only.** The live `gateway.lucairn.eu` run lands in Slice 3 per the locked PRD halt gate, which avoids incurring Anthropic upstream cost on every iteration. To run the in-process smoke flow:
+
+```bash
+# Step 1 — call the mock gateway over 5 rows; write the raw NDJSON.
+pnpm run pipeline -- --rows=5 --mock --output=/tmp/slice2-smoke.ndjson
+
+# Step 2 — convert NDJSON to the CERTIFICATES.csv appendix shape.
+pnpm run collect-certs -- --input=/tmp/slice2-smoke.ndjson --output=/tmp/slice2-CERTIFICATES.csv
+
+# Step 3 — compute recall / precision / F1, validate against the SUMMARY schema.
+pnpm run compute-recall \
+  -- --truth=datasets/healthcare/with-injected-pii/ground-truth.jsonl \
+  --redactions-source=mock \
+  --rows=5 \
+  --output=/tmp/slice2-SUMMARY.json
+```
+
+Mock options exercise the math layer against a known oracle:
+
+- `--miss-rate=0.3` — mock drops 30% of injected entities so recall and F1 reflect the configuration.
+- `--spurious-fp-count=2` — mock emits 2 synthetic false-positive redactions per row.
+
+The harness reads `LUCAIRN_GATEWAY_URL` and `LUCAIRN_API_KEY` from the environment, but Slice 2 supports `--mock` only. The `--live` flag is reserved for Slice 3 and refuses to run without the explicit invocation that the live-run halt gate authorises.
+
 ## Methodology summary (Paper 1)
 
 The healthcare dataset (MTSamples) is **not institutionally de-identified**; it is raw clinical narrative from the public mtsamples.com archive (CC0 public domain). Paper 1 therefore reports two empirically distinct measurements:

diff --git a/datasets/healthcare/RECIPE.md b/datasets/healthcare/RECIPE.md
@@ -41,14 +41,14 @@ Because MTSamples has no published ground-truth PHI annotations, a single measur
 
 This recipe documents the *full* methodology for Paper 1. The implementation lands incrementally:
 
-- **Slice 1 (current commit) — ships:**
+- **Slice 1 — shipped:**
   - Dataset acquisition script (`scripts/download-mtsamples.ts`)
   - Deterministic synthetic PII re-injection for Measurement B's 500-row subset (`scripts/inject-pii.ts`, `src/inject-pii-core.ts`)
   - Round-trip verification (`scripts/verify-injection.ts`)
-- **Slice 2 — pending:** harness to call the Lucairn gateway row-by-row, collect cert URLs, compute recall against Measurement B's known ground truth (`scripts/run-pipeline.ts`, `scripts/collect-certs.ts`, `scripts/compute-recall.ts`)
-- **Slice 3 — pending:** full Paper 1 run including **Measurement A's raw-corpus detection pass** (Lucairn over the full ~5k MTSamples corpus, reporting detection counts without ground truth) plus the Measurement B recall numbers + the `papers/paper-1-healthcare/CERTIFICATES.csv` cert-URL appendix
+- **Slice 2 (current commit) — shipped (mock-only):** harness to call the Lucairn gateway row-by-row via `POST /api/v1/proxy/messages` in `mode: "proving_ground"`, collect cert URLs, compute recall against Measurement B's known ground truth (`scripts/run-pipeline.ts`, `scripts/collect-certs.ts`, `scripts/compute-recall.ts`, `src/gateway-client.ts`, `src/redaction-extractor.ts`, `src/recall.ts`, `src/hipaa-category-mapping.ts`, `src/mocks/gateway-fixtures.ts`). The live gateway run is deferred to Slice 3.
+- **Slice 3 — pending:** full Paper 1 run including **Measurement A's raw-corpus detection pass** (Lucairn over the full ~5k MTSamples corpus, reporting detection counts without ground truth) plus the Measurement B recall numbers against the live gateway + the `papers/paper-1-healthcare/CERTIFICATES.csv` cert-URL appendix
 
-Until Slice 2 + Slice 3 land, the harness + Measurement A code does not exist in this repo. The methodology description below is the published target, not the current shipped state.
+Until Slice 3 lands, the live-gateway end-to-end run and the Measurement A code do not exist in this repo. The methodology description below is the published target, not the current shipped state.
 
 ### Measurement A — raw-corpus detection (what does Lucairn flag in the wild?)
 

diff --git a/package.json b/package.json
@@ -20,11 +20,15 @@
     "test:watch": "vitest",
     "dataset:download": "node --import tsx scripts/download-mtsamples.ts",
     "dataset:inject-pii": "node --import tsx scripts/inject-pii.ts",
-    "dataset:verify-injection": "node --import tsx scripts/verify-injection.ts"
+    "dataset:verify-injection": "node --import tsx scripts/verify-injection.ts",
+    "pipeline": "node --import tsx scripts/run-pipeline.ts",
+    "collect-certs": "node --import tsx scripts/collect-certs.ts",
+    "compute-recall": "node --import tsx scripts/compute-recall.ts"
   },
   "devDependencies": {
     "@faker-js/faker": "^9.0.0",
     "@types/node": "^20.11.0",
+    "msw": "^2.14.6",
     "tsx": "^4.22.0",
     "typescript": "^5.4.0",
     "vitest": "^1.6.0"

diff --git a/papers/_template/SUMMARY.schema.json b/papers/_template/SUMMARY.schema.json
@@ -0,0 +1,126 @@
+{
+  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  "$id": "https://github.com/Declade/lucairn-research/papers/_template/SUMMARY.schema.json",
+  "title": "Lucairn Research Program — per-paper SUMMARY.json",
+  "description": "Aggregate recall / precision / F1 numbers per HIPAA Safe Harbor category + overall + per-row breakdown for any paper in the Lucairn Research Program. Mirrors the RecallSummary shape produced by src/recall.ts. Recall numbers are produced by the gateway's compareGroundTruth function at services/gateway/internal/api/ground_truth.go:69-138 in the dual-sandbox-architecture repo (case-insensitive bidirectional value-containment with whitespace normalization, server-side; not span-exact overlap). The publisher (Lucairn) ships this matcher in production; the research repo aggregates its verdicts.",
+  "type": "object",
+  "required": [
+    "schema_version",
+    "generator",
+    "overall",
+    "per_category",
+    "per_row",
+    "notes"
+  ],
+  "additionalProperties": false,
+  "properties": {
+    "schema_version": {
+      "type": "string",
+      "const": "1.0"
+    },
+    "generator": {
+      "type": "string",
+      "const": "lucairn-research/recall.ts"
+    },
+    "overall": {
+      "$ref": "#/$defs/OverallCounts"
+    },
+    "per_category": {
+      "type": "array",
+      "items": {
+        "type": "object",
+        "required": ["category", "counts"],
+        "additionalProperties": false,
+        "properties": {
+          "category": {
+            "$ref": "#/$defs/HipaaCategory"
+          },
+          "counts": {
+            "$ref": "#/$defs/CategoryCounts"
+          }
+        }
+      },
+      "minItems": 18,
+      "maxItems": 18
+    },
+    "per_row": {
+      "type": "array",
+      "items": {
+        "type": "object",
+        "required": ["row_index", "tp", "fp", "fn", "recall"],
+        "additionalProperties": false,
+        "properties": {
+          "row_index": { "type": "integer", "minimum": 0 },
+          "tp": { "type": "integer", "minimum": 0 },
+          "fp": { "type": "integer", "minimum": 0 },
+          "fn": { "type": "integer", "minimum": 0 },
+          "recall": { "type": "number", "minimum": 0, "maximum": 1 }
+        }
+      }
+    },
+    "notes": {
+      "type": "array",
+      "items": { "type": "string" }
+    }
+  },
+  "$defs": {
+    "HipaaCategory": {
+      "type": "string",
+      "enum": [
+        "NAME",
+        "GEO_SUBDIVISION",
+        "DATE",
+        "PHONE",
+        "FAX",
+        "EMAIL",
+        "SSN",
+        "MRN",
+        "HEALTH_PLAN_ID",
+        "ACCOUNT_NUMBER",
+        "LICENSE_NUMBER",
+        "VEHICLE_ID",
+        "DEVICE_ID",
+        "URL",
+        "IP_ADDRESS",
+        "BIOMETRIC_ID",
+        "FACE_PHOTO_REF",
+        "OTHER_UNIQUE_ID"
+      ]
+    },
+    "CategoryCounts": {
+      "type": "object",
+      "required": ["tp", "fp", "fn", "precision", "recall", "f1"],
+      "additionalProperties": false,
+      "properties": {
+        "tp": { "type": "integer", "minimum": 0 },
+        "fp": { "type": "integer", "minimum": 0 },
+        "fn": { "type": "integer", "minimum": 0 },
+        "precision": { "type": "number", "minimum": 0, "maximum": 1 },
+        "recall": { "type": "number", "minimum": 0, "maximum": 1 },
+        "f1": { "type": "number", "minimum": 0, "maximum": 1 }
+      }
+    },
+    "OverallCounts": {
+      "type": "object",
+      "required": [
+        "tp",
+        "fp",
+        "fn",
+        "total_annotations",
+        "precision",
+        "recall",
+        "f1"
+      ],
+      "additionalProperties": false,
+      "properties": {
+        "tp": { "type": "integer", "minimum": 0 },
+        "fp": { "type": "integer", "minimum": 0 },
+        "fn": { "type": "integer", "minimum": 0 },
+        "total_annotations": { "type": "integer", "minimum": 0 },
+        "precision": { "type": "number", "minimum": 0, "maximum": 1 },
+        "recall": { "type": "number", "minimum": 0, "maximum": 1 },
+        "f1": { "type": "number", "minimum": 0, "maximum": 1 }
+      }
+    }
+  }
+}
diff --git a/papers/paper-1-healthcare/raw-results/.gitignore b/papers/paper-1-healthcare/raw-results/.gitignore
@@ -0,0 +1,3 @@
+*
+!.gitignore
+!.gitkeep
diff --git a/papers/paper-1-healthcare/raw-results/.gitkeep b/papers/paper-1-healthcare/raw-results/.gitkeep
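
Note on the SUMMARY schema: the `tp` / `fp` / `fn` / `precision` / `recall` / `f1` fields and the matcher described in the schema's `description` (case-insensitive bidirectional value-containment with whitespace normalization) can be illustrated with a small sketch. This is not the code in `src/recall.ts`, and it is not the gateway's server-side `compareGroundTruth` function, which the research repo only aggregates. Every name below (`normalize`, `valuesMatch`, `scoreRow`, `withRates`) is hypothetical. The one-to-one greedy pairing and the choice to report 0 when a denominator is zero are assumptions made for the example.

```typescript
// Illustrative sketch only: hypothetical names, not the repo's implementation.

interface Counts {
  tp: number;
  fp: number;
  fn: number;
  precision: number;
  recall: number;
  f1: number;
}

// Lowercase, collapse whitespace runs to a single space, and trim.
function normalize(value: string): string {
  return value.toLowerCase().replace(/\s+/g, " ").trim();
}

// Bidirectional containment: a match if either normalized value contains the other.
function valuesMatch(truth: string, redacted: string): boolean {
  const a = normalize(truth);
  const b = normalize(redacted);
  if (a.length === 0 || b.length === 0) return false;
  return a.includes(b) || b.includes(a);
}

// Derive precision / recall / F1 from raw counts.
// Assumption: a zero denominator yields 0 rather than NaN.
function withRates(tp: number, fp: number, fn: number): Counts {
  const precision = tp + fp === 0 ? 0 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 0 : tp / (tp + fn);
  const f1 =
    precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { tp, fp, fn, precision, recall, f1 };
}

// Score one row. Assumption: each redaction can satisfy at most one
// ground-truth value (greedy one-to-one pairing).
function scoreRow(truth: string[], redactions: string[]): Counts {
  const used = new Set<number>();
  let tp = 0;
  for (const t of truth) {
    const idx = redactions.findIndex((r, i) => !used.has(i) && valuesMatch(t, r));
    if (idx >= 0) {
      used.add(idx);
      tp += 1;
    }
  }
  const fn = truth.length - tp; // ground-truth values with no matching redaction
  const fp = redactions.length - used.size; // redactions matching no ground truth
  return withRates(tp, fp, fn);
}

const example = scoreRow(["John  Smith", "555-0100"], ["john smith", "unrelated"]);
console.log(example);
// tp = 1, fp = 1, fn = 1, so precision = recall = f1 = 0.5
```

In this example, one injected value is found, one is missed, and one redaction is spurious. That is the kind of outcome the mock's `--miss-rate` and `--spurious-fp-count` options are described as producing.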