Merged
5 changes: 4 additions & 1 deletion .gitignore
@@ -15,7 +15,10 @@ datasets/*/raw/
datasets/*/with-injected-pii/

# Per-paper raw run artifacts (cert chains, intermediate JSONL — only summaries are checked in)
papers/*/raw-results/
papers/*/raw-results/*
# But keep the scaffold so the directory exists in a fresh clone.
!papers/*/raw-results/.gitignore
!papers/*/raw-results/.gitkeep

# Internal PRD/planning anchors (kept locally for fresh-context resumes)
specs/
32 changes: 30 additions & 2 deletions README.md
@@ -12,9 +12,9 @@ Empirical methodology code for the Lucairn Research Program — a per-industry s
## What this repo is NOT

- Not a Lucairn product. The Lucairn platform itself lives elsewhere (gateway, sanitizer, witness, certificate verifier).
- Not a customer-deployment artifact. These are vendor-published methodology papers; the publisher and the methodology are named in full. No customer attribution. No testimonials. No interviewed users.
- Not a customer-deployment artifact. These are vendor-published methodology papers; the publisher and the methodology are named in full. No customer attribution. No persona-driven narrative. No attributed endorsement quotes.
- Not a CLI or a publishable npm package. It is a methodology codebase, run from a clone.
- Not a "case study". The artifact frame is a vendor benchmark / methodology paper; the word "case study" does not appear in any paper title, route slug, social card, or meta description.
- Not a customer-implementation report. The artifact frame is a vendor benchmark / methodology paper; persona-driven or implementation-report framing does not appear in any paper title, route slug, social card, or meta description.
- Not legal advice. Regulatory references are factual citations to primary sources (EUR-Lex Regulation 2024/1689; HHS HIPAA Safe Harbor enumeration; published clinical-NLP de-identification literature); they are not interpretations.

## Regulatory context
@@ -56,6 +56,34 @@ Prerequisites:
- pnpm 10.x
- Kaggle CLI installed (`pipx install kaggle`) with a working `~/.kaggle/kaggle.json` API token

### Slice 2 — Harness (mock-only)

Slice 2 adds an in-process harness that calls the Lucairn gateway row-by-row via `POST /api/v1/proxy/messages` in `mode: "proving_ground"`, collects each row's signed cert URL, and computes per-HIPAA-category recall against the Measurement-B ground truth.

**The harness is currently mock-only.** The live `gateway.lucairn.eu` run lands in Slice 3 per the locked PRD halt gate, which exists to avoid Anthropic upstream cost on every iteration. Run the in-process smoke flow:

```bash
# Step 1 — call the mock gateway over 5 rows; write the raw NDJSON.
pnpm run pipeline -- --rows=5 --mock --output=/tmp/slice2-smoke.ndjson

# Step 2 — convert NDJSON to the CERTIFICATES.csv appendix shape.
pnpm run collect-certs -- --input=/tmp/slice2-smoke.ndjson --output=/tmp/slice2-CERTIFICATES.csv

# Step 3 — compute recall / precision / F1, validate against the SUMMARY schema.
pnpm run compute-recall \
-- --truth=datasets/healthcare/with-injected-pii/ground-truth.jsonl \
--redactions-source=mock \
--rows=5 \
--output=/tmp/slice2-SUMMARY.json
```
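
Step 2 can be pictured as a line-by-line transform from NDJSON records to CSV rows. The sketch below is an illustration under stated assumptions, not the contents of `scripts/collect-certs.ts`: the record fields `row_index` and `cert_url`, the two-column header, and the helper names `csvField` and `ndjsonToCsv` are all hypothetical.

```typescript
// Hypothetical record shape; the real NDJSON fields are not shown in this diff.
interface PipelineRecord {
  row_index: number; // assumed field name
  cert_url: string; // assumed field name
}

// Quote a CSV field only when it contains a comma, quote, or line break,
// doubling any embedded quotes.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Convert NDJSON text (one JSON object per line) into CSV text.
// Blank lines are skipped so a trailing newline does not produce an empty row.
function ndjsonToCsv(ndjson: string): string {
  const lines = ndjson.split(/\r?\n/).filter((line) => line.trim() !== "");
  const rows = lines.map((line) => {
    const rec = JSON.parse(line) as PipelineRecord;
    return [String(rec.row_index), csvField(rec.cert_url)].join(",");
  });
  return ["row_index,cert_url", ...rows].join("\n") + "\n";
}
```

Keeping the conversion as a pure string-to-string function makes it straightforward to test without touching the filesystem; the script wrapper would only read `--input` and write `--output`.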

Mock options exercise the math layer against a known oracle:

- `--miss-rate=0.3` — mock drops 30% of injected entities so recall and F1 reflect the configuration.
- `--spurious-fp-count=2` — mock emits 2 synthetic false-positive redactions per row.
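
The effect of these two options on the reported metrics can be sketched as follows. This is a minimal illustration, not the repo's `src/recall.ts`: the helper names `countsFor`, `ratio`, and `metrics` are hypothetical, the rule of rounding the miss rate to whole entities is an assumption, and so is the convention of reporting 0 when a denominator is 0.

```typescript
// Hypothetical sketch: how --miss-rate and --spurious-fp-count would shape
// tp / fp / fn for one row, and how precision / recall / F1 follow from them.

interface Counts {
  tp: number;
  fp: number;
  fn: number;
}

interface Metrics extends Counts {
  precision: number;
  recall: number;
  f1: number;
}

// Assumed mock behaviour: of `injected` ground-truth entities, a fraction
// `missRate` is dropped (rounded to whole entities) and `spuriousFp`
// redactions that match nothing are added.
function countsFor(injected: number, missRate: number, spuriousFp: number): Counts {
  const fn = Math.round(injected * missRate);
  return { tp: injected - fn, fp: spuriousFp, fn };
}

// Ratio that reports 0 when the denominator is 0 (an assumed convention).
function ratio(num: number, den: number): number {
  return den === 0 ? 0 : num / den;
}

function metrics(c: Counts): Metrics {
  const precision = ratio(c.tp, c.tp + c.fp);
  const recall = ratio(c.tp, c.tp + c.fn);
  const f1 = ratio(2 * precision * recall, precision + recall);
  return { ...c, precision, recall, f1 };
}

// Example: 10 injected entities, --miss-rate=0.3, --spurious-fp-count=2.
console.log(metrics(countsFor(10, 0.3, 2)));
```

With ten injected entities this gives tp = 7, fp = 2, fn = 3, so recall is 0.7 and precision is 7/9. Because the oracle values are known in advance, any drift in the math layer shows up as a mismatch against these numbers.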

The harness reads `LUCAIRN_GATEWAY_URL` and `LUCAIRN_API_KEY` from the environment, but Slice 2 supports `--mock` only. The `--live` flag is reserved for Slice 3 and refuses to run without the explicit invocation that the live-run halt gate authorises.
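
A mock-only guard of this kind could look roughly like the sketch below. Only the two environment variable names and the `--mock` / `--live` flags come from the text above; the function name `resolveMode`, the `HarnessConfig` shape, and the error messages are hypothetical, and the sketch simplifies by always refusing `--live`. The environment is passed in as a plain object so the example has no Node-specific dependencies.

```typescript
// Hypothetical sketch of a mock-only guard; not the repo's actual CLI parsing.

type Env = Record<string, string | undefined>;

interface HarnessConfig {
  mode: "mock";
  gatewayUrl: string | undefined;
  apiKeyPresent: boolean; // record presence only, never the key itself
}

function resolveMode(argv: string[], env: Env): HarnessConfig {
  const wantsMock = argv.includes("--mock");
  const wantsLive = argv.includes("--live");

  if (wantsLive) {
    // Simplification: in this sketch the live path is always refused,
    // whatever the environment contains.
    throw new Error("--live is reserved for Slice 3; rerun with --mock");
  }
  if (!wantsMock) {
    throw new Error("Slice 2 supports --mock only; pass --mock explicitly");
  }
  return {
    mode: "mock",
    gatewayUrl: env["LUCAIRN_GATEWAY_URL"],
    apiKeyPresent: Boolean(env["LUCAIRN_API_KEY"]),
  };
}
```

Failing closed when neither flag is given means a misconfigured environment cannot silently trigger an upstream call.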

## Methodology summary (Paper 1)

The healthcare dataset (MTSamples) is **not institutionally de-identified**; it is raw clinical narrative from the public mtsamples.com archive (CC0 public domain). Paper 1 therefore reports two empirically distinct measurements:
8 changes: 4 additions & 4 deletions datasets/healthcare/RECIPE.md
@@ -41,14 +41,14 @@ Because MTSamples has no published ground-truth PHI annotations, a single measur

This recipe documents the *full* methodology for Paper 1. The implementation lands incrementally:

- **Slice 1 (current commit) — ships:**
- **Slice 1 — shipped:**
- Dataset acquisition script (`scripts/download-mtsamples.ts`)
- Deterministic synthetic PII re-injection for Measurement B's 500-row subset (`scripts/inject-pii.ts`, `src/inject-pii-core.ts`)
- Round-trip verification (`scripts/verify-injection.ts`)
- **Slice 2 — pending:** harness to call the Lucairn gateway row-by-row, collect cert URLs, compute recall against Measurement B's known ground truth (`scripts/run-pipeline.ts`, `scripts/collect-certs.ts`, `scripts/compute-recall.ts`)
- **Slice 3 — pending:** full Paper 1 run including **Measurement A's raw-corpus detection pass** (Lucairn over the full ~5k MTSamples corpus, reporting detection counts without ground truth) plus the Measurement B recall numbers + the `papers/paper-1-healthcare/CERTIFICATES.csv` cert-URL appendix
- **Slice 2 (current commit) — shipped (mock-only):** harness to call the Lucairn gateway row-by-row via `POST /api/v1/proxy/messages` in `mode: "proving_ground"`, collect cert URLs, compute recall against Measurement B's known ground truth (`scripts/run-pipeline.ts`, `scripts/collect-certs.ts`, `scripts/compute-recall.ts`, `src/gateway-client.ts`, `src/redaction-extractor.ts`, `src/recall.ts`, `src/hipaa-category-mapping.ts`, `src/mocks/gateway-fixtures.ts`). The live gateway run is deferred to Slice 3.
- **Slice 3 — pending:** full Paper 1 run including **Measurement A's raw-corpus detection pass** (Lucairn over the full ~5k MTSamples corpus, reporting detection counts without ground truth) plus the Measurement B recall numbers against the live gateway + the `papers/paper-1-healthcare/CERTIFICATES.csv` cert-URL appendix

Until Slice 2 + Slice 3 land, the harness + Measurement A code does not exist in this repo. The methodology description below is the published target, not the current shipped state.
Until Slice 3 lands, the live-gateway end-to-end run and the Measurement A code do not exist in this repo. The methodology description below is the published target, not the current shipped state.

### Measurement A — raw-corpus detection (what does Lucairn flag in the wild?)

6 changes: 5 additions & 1 deletion package.json
@@ -20,11 +20,15 @@
"test:watch": "vitest",
"dataset:download": "node --import tsx scripts/download-mtsamples.ts",
"dataset:inject-pii": "node --import tsx scripts/inject-pii.ts",
"dataset:verify-injection": "node --import tsx scripts/verify-injection.ts"
"dataset:verify-injection": "node --import tsx scripts/verify-injection.ts",
"pipeline": "node --import tsx scripts/run-pipeline.ts",
"collect-certs": "node --import tsx scripts/collect-certs.ts",
"compute-recall": "node --import tsx scripts/compute-recall.ts"
},
"devDependencies": {
"@faker-js/faker": "^9.0.0",
"@types/node": "^20.11.0",
"msw": "^2.14.6",
"tsx": "^4.22.0",
"typescript": "^5.4.0",
"vitest": "^1.6.0"
126 changes: 126 additions & 0 deletions papers/_template/SUMMARY.schema.json
@@ -0,0 +1,126 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://github.com/Declade/lucairn-research/papers/_template/SUMMARY.schema.json",
"title": "Lucairn Research Program — per-paper SUMMARY.json",
"description": "Aggregate recall / precision / F1 numbers per HIPAA Safe Harbor category + overall + per-row breakdown for any paper in the Lucairn Research Program. Mirrors the RecallSummary shape produced by src/recall.ts. Recall numbers are produced by the gateway's compareGroundTruth function at services/gateway/internal/api/ground_truth.go:69-138 in the dual-sandbox-architecture repo (case-insensitive bidirectional value-containment with whitespace normalization, server-side; not span-exact overlap). The publisher (Lucairn) ships this matcher in production; the research repo aggregates its verdicts.",
"type": "object",
"required": [
"schema_version",
"generator",
"overall",
"per_category",
"per_row",
"notes"
],
"additionalProperties": false,
"properties": {
"schema_version": {
"type": "string",
"const": "1.0"
},
"generator": {
"type": "string",
"const": "lucairn-research/recall.ts"
},
"overall": {
"$ref": "#/$defs/OverallCounts"
},
"per_category": {
"type": "array",
"items": {
"type": "object",
"required": ["category", "counts"],
"additionalProperties": false,
"properties": {
"category": {
"$ref": "#/$defs/HipaaCategory"
},
"counts": {
"$ref": "#/$defs/CategoryCounts"
}
}
},
"minItems": 18,
"maxItems": 18
},
"per_row": {
"type": "array",
"items": {
"type": "object",
"required": ["row_index", "tp", "fp", "fn", "recall"],
"additionalProperties": false,
"properties": {
"row_index": { "type": "integer", "minimum": 0 },
"tp": { "type": "integer", "minimum": 0 },
"fp": { "type": "integer", "minimum": 0 },
"fn": { "type": "integer", "minimum": 0 },
"recall": { "type": "number", "minimum": 0, "maximum": 1 }
}
}
},
"notes": {
"type": "array",
"items": { "type": "string" }
}
},
"$defs": {
"HipaaCategory": {
"type": "string",
"enum": [
"NAME",
"GEO_SUBDIVISION",
"DATE",
"PHONE",
"FAX",
"EMAIL",
"SSN",
"MRN",
"HEALTH_PLAN_ID",
"ACCOUNT_NUMBER",
"LICENSE_NUMBER",
"VEHICLE_ID",
"DEVICE_ID",
"URL",
"IP_ADDRESS",
"BIOMETRIC_ID",
"FACE_PHOTO_REF",
"OTHER_UNIQUE_ID"
]
},
"CategoryCounts": {
"type": "object",
"required": ["tp", "fp", "fn", "precision", "recall", "f1"],
"additionalProperties": false,
"properties": {
"tp": { "type": "integer", "minimum": 0 },
"fp": { "type": "integer", "minimum": 0 },
"fn": { "type": "integer", "minimum": 0 },
"precision": { "type": "number", "minimum": 0, "maximum": 1 },
"recall": { "type": "number", "minimum": 0, "maximum": 1 },
"f1": { "type": "number", "minimum": 0, "maximum": 1 }
}
},
"OverallCounts": {
"type": "object",
"required": [
"tp",
"fp",
"fn",
"total_annotations",
"precision",
"recall",
"f1"
],
"additionalProperties": false,
"properties": {
"tp": { "type": "integer", "minimum": 0 },
"fp": { "type": "integer", "minimum": 0 },
"fn": { "type": "integer", "minimum": 0 },
"total_annotations": { "type": "integer", "minimum": 0 },
"precision": { "type": "number", "minimum": 0, "maximum": 1 },
"recall": { "type": "number", "minimum": 0, "maximum": 1 },
"f1": { "type": "number", "minimum": 0, "maximum": 1 }
}
}
}
}
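
The schema's `description` characterises the gateway matcher as case-insensitive, bidirectional value containment with whitespace normalization, as opposed to span-exact overlap. A minimal TypeScript sketch of that idea follows. It illustrates the described behaviour and is not a port of `compareGroundTruth`; the names `normalize` and `valuesMatch`, the exact normalization steps, and the handling of empty strings are assumptions.

```typescript
// Lowercase, trim, and collapse internal whitespace runs to a single space.
// (Assumed normalization; the server-side rules may differ in detail.)
function normalize(value: string): string {
  return value.toLowerCase().trim().replace(/\s+/g, " ");
}

// Bidirectional containment: a detection counts as a hit when either value
// contains the other after normalization. Empty values never match (assumed).
function valuesMatch(truth: string, detected: string): boolean {
  const t = normalize(truth);
  const d = normalize(detected);
  if (t === "" || d === "") return false;
  return t.includes(d) || d.includes(t);
}

console.log(valuesMatch("Jane  Doe", "jane doe")); // true
console.log(valuesMatch("Doe", "Dr. Jane Doe")); // true
console.log(valuesMatch("Jane Doe", "John Roe")); // false
```

Because containment runs in both directions, a detection that covers more or less text than the annotation still counts as a true positive, which is worth keeping in mind when comparing these recall numbers with span-exact benchmarks.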
3 changes: 3 additions & 0 deletions papers/paper-1-healthcare/raw-results/.gitignore
@@ -0,0 +1,3 @@
*
!.gitignore
!.gitkeep
Empty file.