From b7db7b7fec7b8d03417edb629434d531a97b43ae Mon Sep 17 00:00:00 2001
From: Aku Nikkola
Date: Thu, 28 May 2026 12:55:40 +0300
Subject: [PATCH 1/2] docs: surface FI in nodejs-v2 sub-README and npm description
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The main README and SUPPORTED_ENTITIES already reflect FI_HETU and
FI_BUSINESS_ID (added in #4 / v2.2.0), but two surfaces still listed
"EU/UK/US patterns" and "33 entity types":

- `nodejs-v2/README.md` — the sub-README rendered on the package path
- `nodejs-v2/package.json` description — what shows up on the npm registry page

Update both lists in place (same wording style): bump 33 → 35 and extend
`EU/UK/US` → `EU/UK/US/FI`. No semver/runtime impact.
---
 nodejs-v2/README.md    | 2 +-
 nodejs-v2/package.json | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/nodejs-v2/README.md b/nodejs-v2/README.md
index 2a5f2cd..ea6f572 100644
--- a/nodejs-v2/README.md
+++ b/nodejs-v2/README.md
@@ -1,6 +1,6 @@
 # pii-shield
 
-> Anonymize PII in legal documents locally. Node.js CLI — 33 entity types via GLiNER NER + EU/UK/US patterns. Reads `.pdf` / `.docx` / `.txt`. Pure offline, no Python.
+> Anonymize PII in legal documents locally. Node.js CLI — 35 entity types via GLiNER NER + EU/UK/US/FI patterns. Reads `.pdf` / `.docx` / `.txt`. Pure offline, no Python.
 [![npm](https://img.shields.io/npm/v/pii-shield.svg?style=flat-square)](https://www.npmjs.com/package/pii-shield) [![License](https://img.shields.io/badge/license-MIT-blue.svg?style=flat-square)](https://github.com/gregmos/PII-Shield/blob/main/LICENSE) [![Node](https://img.shields.io/badge/node-22%2B-339933.svg?style=flat-square&logo=nodedotjs&logoColor=white)](https://nodejs.org/)
diff --git a/nodejs-v2/package.json b/nodejs-v2/package.json
index 86e9503..77fcec3 100644
--- a/nodejs-v2/package.json
+++ b/nodejs-v2/package.json
@@ -2,7 +2,7 @@
   "name": "pii-shield",
   "version": "2.2.0",
   "type": "module",
-  "description": "Anonymize PII in legal documents locally — Node.js CLI (GLiNER NER, EU/UK/US patterns, .docx/.pdf/.txt, HITL review). MCP plugin distributed separately as a .mcpb.",
+  "description": "Anonymize PII in legal documents locally — Node.js CLI (GLiNER NER, EU/UK/US/FI patterns, .docx/.pdf/.txt, HITL review). MCP plugin distributed separately as a .mcpb.",
   "keywords": [
     "pii",
     "anonymization",

From b8d806273162594a5b29338f82cf454535b61b7a Mon Sep 17 00:00:00 2001
From: gregmos
Date: Mon, 15 Jun 2026 12:31:21 +0300
Subject: [PATCH 2/2] docs: surface FI in remaining coverage lists and fix entity counts
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

#5 updated the nodejs-v2 README blurb and the npm description, but three
"Detected entity types" surfaces were left stale at 33 / 29 pattern-based
and omitted FI entirely:

- nodejs-v2/README.md "What it detects" — 33->35, +31 pattern, +FI in country list
- nodejs-v2/cli/USAGE.md — 33->35 types, +FI_HETU/FI_BUSINESS_ID
- README.md (root) — 33->35 (4 NER + 31 pattern), +FI_HETU/FI_BUSINESS_ID

Counts verified against SUPPORTED_ENTITIES in
nodejs-v2/src/engine/entity-types.ts (35 = 4 NER + 31 pattern).
Co-Authored-By: Claude Opus 4.8 (1M context)
---
 README.md              | 4 ++--
 nodejs-v2/README.md    | 4 ++--
 nodejs-v2/cli/USAGE.md | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index e1b7876..f0901ff 100644
--- a/README.md
+++ b/README.md
@@ -245,9 +245,9 @@ Authoritative list is `nodejs-v2/src/engine/entity-types.ts` (`SUPPORTED_ENTITIE
 `EU_VAT`, `EU_PASSPORT`
 
 **Country-specific**:
-`DE_TAX_ID`, `DE_SOCIAL_SECURITY`, `FR_NIR`, `FR_CNI`, `IT_FISCAL_CODE`, `IT_VAT`, `ES_DNI`, `ES_NIE`, `CY_TIC`, `CY_ID_CARD`
+`DE_TAX_ID`, `DE_SOCIAL_SECURITY`, `FR_NIR`, `FR_CNI`, `IT_FISCAL_CODE`, `IT_VAT`, `ES_DNI`, `ES_NIE`, `CY_TIC`, `CY_ID_CARD`, `FI_HETU`, `FI_BUSINESS_ID`
 
-33 types total (4 NER + 29 pattern-based).
+35 types total (4 NER + 31 pattern-based).
 
 ## Logs
 
diff --git a/nodejs-v2/README.md b/nodejs-v2/README.md
index ea6f572..386c2b5 100644
--- a/nodejs-v2/README.md
+++ b/nodejs-v2/README.md
@@ -55,13 +55,13 @@ See `pii-shield --help ` or the [full CLI manual](https://github.com/gr
 
 ## What it detects
 
-33 entity types — 4 NER classes (`PERSON`, `ORGANIZATION`, `LOCATION`, `NRP`) plus 29 pattern-based recognizers:
+35 entity types — 4 NER classes (`PERSON`, `ORGANIZATION`, `LOCATION`, `NRP`) plus 31 pattern-based recognizers:
 
 - **Generic**: email, phone, URL, IP, ID doc, credit card, IBAN, crypto, medical licence
 - **US**: SSN, passport, driver licence
 - **UK**: NIN, NHS, passport, CRN, driving licence
 - **EU-wide**: VAT, passport
-- **Country-specific**: DE (tax ID, social security), FR (NIR, CNI), IT (fiscal code, VAT), ES (DNI, NIE), CY (TIC, ID card)
+- **Country-specific**: DE (tax ID, social security), FR (NIR, CNI), IT (fiscal code, VAT), ES (DNI, NIE), CY (TIC, ID card), FI (henkilötunnus, Y-tunnus)
 
 Authoritative list: [`src/engine/entity-types.ts`](https://github.com/gregmos/PII-Shield/blob/main/nodejs-v2/src/engine/entity-types.ts).
diff --git a/nodejs-v2/cli/USAGE.md b/nodejs-v2/cli/USAGE.md
index ec07a7b..2d736e5 100644
--- a/nodejs-v2/cli/USAGE.md
+++ b/nodejs-v2/cli/USAGE.md
@@ -959,7 +959,7 @@ tail -f ~/.pii_shield/audit/ner_init.log # NER bootstrap detail
 
 ## Detected entity types
 
-33 types in total. The full authoritative list is `nodejs-v2/src/engine/entity-types.ts` (`SUPPORTED_ENTITIES`).
+35 types in total. The full authoritative list is `nodejs-v2/src/engine/entity-types.ts` (`SUPPORTED_ENTITIES`).
 
 ### NER-based (GLiNER zero-shot)
 
@@ -983,7 +983,7 @@ tail -f ~/.pii_shield/audit/ner_init.log # NER bootstrap detail
 
 ### Country-specific
 
-`DE_TAX_ID`, `DE_SOCIAL_SECURITY`, `FR_NIR`, `FR_CNI`, `IT_FISCAL_CODE`, `IT_VAT`, `ES_DNI`, `ES_NIE`, `CY_TIC`, `CY_ID_CARD`.
+`DE_TAX_ID`, `DE_SOCIAL_SECURITY`, `FR_NIR`, `FR_CNI`, `IT_FISCAL_CODE`, `IT_VAT`, `ES_DNI`, `ES_NIE`, `CY_TIC`, `CY_ID_CARD`, `FI_HETU`, `FI_BUSINESS_ID`.
 
 To list at runtime in JSON: `pii-shield scan small.txt --json | jq '.entities[].type' | sort -u` (assuming sample file has at least one of each).