Context
The institutional Data Governance Explorer at ui-insight/AISPEG renders the catalogs published by this repo for two audiences: engineers, who need to connect to our data, and stakeholders, who need definitions and business rules. The current catalog schema is sufficient for table-level navigation, but it omits two pieces of column-level metadata that both audiences ask for:
- PII flag — does this column contain personally identifiable information?
- Data classification — Public / Internal / Sensitive / Restricted (or whichever institutional ladder this repo formalizes).
OpenERA stores these fields locally in its DataDictionary database table; UCM Daily Register hard-codes them in TypeScript in its frontend. Neither pattern is portable. Anyone consuming the canonical catalog for cross-portfolio governance (the AISPEG explorer, automated drift checks, or stakeholder briefings) currently has to either skip the PII/classification badges or re-derive the values from per-app code.
Proposal
Extend the column object in catalog/*.json (and the equivalent field object in entity-shape catalogs like processmapping.json / stratplan.json) with two optional fields:
```json
{
  "name": "Email",
  "type": "String(255)",
  "nullable": true,
  "primary_key": false,
  "foreign_key": null,
  "pii": true,
  "classification": "Sensitive"
}
```
Suggested constraints:
- pii: boolean, optional. Absence ≠ false: an absent field means "not yet classified", so reviewers can spot gaps.
- classification: enum string from a small controlled vocabulary documented in docs/standard/. Suggested initial values: Public, Internal, Sensitive, Restricted, matching the institutional convention OpenERA already uses.
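A minimal Python sketch of how a consumer or CI step might enforce these constraints. The function name validate_column and the CLASSIFICATIONS tuple are hypothetical, and the vocabulary is only the suggested initial set; a real check would follow whatever lands in docs/standard/. A missing field passes validation, so the three states of pii (true, false, absent) survive.

```python
from typing import Any

# Hypothetical: the suggested initial vocabulary, pending docs/standard/.
CLASSIFICATIONS = ("Public", "Internal", "Sensitive", "Restricted")


def validate_column(column: dict[str, Any]) -> list[str]:
    """Return problems with the optional pii/classification fields.

    Absent fields are NOT errors: absence means "not yet classified".
    """
    problems: list[str] = []
    name = column.get("name", "<unnamed>")
    # bool only: 0/1 or "yes"/"no" are rejected rather than coerced.
    if "pii" in column and not isinstance(column["pii"], bool):
        problems.append(f"{name}: pii must be a boolean, got {column['pii']!r}")
    if "classification" in column and column["classification"] not in CLASSIFICATIONS:
        problems.append(
            f"{name}: classification {column['classification']!r} "
            f"is not one of {', '.join(CLASSIFICATIONS)}"
        )
    return problems


# Usage (column names are illustrative):
validate_column({"name": "Email", "pii": True, "classification": "Sensitive"})  # -> []
validate_column({"name": "Notes"})  # -> [] (unclassified, not invalid)
```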
Why both (and not just PII)
PII is a special case of classification, but the more common stakeholder question is broader: "which columns can I show in this dashboard?" Surfacing classification alongside PII makes that question answerable directly from the catalog, without grepping for the column name. We propose two fields rather than one because not all classified data is PII (e.g., financial sponsor terms are often Sensitive but not PII).
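To illustrate why classification answers the dashboard question and a PII flag alone does not, here is a sketch that filters columns against a maximum allowed level. It assumes the four suggested values form an ordered ladder from least to most restrictive, which the proposal implies but does not state; the function name and column names are hypothetical.

```python
from typing import Any

# Assumed ordering, least to most restrictive (illustrative, not published).
LADDER = ("Public", "Internal", "Sensitive", "Restricted")


def showable_columns(columns: list[dict[str, Any]], max_level: str) -> list[str]:
    """Names of columns classified at or below max_level.

    Unclassified columns are excluded rather than assumed safe.
    """
    limit = LADDER.index(max_level)
    return [
        col["name"]
        for col in columns
        if col.get("classification") in LADDER
        and LADDER.index(col["classification"]) <= limit
    ]


cols = [
    {"name": "AwardTitle", "classification": "Public"},
    {"name": "SponsorTerms", "classification": "Sensitive", "pii": False},
    {"name": "Email", "classification": "Sensitive", "pii": True},
    {"name": "Notes"},  # not yet classified
]
showable_columns(cols, "Internal")  # -> ['AwardTitle']

# A PII-only filter would let the Sensitive sponsor terms through:
[c["name"] for c in cols if c.get("pii") is False]  # -> ['SponsorTerms']
```

Excluding unclassified columns mirrors the "absence ≠ false" rule: a gap stays visible instead of silently passing.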
Migration path
- Add the field schema to docs/standard/.
- Backfill openera.json from the existing DataDictionary table seed (already the highest-coverage source and the canonical research-admin domain).
- Backfill ucm-daily-register.json from the hard-coded TS in UCMDailyRegister-App/frontend/src/pages/DataGovernancePage.tsx.
- Mark unset columns as needing classification (no false defaults).
- Update scripts/check_governance_drift.py to flag columns missing classification once the schema is published.
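The backfill and drift-check steps above can be sketched as follows. This is not the existing scripts/check_governance_drift.py: the helper names, the {"tables": [...]} catalog layout, and the (table, column)-keyed seed mapping are all assumptions for illustration. Entity-shape catalogs such as processmapping.json, which use field objects, would need their own iterator.

```python
import json
from pathlib import Path
from typing import Any, Iterator

# ASSUMPTION: a table-shape catalog looks like
#   {"tables": [{"name": "...", "columns": [{...}, ...]}]}
# The real catalog/*.json layout may differ; adjust iter_columns to match.


def iter_columns(catalog: dict[str, Any]) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (table_name, column) for every column in the catalog."""
    for table in catalog.get("tables", []):
        for column in table.get("columns", []):
            yield table["name"], column


def backfill(catalog: dict[str, Any], seed: dict[tuple[str, str], dict[str, Any]]) -> int:
    """Copy pii/classification from a hypothetical seed keyed by (table, column).

    Columns absent from the seed are left untouched (no false defaults).
    Returns the number of columns updated.
    """
    updated = 0
    for table_name, column in iter_columns(catalog):
        entry = seed.get((table_name, column["name"]))
        if entry is None:
            continue
        for key in ("pii", "classification"):
            if key in entry:
                column[key] = entry[key]
        updated += 1
    return updated


def missing_classification(catalog: dict[str, Any]) -> list[str]:
    """'table.column' for every column that still lacks a classification."""
    return [
        f"{table_name}.{column['name']}"
        for table_name, column in iter_columns(catalog)
        if "classification" not in column
    ]


def check_catalogs(catalog_dir: Path) -> dict[str, list[str]]:
    """Map each *.json catalog file to its unclassified columns (clean files omitted)."""
    report: dict[str, list[str]] = {}
    for path in sorted(catalog_dir.glob("*.json")):
        gaps = missing_classification(json.loads(path.read_text()))
        if gaps:
            report[path.name] = gaps
    return report


# Usage with a hypothetical table and seed:
catalog = {"tables": [{"name": "Person", "columns": [{"name": "Email"}, {"name": "Nickname"}]}]}
seed = {("Person", "Email"): {"pii": True, "classification": "Sensitive"}}
backfill(catalog, seed)          # -> 1
missing_classification(catalog)  # -> ['Person.Nickname']
```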
Downstream impact
Once landed, ui-insight/AISPEG#55 (currently merged without the badges) gets a small extension to render PII/classification badges on the Data Model project-detail and table-detail pages. The lib/governance/canonical-udm-tables.ts v1 hand-curated tagging in AISPEG can also retire if the catalog adds a top-level is_canonical_udm flag at the table level — but that's a separate proposal; this issue is column-level only.
Filed from the AISPEG side as part of the Data Governance Explorer epic.