
Add column-level PII flag and data classification to catalog/*.json #9

@ProfessorPolymorphic

Context

The institutional Data Governance Explorer at ui-insight/AISPEG renders the catalogs published by this repo for two audiences: engineers, who need to connect to our data, and stakeholders, who need definitions and business rules. The current catalog schema supports table-level navigation, but it omits two pieces of column-level metadata that both audiences ask for:

  • PII flag — does this column contain personally identifiable information?
  • Data classification — Public / Internal / Sensitive / Restricted (or whichever institutional ladder this repo formalizes).

OpenERA stores these fields locally in its DataDictionary database table; UCM Daily Register hard-codes them in TypeScript inside its frontend. Neither pattern is portable. Anyone consuming the canonical catalog for cross-portfolio governance (the AISPEG explorer, automated drift checks, stakeholder briefings) currently has to either skip the badges or re-derive them from per-app code.

Proposal

Extend the column object in catalog/*.json (and the equivalent field object in entity-shape catalogs like processmapping.json / stratplan.json) with two optional fields:

{
  "name": "Email",
  "type": "String(255)",
  "nullable": true,
  "primary_key": false,
  "foreign_key": null,
  "pii": true,
  "classification": "Sensitive"
}

Suggested constraints:

  • pii: boolean, optional. Absence is not the same as false: a missing key means "not yet classified", so reviewers can spot gaps.
  • classification: enum string from a small controlled vocabulary documented in docs/standard/. Suggested initial values: Public, Internal, Sensitive, Restricted — matching the institutional convention OpenERA already uses.
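A minimal sketch of how a consumer might check the two proposed fields on a single column object, assuming the initial vocabulary suggested above. The names validate_column, pii_status, and CLASSIFICATIONS are hypothetical and not part of any existing script in this repo. The point it illustrates is the tri-state reading of pii: a missing key is reported as "unclassified", never coerced to False.

```python
from typing import Any

# Assumed initial controlled vocabulary; the authoritative list would live in docs/standard/.
CLASSIFICATIONS = ("Public", "Internal", "Sensitive", "Restricted")


def validate_column(column: dict[str, Any]) -> list[str]:
    """Return problems with the optional pii/classification fields (hypothetical helper).

    A missing key is not an error here, only a gap for reviewers to fill.
    """
    problems: list[str] = []
    name = column.get("name", "<unnamed>")

    # pii, when present, must be a real boolean (not "yes", not 1).
    if "pii" in column and not isinstance(column["pii"], bool):
        problems.append(f"{name}: pii must be a boolean, got {column['pii']!r}")

    # classification, when present, must come from the controlled vocabulary.
    if "classification" in column and column["classification"] not in CLASSIFICATIONS:
        problems.append(
            f"{name}: classification {column['classification']!r} is not one of {CLASSIFICATIONS}"
        )
    return problems


def pii_status(column: dict[str, Any]) -> str:
    """Map the tri-state pii field to a label; absence means 'unclassified', not False."""
    if "pii" not in column:
        return "unclassified"
    return "pii" if column["pii"] else "not pii"
```

With the Email column from the example above, validate_column returns an empty list, while a column that has only a name yields pii_status(...) == "unclassified".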

Why both (and not just PII)

PII is a special case of classification, but the more common stakeholder question ("which columns can I show in this dashboard?") is about classification. Surfacing both makes that question answerable without grepping column names. Two fields, not one, because a classification level alone does not say whether a column is PII: not all classified data is PII (e.g., financial sponsor terms are often Sensitive but not PII).
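To make the two questions concrete, here is a small sketch that answers each one separately over a list of column objects. It assumes the ladder is ordered from least to most restrictive (Public, Internal, Sensitive, Restricted) and that a dashboard declares the highest level it may show; that threshold policy, the helper names, and the sample columns (AwardNumber, SponsorTerms, LegacyCode) are hypothetical illustrations, not existing catalog content.

```python
from typing import Any

# Assumed ordering of the ladder, least to most restrictive.
LADDER = ["Public", "Internal", "Sensitive", "Restricted"]


def displayable_columns(columns: list[dict[str, Any]], max_level: str) -> list[str]:
    """Names of columns classified at or below max_level.

    Unclassified columns are excluded rather than assumed safe.
    """
    limit = LADDER.index(max_level)
    return [
        c["name"]
        for c in columns
        if c.get("classification") in LADDER and LADDER.index(c["classification"]) <= limit
    ]


def pii_columns(columns: list[dict[str, Any]]) -> list[str]:
    """Names of columns explicitly flagged as PII (a missing flag is not counted)."""
    return [c["name"] for c in columns if c.get("pii") is True]


# Hypothetical sample: two Sensitive columns, only one of which is PII.
cols = [
    {"name": "AwardNumber", "pii": False, "classification": "Internal"},
    {"name": "SponsorTerms", "pii": False, "classification": "Sensitive"},
    {"name": "Email", "pii": True, "classification": "Sensitive"},
    {"name": "LegacyCode"},  # not yet classified
]
```

Here displayable_columns(cols, "Internal") returns ["AwardNumber"], and pii_columns(cols) returns ["Email"] even though SponsorTerms shares its Sensitive level. That gap is exactly what a single field could not express.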

Migration path

  1. Add the field schema to docs/standard/.
  2. Backfill openera.json from the existing DataDictionary table seed (this is the highest-coverage source already and the canonical research-admin domain).
  3. Backfill ucm-daily-register.json from the hard-coded TS in UCMDailyRegister-App/frontend/src/pages/DataGovernancePage.tsx.
  4. Mark unset columns as needing classification (no false defaults).
  5. Update scripts/check_governance_drift.py to flag columns missing classification once the schema is published.
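A rough sketch of what the step 5 check could look like. The issue does not describe the catalog's top-level layout, so this assumes a "tables" list whose entries carry either "columns" (table catalogs) or "fields" (entity-shape catalogs such as processmapping.json); adjust to the real schema. The function names are hypothetical and do not reflect the current contents of scripts/check_governance_drift.py.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def iter_columns(catalog: dict[str, Any]) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (table_name, column) pairs.

    Assumed layout: catalog["tables"][i] has a "name" plus either "columns" or "fields".
    """
    for table in catalog.get("tables", []):
        for column in table.get("columns", table.get("fields", [])):
            yield table.get("name", "<unnamed>"), column


def missing_classification(catalog: dict[str, Any]) -> list[str]:
    """Return 'table.column' for every column that has no classification key."""
    return [
        f"{table}.{column.get('name', '<unnamed>')}"
        for table, column in iter_columns(catalog)
        if "classification" not in column
    ]


def check_catalog_dir(directory: Path) -> dict[str, list[str]]:
    """Map each *.json catalog file name to its unclassified columns (clean files omitted)."""
    report: dict[str, list[str]] = {}
    for path in sorted(directory.glob("*.json")):
        gaps = missing_classification(json.loads(path.read_text(encoding="utf-8")))
        if gaps:
            report[path.name] = gaps
    return report
```

Calling check_catalog_dir(Path("catalog")) would list every gap per file. Because step 4 forbids false defaults, an empty report means every column was classified deliberately, not silently defaulted.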

Downstream impact

Once this lands, ui-insight/AISPEG#55 (currently merged without the badges) needs only a small extension to render PII/classification badges on the Data Model project-detail and table-detail pages. The hand-curated v1 tagging in AISPEG's lib/governance/canonical-udm-tables.ts could also retire if the catalog adds a top-level is_canonical_udm flag at the table level. That is a separate proposal; this issue is column-level only.

Filed from the AISPEG side as part of the Data Governance Explorer epic.