ui-insight/AI4RA-UDM


AI4RA Unified Data Model (UDM)

A universal data model for research administration. The UDM provides a common schema that any institution can adopt to standardize how research administration data is structured, described, and shared — regardless of what systems they use internally.

Mission

Research administration data is fragmented across institutions, locked in proprietary systems with inconsistent naming, structures, and definitions. The AI4RA UDM aims to:

  • Standardize the language and structure of research administration data across institutions
  • Be generic enough to accommodate the diverse needs of universities, research institutes, and funding agencies
  • Enable interoperability between systems by providing a shared framework that institutions map their local data to
  • Support FAIR data principles — making research administration data Findable, Accessible, Interoperable, and Reusable

The UDM is a specification, not a database. It defines what tables, columns, relationships, and constraints should exist. Institutions implement it in whatever database technology fits their environment and map their local data to the common model.

How the UDM Is Defined

Single Source of Truth

The complete UDM is defined in a single file: udm_schema.json. This JSON file contains every table, column, data type, constraint, foreign key relationship, description, synonym, and PII flag in the model. It is the authoritative definition from which all other representations (dashboard, API endpoints, documentation) are derived.

Domain Organization

The model's 40 tables are organized across the major domains of research administration:

| Domain | Tables | Purpose |
|---|---|---|
| Reference | Organization, AllowedValues, BudgetCategory | Shared lookup and reference data |
| Core | Personnel, ContactDetails, Project | People, contact info, and research projects |
| Pre-Award | RFA, RFARequirement, Proposal, ProposalBudget, ProposalChecklistItem | Funding opportunities, requirements tracking, proposal development, and per-proposal preparation checklists |
| Submission | SubmissionProfile, SubmissionPackage, SubmissionAttachment, SubmissionAttempt, SubmissionEvent | Sponsor-system submission packaging, transmission, and audit trail |
| Post-Award | Award, Modification, Terms, AwardBudgetPeriod, AwardBudget, Subaward, CostShare, AwardDeliverable | Grant/contract management after funding |
| Financial | Fund, Account, FinanceCode, Transaction, IndirectRate, Invoice | Accounting, transactions, and cost tracking |
| Personnel & Effort | ProjectRole, Effort | Roles on projects and effort certification |
| Faculty Development | ProjectCohort, CohortParticipation | Cohort programs, mentoring, and participant tracking |
| Operations | ApplicationSystem, ServiceRequest | System catalog and service request tracking |
| Compliance | ComplianceRequirement, ConflictOfInterest | IRB, IACUC, COI, and regulatory tracking |
| System | Document, ActivityLog | Document management and audit trails |

The model also includes 8 pre-built views (e.g., vw_Active_Awards, vw_Budget_Comparison, vw_Overdue_Deliverables) as reference query implementations that institutions can adopt or adapt for dashboards and reporting.

Browse the full model interactively at the UDM Dashboard.

Naming Conventions (Ontology)

The UDM follows consistent, predictable naming patterns:

  • Tables: PascalCase — ProjectRole, AwardBudgetPeriod, ComplianceRequirement
  • Columns: Snake_case — Award_Number, Start_Date, Is_Active
  • Primary keys: TableName_ID, e.g., Personnel_ID, Award_ID, Organization_ID
  • Foreign keys: Named by role, not generically — Sponsor_Organization_ID, Lead_Organization_ID, Subrecipient_Organization_ID (not just Organization_ID)
  • Standard suffixes: _ID, _Date, _Status, _Type, _Amount, _Percent, _Number, _Name, _Description
  • Booleans: Prefixed with Is_, e.g., Is_Active, Is_Primary, Is_Key_Personnel

For full ontology documentation, see the Ontology page.
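
The conventions above lend themselves to an automated check. The following is a minimal sketch, not part of the UDM tooling: `checkNaming`, the two regular expressions, and the sample columns are hypothetical, and the patterns are one interpretation of the rules listed above.

```javascript
// Hypothetical helper: checks names against the conventions listed above.
// The patterns are an interpretation of those conventions, not rules
// published by the UDM project.
const TABLE_PATTERN = /^[A-Z][A-Za-z0-9]*$/; // PascalCase, no underscores
const COLUMN_PATTERN = /^[A-Z][A-Za-z0-9]*(_[A-Z][A-Za-z0-9]*)*$/; // Capitalized_Words

function checkNaming(tableName, columns) {
  const problems = [];
  if (!TABLE_PATTERN.test(tableName)) {
    problems.push(`Table "${tableName}" is not PascalCase`);
  }
  for (const [name, def] of Object.entries(columns)) {
    if (!COLUMN_PATTERN.test(name)) {
      problems.push(`Column "${name}" is not capitalized words joined by underscores`);
    }
    // Primary keys are expected to be named TableName_ID.
    if (def.primary_key && name !== `${tableName}_ID`) {
      problems.push(`Primary key "${name}" should be "${tableName}_ID"`);
    }
  }
  return problems;
}

const sample = {
  Award_ID: { type: "VARCHAR(50)", primary_key: true },
  Sponsor_Organization_ID: { type: "VARCHAR(50)" },
  startDate: { type: "DATE" }, // deliberately non-conforming
};
console.log(checkNaming("Award", sample)); // one problem, for startDate
```

A check like this could run in CI against proposed schema changes, although the real schema may contain names that a stricter or looser pattern would treat differently.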

Design Patterns

Flexible vs. Fixed Enumerations: The UDM distinguishes between values that vary by institution and values that are universal standards:

  • AllowedValues table — for institution-specific lookups (contact types, project roles, fund types, deliverable types, etc.) that institutions customize to their needs
  • CHECK constraints — for universal standards (GAAP account types, federal rate structures, status workflows) that should remain consistent everywhere

See allowedvalues.md for complete documentation of this pattern.
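
As a rough illustration of the distinction, the sketch below validates a value against either a fixed list or an institution-supplied list. Everything named here is an assumption: `FIXED_ENUMS`, `institutionAllowedValues`, `isValid`, the `Account_Type` column, and the shape of the lookup data are hypothetical and do not reflect the actual AllowedValues table structure.

```javascript
// Fixed enumeration: would be a CHECK constraint in SQL. The column name and
// values are illustrative assumptions, not taken from udm_schema.json.
const FIXED_ENUMS = {
  Account_Type: ["Asset", "Liability", "Equity", "Revenue", "Expense"],
};

// Flexible enumeration: stands in for rows an institution loads into its
// AllowedValues table. The grouping key and values are hypothetical.
const institutionAllowedValues = {
  Contact_Type: ["Email", "Phone", "Fax"],
};

function isValid(column, value, allowedValues) {
  if (column in FIXED_ENUMS) return FIXED_ENUMS[column].includes(value);
  if (column in allowedValues) return allowedValues[column].includes(value);
  return true; // column is not an enumeration in this sketch
}

console.log(isValid("Account_Type", "Asset", institutionAllowedValues)); // true
console.log(isValid("Contact_Type", "Pager", institutionAllowedValues)); // false

// An institution extends its own lookup; the fixed list is left untouched.
institutionAllowedValues.Contact_Type.push("Pager");
console.log(isValid("Contact_Type", "Pager", institutionAllowedValues)); // true
```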

Other patterns: Self-referencing hierarchies (Organization → Parent Organization, Project → Parent Project), referential integrity with CASCADE/SET NULL behaviors, and audit trail support on critical tables.

For a detailed explanation of every table, naming convention, and design decision, see the Ontology vignette.

JSON Format

The udm_schema.json structure:

{
  "tables": {
    "Award": {
      "description": "Funded grants and contracts...",
      "synonyms": "Grant, Contract, Agreement",
      "columns": {
        "Award_ID": {
          "type": "VARCHAR(50)",
          "primary_key": true,
          "required": false,
          "description": "Primary key for award record",
          "synonyms": "Grant ID, Contract ID"
        },
        "Sponsor_Organization_ID": {
          "type": "VARCHAR(50)",
          "references": { "table": "Organization", "column": "Organization_ID" },
          "description": "Organization providing funding"
        }
      }
    }
  },
  "views": {
    "vw_Active_Awards": {
      "description": "Summary of active awards...",
      "sql": "SELECT ..."
    }
  },
  "table_count": 40,
  "view_count": 8,
  "relationship_count": 100
}
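
Given that structure, the summary counts can be recomputed from the tables themselves. This is a sketch under assumptions: `summarize` and `fragment` are hypothetical names, and it assumes relationship_count corresponds to the number of columns carrying a `references` entry, which the source does not state.

```javascript
// Counts tables, views, and foreign keys in a schema-shaped object.
function summarize(schema) {
  let relationships = 0;
  for (const table of Object.values(schema.tables)) {
    for (const column of Object.values(table.columns)) {
      if (column.references) relationships += 1;
    }
  }
  return {
    tables: Object.keys(schema.tables).length,
    views: Object.keys(schema.views || {}).length,
    relationships,
  };
}

// Trimmed stand-in for udm_schema.json, based on the fragment shown above.
const fragment = {
  tables: {
    Award: {
      columns: {
        Award_ID: { type: "VARCHAR(50)", primary_key: true },
        Sponsor_Organization_ID: {
          type: "VARCHAR(50)",
          references: { table: "Organization", column: "Organization_ID" },
        },
      },
    },
  },
  views: { vw_Active_Awards: { sql: "SELECT ..." } },
};

console.log(summarize(fragment)); // { tables: 1, views: 1, relationships: 1 }
```

Comparing the result with table_count, view_count, and relationship_count in the downloaded file is a quick sanity check that the file was parsed completely.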

Accessing the UDM

The UDM is served as static JSON via GitHub Pages. These endpoints define the framework — the structure and conventions of the data model. They do not contain populated data; institutions implement the model and populate it with their own data.

| Endpoint | Description |
|---|---|
| /data/udm_schema.json | Complete schema (SSOT): tables, columns, types, constraints, descriptions, synonyms, PII flags, views |
| /data/data-dictionary.json | Human-readable descriptions, synonyms, and PII flags |
| /data/relationships.json | Foreign key relationships |

Most consumers only need the primary endpoint (udm_schema.json) — it contains everything.

// Fetch the full UDM schema
const response = await fetch('https://ui-insight.github.io/AI4RA-UDM/data/udm_schema.json');
const udm = await response.json();

// Browse tables
Object.keys(udm.tables);  // ["AllowedValues", "Account", "Award", ...]

// Get a table's columns and descriptions
udm.tables.Award.columns;

// Find all foreign key relationships
for (const [table, data] of Object.entries(udm.tables)) {
  for (const [col, def] of Object.entries(data.columns)) {
    if (def.references) {
      console.log(`${table}.${col} → ${def.references.table}.${def.references.column}`);
    }
  }
}

Implementing the UDM

The UDM is database-agnostic. Institutions can implement it with whatever technology fits their stack:

| Technology | Notes |
|---|---|
| MySQL / MariaDB | Generate CREATE TABLE statements from udm_schema.json table and column definitions |
| PostgreSQL | Same approach; adapt types as needed (VARCHAR works as-is, adjust date functions) |
| SQLite | Good for lightweight/embedded deployments; adapt constraints to SQLite syntax |
| SQL Server | Adjust type names (VARCHAR → NVARCHAR) and date functions |
| MongoDB / NoSQL | Use the JSON schema directly as collection definitions; embed related documents where appropriate instead of FK joins |
| Data Warehouse (Snowflake, BigQuery, Redshift) | Use as a staging/canonical layer; adapt types to platform-specific variants |

Use udm_schema.json as the reference for generating DDL or collection definitions for your target platform. The JSON contains all table structures, column types, constraints, and relationships needed to create a complete implementation.
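
One possible shape for such a generator is sketched below. It is not the project's tooling: `createTableSql` is a hypothetical function, it reads only the keys visible in the JSON Format fragment (`type`, `primary_key`, `required`, `references`), and it ignores CHECK constraints and CASCADE/SET NULL behaviors, whose JSON representation is not shown in this README.

```javascript
// Builds one CREATE TABLE statement from a table definition shaped like the
// JSON fragment in this README. Other keys in the real file are ignored.
function createTableSql(tableName, tableDef) {
  const lines = [];
  const primaryKeys = [];
  const foreignKeys = [];
  for (const [name, def] of Object.entries(tableDef.columns)) {
    // Primary key columns are emitted NOT NULL regardless of "required".
    const notNull = (def.required || def.primary_key) ? " NOT NULL" : "";
    lines.push(`  ${name} ${def.type}${notNull}`);
    if (def.primary_key) primaryKeys.push(name);
    if (def.references) {
      foreignKeys.push(
        `  FOREIGN KEY (${name}) REFERENCES ${def.references.table} (${def.references.column})`
      );
    }
  }
  if (primaryKeys.length > 0) {
    lines.push(`  PRIMARY KEY (${primaryKeys.join(", ")})`);
  }
  lines.push(...foreignKeys);
  return `CREATE TABLE ${tableName} (\n${lines.join(",\n")}\n);`;
}

const award = {
  columns: {
    Award_ID: { type: "VARCHAR(50)", primary_key: true, required: false },
    Sponsor_Organization_ID: {
      type: "VARCHAR(50)",
      references: { table: "Organization", column: "Organization_ID" },
    },
  },
};
console.log(createTableSql("Award", award));
```

Referenced tables must exist before a statement with a FOREIGN KEY clause runs, so a real generator would also need to order its output or add the foreign keys in a second pass.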

Institutions are expected to:

  1. Map their local field names to UDM column names (the synonyms field helps identify equivalent concepts)
  2. Populate the AllowedValues table with their institution-specific lookup values
  3. Adapt views to their reporting needs

Crosswalks

A crosswalk is a declarative mapping between a source system's vocabulary and the UDM's — one row per source field, listing the target UDM column, any value-translation rules, and transformation notes. Crosswalks are the concrete artifact that operationalizes step 1 above.

| Source Column | UDM Column | Value Translation |
|---|---|---|
| grantNumber | Award.Award_ID | direct |
| pi_email | ContactDetails.Contact_Value (with Contact_Type = "Email") | pivot |
| STATUS_CD = 'A' | Award.Award_Status = 'Active' | enum lookup |
| proj_start (MM/DD/YYYY) | Proposal.Proposed_Start_Date | parse to DATE |
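
The four rows above could be applied to a source record roughly as follows. This is a sketch only: `applyCrosswalk`, `parseUsDate`, `STATUS_LOOKUP`, and the sample record are hypothetical, and only the single status code shown in the table is mapped.

```javascript
// Parses MM/DD/YYYY into an ISO date string (YYYY-MM-DD).
function parseUsDate(text) {
  const match = /^(\d{2})\/(\d{2})\/(\d{4})$/.exec(text);
  if (!match) throw new Error(`Unrecognized date: ${text}`);
  const [, month, day, year] = match;
  return `${year}-${month}-${day}`;
}

const STATUS_LOOKUP = { A: "Active" }; // only the code shown in the table

function applyCrosswalk(source) {
  return {
    Award: {
      Award_ID: source.grantNumber,                  // direct
      Award_Status: STATUS_LOOKUP[source.STATUS_CD], // enum lookup
    },
    ContactDetails: [
      // pivot: a source column becomes a typed row
      { Contact_Type: "Email", Contact_Value: source.pi_email },
    ],
    Proposal: {
      Proposed_Start_Date: parseUsDate(source.proj_start), // parse to DATE
    },
  };
}

// Made-up source record for illustration.
const row = {
  grantNumber: "G-001",
  pi_email: "pi@example.edu",
  STATUS_CD: "A",
  proj_start: "07/01/2025",
};
console.log(JSON.stringify(applyCrosswalk(row), null, 2));
```

In practice the rules would more likely live in a data file than in code, so that each source system's mapping can be reviewed and versioned separately.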

The UDM supports crosswalk authoring in two places:

  • synonyms on every table and column: lists alternate names (e.g., Award → Grant, Contract, Agreement) so matchers can identify equivalent concepts without a hand-built dictionary.
  • description — plain-language column purpose that ML or LLM matchers can use alongside the column name to disambiguate near-duplicates.
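
A simple matcher built on the synonyms field might look like the sketch below. `normalize` and `buildSynonymIndex` are hypothetical names, and the code assumes synonyms is a comma-separated string, as in the JSON Format fragment. Matching is exact after normalization, so it does not attempt fuzzy or description-based matching.

```javascript
// Lowercases a name and strips everything except letters and digits, so
// "Grant ID", "grant_id", and "GrantId" all compare equal.
function normalize(name) {
  return name.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Maps each normalized column name and synonym to "Table.Column".
// If two columns share a synonym, the later one wins in this sketch.
function buildSynonymIndex(schema) {
  const index = new Map();
  for (const [tableName, table] of Object.entries(schema.tables)) {
    for (const [columnName, column] of Object.entries(table.columns)) {
      const names = [columnName, ...(column.synonyms || "").split(",")];
      for (const name of names) {
        const key = normalize(name);
        if (key) index.set(key, `${tableName}.${columnName}`);
      }
    }
  }
  return index;
}

const schema = {
  tables: {
    Award: {
      columns: { Award_ID: { synonyms: "Grant ID, Contract ID" } },
    },
  },
};
const index = buildSynonymIndex(schema);
console.log(index.get(normalize("grant_id")));    // Award.Award_ID
console.log(index.get(normalize("grantNumber"))); // undefined
```

Source fields that return undefined are the ones that need a human or a description-based matcher to resolve.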

In a medallion lakehouse, the crosswalk is the Silver layer: each source gets its own Silver schema that renames/pivots/coerces its raw Bronze data into UDM-shaped columns. The Gold layer then unions the Silver views across sources.

See the Infrastructure tab for diagrams of both the Silver crosswalk layer and the surrounding medallion architecture.

Contributing

The UDM improves through community input. There are several ways to participate:

  • Suggest changes or report issues: Open a GitHub Issue describing the table, column, or convention you'd like to add, change, or discuss
  • Join the discussion: Use GitHub Discussions for broader questions about the model's direction, new domain coverage, or adoption experiences

When udm_schema.json is updated on main, CI automatically regenerates the dashboard data files served via GitHub Pages.

Entity Relationship Diagram

graph TD

    Account-->Account
    Account-->Transaction
    AllowedValues-->AwardDeliverable
    AllowedValues-->ConflictOfInterest
    AllowedValues-->ContactDetails
    AllowedValues-->Document
    AllowedValues-->FinanceCode
    AllowedValues-->Fund
    AllowedValues-->Modification
    AllowedValues-->Project
    AllowedValues-->ProjectRole
    AllowedValues-->Transaction
    ApplicationSystem-->ServiceRequest
    Award-->AwardBudget
    Award-->AwardBudgetPeriod
    Award-->AwardDeliverable
    Award-->ConflictOfInterest
    Award-->CostShare
    Award-->FinanceCode
    Award-->Invoice
    Award-->Modification
    Award-->ProjectRole
    Award-->ServiceRequest
    Award-->Subaward
    Award-->Terms
    Award-->Transaction
    AwardBudgetPeriod-->AwardBudget
    AwardBudgetPeriod-->AwardDeliverable
    AwardBudgetPeriod-->Invoice
    AwardBudgetPeriod-->Transaction
    BudgetCategory-->AwardBudget
    BudgetCategory-->ProposalBudget
    FinanceCode-->Transaction
    Fund-->Transaction
    IndirectRate-->ProposalBudget
    Organization-->ApplicationSystem
    Organization-->Award
    Organization-->ContactDetails
    Organization-->CostShare
    Organization-->FinanceCode
    Organization-->Fund
    Organization-->IndirectRate
    Organization-->Organization
    Organization-->Personnel
    Organization-->Project
    Organization-->Proposal
    Organization-->RFA
    Organization-->Subaward
    Personnel-->AwardDeliverable
    Personnel-->CohortParticipation
    Personnel-->ComplianceRequirement
    Personnel-->ConflictOfInterest
    Personnel-->ContactDetails
    Personnel-->Effort
    Personnel-->Modification
    Personnel-->ProjectRole
    Personnel-->ServiceRequest
    Personnel-->Subaward
    Personnel-->Transaction
    Project-->Award
    Project-->CohortParticipation
    Project-->ComplianceRequirement
    Project-->ConflictOfInterest
    Project-->Project
    Project-->ProjectCohort
    Project-->ProjectRole
    Project-->Proposal
    Project-->Transaction
    ProjectCohort-->CohortParticipation
    ProjectRole-->Effort
    Proposal-->Award
    Proposal-->CohortParticipation
    Proposal-->Proposal
    Proposal-->ProposalBudget
    Proposal-->ServiceRequest
    RFA-->Award
    RFA-->Proposal
    RFA-->RFARequirement
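
The edges in this diagram can also drive the order in which tables are created. The sketch below is an illustration, not project tooling: `creationOrder` is a hypothetical function, it uses a five-edge subset of the diagram, and it assumes each edge points from the referenced (parent) table to the referencing (child) table. Self-references such as Organization → Organization are skipped, since a table can reference itself once it exists.

```javascript
// Orders tables so that every parent appears before its children.
// Throws if the edges (ignoring self-references) contain a cycle.
function creationOrder(edges) {
  const children = new Map();
  const pending = new Map(); // table -> number of parents not yet placed
  const touch = (table) => {
    if (!children.has(table)) {
      children.set(table, []);
      pending.set(table, 0);
    }
  };
  for (const [parent, child] of edges) {
    touch(parent);
    touch(child);
    if (parent === child) continue; // self-reference
    children.get(parent).push(child);
    pending.set(child, pending.get(child) + 1);
  }
  const ready = [...pending.keys()].filter((t) => pending.get(t) === 0).sort();
  const order = [];
  while (ready.length > 0) {
    const table = ready.shift();
    order.push(table);
    for (const child of children.get(table)) {
      pending.set(child, pending.get(child) - 1);
      if (pending.get(child) === 0) ready.push(child);
    }
  }
  if (order.length !== pending.size) throw new Error("Cycle between tables");
  return order;
}

// Subset of the edges listed in the diagram above.
const edges = [
  ["Organization", "Organization"],
  ["Organization", "Project"],
  ["Organization", "Award"],
  ["Project", "Award"],
  ["Award", "Transaction"],
];
console.log(creationOrder(edges));
// [ 'Organization', 'Project', 'Award', 'Transaction' ]
```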

About

A standardized data model for research administration that provides universal terms and definitions, helping institutions map their local data to a common schema for dashboards and analytics.
