
feat: default and customisable schema definitions for OMOP CDM #3

Open
nicoloesch wants to merge 1 commit into main from schema_support

@nicoloesch
Collaborator

Schema Configuration for OMOP CDM Tables

Summary

This PR implements environment-variable-backed schema configuration for OMOP Alchemy ORM tables. Instead of runtime factories or complex metadata cloning, we use a minimal static approach where each table category inherits from a simple mixin that reads its schema from an environment variable at module import time.

Changes

Core Implementation

  • schema_mixins.py - 8 new schema mixins (one per CDM category: Clinical, Health System, Health Economic, Structural, Unstructured, Metadata, Vocabulary, Derived) that read their schemas from OMOP_*_SCHEMA environment variables with sensible defaults matching CDM v5.4 conventions
  • decorators.py - Enhanced @cdm_table decorator to infer and apply schema from inherited mixins at class definition time
  • All 28 CDM table classes - Updated to inherit the appropriate category mixin as the first base class

Maintenance Layer

  • maintenance/tables.py and 8 other maintenance files - Added a maintenance_table_schema() helper function that respects each table's resolved schema while still allowing the CLI --db-schema option to override it
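
The precedence described for the maintenance helper can be sketched as below. This is a minimal illustration, not the code in this PR: the signature of maintenance_table_schema() and the FakeTable stand-in are assumptions, and the only behaviour taken from the description is that an explicit --db-schema value wins over the table's resolved schema.

```python
from typing import Optional


class FakeTable:
    """Stand-in for a SQLAlchemy Table; only the `schema` attribute matters here."""

    def __init__(self, name: str, schema: Optional[str]) -> None:
        self.name = name
        self.schema = schema


def maintenance_table_schema(
    table: FakeTable, db_schema: Optional[str] = None
) -> Optional[str]:
    """Return the schema a maintenance operation should target.

    Hypothetical signature: an explicit override (for example the value
    passed via --db-schema) wins; otherwise the table's own schema is used.
    """
    if db_schema:  # override supplied by the caller or the CLI
        return db_schema
    return table.schema


concept = FakeTable("concept", schema="vocabulary")
print(maintenance_table_schema(concept))             # vocabulary
print(maintenance_table_schema(concept, "staging"))  # staging
```

Keeping the precedence rule in one function means each maintenance file can call the helper instead of re-implementing the override check.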

Documentation & Configuration

  • docs/api/configuration.md - Complete guide with OHDSI CDM image reference, environment variable descriptions, table-to-category mappings, and operational implications
  • .example_dotenv - Example env var template for all 8 schema categories

Tests

  • tests/test_schema_mixins.py - Focused test suite validating schema resolution from environment, mixin inheritance, and CLI override behavior

How It Works

  1. At module import time: Each mixin class reads its env var and stores it as a class attribute (__omop_schema__)
  2. At table class definition: The @cdm_table decorator walks the MRO (method resolution order) to find __omop_schema__ from the inherited mixin and applies it to the table's SQLAlchemy schema
  3. At query time: ORM table objects carry the resolved schema, so queries transparently target the correct database schema
  4. At maintenance time: All CLI and programmatic maintenance operations respect the per-table schema unless overridden via --db-schema
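
Steps 1 and 2 can be sketched in plain Python as follows. This is a simplified stand-in under stated assumptions: the mixin name VocabularySchemaMixin and the resolved_schema attribute are hypothetical, and the step that applies the schema to the SQLAlchemy table is omitted because the description does not show it.

```python
import os

# Assumption for the demo: the variable is set before the mixin body runs.
os.environ["OMOP_VOCABULARY_SCHEMA"] = "vocab_demo"


class VocabularySchemaMixin:
    # Evaluated once, when this class body executes (module import time).
    __omop_schema__ = os.environ.get("OMOP_VOCABULARY_SCHEMA", "vocabulary")


def cdm_table(cls):
    """Simplified stand-in for @cdm_table: find the schema on the MRO."""
    for base in cls.__mro__:
        schema = vars(base).get("__omop_schema__")
        if schema is not None:
            # Hypothetical attribute; the real decorator applies the value
            # to the table's SQLAlchemy schema instead.
            cls.resolved_schema = schema
            break
    return cls


@cdm_table
class Concept(VocabularySchemaMixin):
    __tablename__ = "concept"


# Changing the variable after import does not affect the class attribute.
os.environ["OMOP_VOCABULARY_SCHEMA"] = "changed_later"
print(Concept.resolved_schema)  # vocab_demo
```

Because the value is captured when the mixin class body runs, the environment has to be populated (for example from a dotenv file) before the table modules are imported.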

Schema Variables & Defaults

  • OMOP_CLINICAL_SCHEMA=omop
  • OMOP_HEALTH_SYSTEM_SCHEMA=omop
  • OMOP_HEALTH_ECONOMIC_SCHEMA=omop
  • OMOP_STRUCTURAL_SCHEMA=omop
  • OMOP_UNSTRUCTURED_SCHEMA=omop
  • OMOP_METADATA_SCHEMA=omop
  • OMOP_VOCABULARY_SCHEMA=vocabulary
  • OMOP_DERIVED_SCHEMA=results
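
As an illustration of how these defaults combine with overrides, the sketch below resolves every variable against a mapping of environment values. The resolve_schemas() helper is hypothetical and not part of this PR; how the real mixins treat an empty-string value is not stated here, so the sketch simply uses whatever value is present.

```python
import os
from typing import Dict, Mapping

# Variable names and defaults as listed in this PR description.
SCHEMA_DEFAULTS: Dict[str, str] = {
    "OMOP_CLINICAL_SCHEMA": "omop",
    "OMOP_HEALTH_SYSTEM_SCHEMA": "omop",
    "OMOP_HEALTH_ECONOMIC_SCHEMA": "omop",
    "OMOP_STRUCTURAL_SCHEMA": "omop",
    "OMOP_UNSTRUCTURED_SCHEMA": "omop",
    "OMOP_METADATA_SCHEMA": "omop",
    "OMOP_VOCABULARY_SCHEMA": "vocabulary",
    "OMOP_DERIVED_SCHEMA": "results",
}


def resolve_schemas(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the effective schema per variable: env value if set, else default."""
    return {var: env.get(var, default) for var, default in SCHEMA_DEFAULTS.items()}


# A deployment that only relocates the derived tables.
resolved = resolve_schemas({"OMOP_DERIVED_SCHEMA": "cohort_results"})
print(resolved["OMOP_DERIVED_SCHEMA"])     # cohort_results
print(resolved["OMOP_VOCABULARY_SCHEMA"])  # vocabulary
```

Passing the environment in as a mapping keeps the function easy to exercise in tests without touching the real process environment.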

Benefits

  • Static: Schema is assigned at class definition time, with no runtime factories
  • Simple: 8 flat mixins, with no deep inheritance hierarchies or metadata cloning
  • Flexible: Deployments can override any category's schema via env vars
  • Maintainable: A single helper function consolidates the maintenance schema logic
  • Type-safe: All schema values are resolved before use, which satisfies static type checkers


Note on OMOP CDM Alignment

During implementation, I observed that the project structure does not entirely follow the official OMOP CDM v5.4 categorization shown in the OMOP CDM image. Certain tables are placed in different categories than the official specification.

Known misalignments:

  • observation_period - Currently in the DERIVED category (results schema). According to the CDM specification, however, it is a foundational clinical table that defines the spans of time over which a person's data are valid for analysis, so it should arguably be in the CLINICAL category.

  • image and image_feature - These are custom extensions to the OMOP CDM model and are currently placed in the UNSTRUCTURED category. The official CDM v5.4 does not include these tables; they appear to be project-specific additions.

  • Visit-related tables - The VISIT_DETAIL and VISIT_OCCURRENCE tables are currently in HEALTH_SYSTEM. CDM v5.4 lists both under Standardized Clinical Data, so they would arguably belong in CLINICAL, although some analytics frameworks treat visit infrastructure as shared between the Clinical and Health System domains.

These discrepancies are noted for future CDM standardization efforts. A formal review and potential reclassification of these tables may be warranted to improve alignment with OHDSI standards and reduce confusion for users familiar with the official CDM specification.

@nicoloesch nicoloesch requested a review from gkennos April 14, 2026 06:01