feat: default and customisable schema definitions for OMOP CDM #3
Open
nicoloesch wants to merge 1 commit into
Schema Configuration for OMOP CDM Tables
Summary
This PR implements environment-variable-backed schema configuration for OMOP Alchemy ORM tables. Instead of runtime factories or complex metadata cloning, we use a minimal static approach where each table category inherits from a simple mixin that reads its schema from an environment variable at module import time.
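A minimal sketch of the import-time pattern described above, not the code in schema_mixins.py. The helper name `schema_from_env` and the two mixin class names are assumptions made for illustration; only the `OMOP_*_SCHEMA` variable names, their defaults, and the `__omop_schema__` attribute come from this description.

```python
import os


def schema_from_env(var_name: str, default: str) -> str:
    """Return the schema named by an environment variable, or the default.

    Hypothetical helper: empty or whitespace-only values fall back to the default.
    """
    value = os.environ.get(var_name, "").strip()
    return value or default


class ClinicalSchemaMixin:
    # Evaluated once, when this class body runs at module import time.
    __omop_schema__ = schema_from_env("OMOP_CLINICAL_SCHEMA", "omop")


class VocabularySchemaMixin:
    # Same pattern, different variable and default.
    __omop_schema__ = schema_from_env("OMOP_VOCABULARY_SCHEMA", "vocabulary")
```

Because the lookup happens in the class body, changing an environment variable after the module has been imported has no effect on the resolved schema in this sketch.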
Changes
Core Implementation
schema_mixins.py - 8 new schema mixins (one per CDM category: Clinical, Health System, Health Economic, Structural, Unstructured, Metadata, Vocabulary, Derived) that read their schemas from OMOP_*_SCHEMA environment variables with sensible defaults matching CDM v5.4 conventions
decorators.py - Enhanced @cdm_table decorator to infer and apply schema from inherited mixins at class definition time
Maintenance Layer
maintenance/tables.py and 8 other maintenance files - Added maintenance_table_schema() helper function to respect per-table resolved schemas while allowing CLI --db-schema override
Documentation & Configuration
docs/api/configuration.md - Complete guide with OHDSI CDM image reference, environment variable descriptions, table-to-category mappings, and operational implications
.example_dotenv - Example env var template for all 8 schema categories
Tests
tests/test_schema_mixins.py - Focused test suite validating schema resolution from environment, mixin inheritance, and CLI override behavior
How It Works
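Each category mixin reads its schema from its OMOP_*_SCHEMA environment variable at module import time and stores it as a class attribute (__omop_schema__)
The @cdm_table decorator walks the MRO (method resolution order) to find __omop_schema__ from the inherited mixin and applies it to the table's SQLAlchemy schema
Maintenance commands use the per-table resolved schema unless it is overridden on the CLI with --db-schema
Schema Variables & Defaults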
OMOP_CLINICAL_SCHEMA=omop
OMOP_HEALTH_SYSTEM_SCHEMA=omop
OMOP_HEALTH_ECONOMIC_SCHEMA=omop
OMOP_STRUCTURAL_SCHEMA=omop
OMOP_UNSTRUCTURED_SCHEMA=omop
OMOP_METADATA_SCHEMA=omop
OMOP_VOCABULARY_SCHEMA=vocabulary
OMOP_DERIVED_SCHEMA=results
Benefits
Static: Schema assigned at class definition time, no runtime factories
Simple: 8 mixins, no inheritance hierarchies or metadata cloning
Flexible: Deployments can override any category's schema via env vars
Maintainable: Single helper function consolidates schema logic
Type-safe: All schema values are resolved before use, satisfying static type checkers
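The resolution and override steps from How It Works can be sketched in plain Python. This is an illustration under stated assumptions, not the PR's implementation: `resolve_schema`, `pick_schema`, and the `__resolved_schema__` attribute are hypothetical names, the mixin value is hard-coded instead of read from the environment, and the real decorator applies the result to the SQLAlchemy table where this sketch only records it on the class. `pick_schema` stands in for maintenance_table_schema(), whose actual signature is not shown here.

```python
from typing import Optional


class DerivedSchemaMixin:
    # Hard-coded for the sketch; the PR reads this from OMOP_DERIVED_SCHEMA.
    __omop_schema__ = "results"


def resolve_schema(cls: type) -> Optional[str]:
    """Walk the MRO and return the first __omop_schema__ defined on a class."""
    for base in cls.__mro__:
        schema = vars(base).get("__omop_schema__")
        if schema:
            return schema
    return None


def cdm_table(cls: type) -> type:
    """Class decorator: record the schema inherited from a mixin, if any."""
    cls.__resolved_schema__ = resolve_schema(cls)  # hypothetical attribute
    return cls


def pick_schema(table_cls: type, cli_schema: Optional[str] = None) -> Optional[str]:
    """Prefer an explicit CLI value (--db-schema), else the per-table schema."""
    if cli_schema:
        return cli_schema
    return resolve_schema(table_cls)


@cdm_table
class ObservationPeriod(DerivedSchemaMixin):
    """Stand-in for an ORM table class; no SQLAlchemy involved in this sketch."""


print(ObservationPeriod.__resolved_schema__)      # results
print(pick_schema(ObservationPeriod))             # results
print(pick_schema(ObservationPeriod, "staging"))  # staging
```

Keeping the override in a single helper means every maintenance command applies the same precedence: an explicit CLI value first, then the schema resolved for the table.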
Note on OMOP CDM Alignment
During implementation, I observed that the project structure does not entirely follow the official OMOP CDM v5.4 categorization shown in the OMOP CDM image. Certain tables are placed in different categories than the official specification.
Known misalignments:
observation_period - Currently in the DERIVED category (results schema), but according to the CDM specification it is a foundational Clinical record that determines valid observation periods for all subsequent analysis. Should arguably be in the CLINICAL category.
image and image_feature - These are custom extensions to the OMOP CDM model and are currently placed in the UNSTRUCTURED category. The official CDM v5.4 does not include these tables; they appear to be project-specific additions.
Visit-related tables - The VISIT_DETAIL and VISIT_OCCURRENCE tables are currently in HEALTH_SYSTEM, whereas the CDM v5.4 specification groups them with the Clinical Data tables. Some analytics frameworks treat visit infrastructure as shared between the Clinical and Health System domains.
These discrepancies are noted for future CDM standardization efforts. A formal review and potential reclassification of these tables may be warranted to improve alignment with OHDSI standards and reduce confusion for users familiar with the official CDM specification.