feat: cBioPortal + OMOP ingest pipeline and Databricks bridge #69

@deanban

Problem

Evaluating source-semantic-hardening required a realistic healthcare corpus. Existing connectors read only generic Databricks metadata, so we had no way to land cBioPortal study data (structural variants (SV), copy number alterations (CNA), gene panel matrices, resources) in Databricks or to model OMOP CDM targets for semantic mapping.

Proposed solution

  • Add cBioPortal parsers for structural variants, copy number alterations, gene-panel-matrix, and resource files.
  • Add OMOP CDM model coverage so target-model mapping has a first-class schema to map to.
  • Build a Databricks bridge with DuckDB staging for iterating on parsed study data locally before landing it.
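
As a rough illustration of the parsing step in the first bullet, the sketch below unpivots a discrete CNA matrix into one row per (gene, sample) call, which is the kind of long-format grain that is convenient to stage locally. It is not the parser from this branch. It assumes a tab-delimited file with `Hugo_Symbol` and `Entrez_Gene_Id` identifier columns followed by one column per sample, optional `#` comment lines, and `NA` for missing calls. The names `CnaCall` and `parse_cna_matrix` are hypothetical.

```python
import csv
from typing import Iterator, NamedTuple


class CnaCall(NamedTuple):
    """One (gene, sample) copy-number call: the long-format grain."""
    hugo_symbol: str
    entrez_gene_id: str
    sample_id: str
    value: int


# Assumed identifier columns; every other header column is treated as a sample ID.
ID_COLUMNS = ("Hugo_Symbol", "Entrez_Gene_Id")


def parse_cna_matrix(text: str) -> Iterator[CnaCall]:
    """Unpivot a tab-delimited gene-by-sample matrix into one row per call."""
    # Drop blank lines and '#' comment lines so the first kept line is the header.
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    sample_ids = [c for c in (reader.fieldnames or []) if c not in ID_COLUMNS]
    for row in reader:
        for sample_id in sample_ids:
            raw = (row.get(sample_id) or "").strip()
            if raw in ("", "NA"):
                continue  # missing call: emit nothing rather than guess a value
            yield CnaCall(
                hugo_symbol=row["Hugo_Symbol"],
                entrez_gene_id=row.get("Entrez_Gene_Id") or "",
                sample_id=sample_id,
                value=int(raw),
            )


# Tiny inline example; sample IDs S1 and S2 are made up.
SAMPLE = (
    "Hugo_Symbol\tEntrez_Gene_Id\tS1\tS2\n"
    "TP53\t7157\t-2\t0\n"
    "MYC\t4609\tNA\t2\n"
)
calls = list(parse_cna_matrix(SAMPLE))
```

Emitting one row per call keeps the grain explicit (gene by sample), so missing calls become absent rows rather than placeholder values that a later mapping step would need to interpret.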

Delivered in commits 137066f and 18f13f4 on branch dean/feat/source-semantic-hardening.

Alternatives considered

  • Mock cBioPortal fixtures — rejected; wouldn't exercise the Databricks connector path or real grain edge cases.
  • Skip OMOP and only evaluate on raw cBioPortal — rejected; target-model mapping is the forcing function for Stage A grain and Stage B semantic typing.

Closed by #63.

Labels: enhancement (New feature or request)
