Problem
Evaluating source-semantic-hardening required a realistic healthcare corpus. Existing connectors only read generic Databricks metadata — we had no way to land cBioPortal study data (SV, CNA, gene panel matrices, resources) into Databricks or to model OMOP CDM targets for semantic mapping.
Proposed solution
- Add cBioPortal parsers for structural variants, copy number alterations, gene-panel-matrix, and resource files.
- Add OMOP CDM model coverage so target-model mapping has a first-class schema to map to.
- Build a Databricks bridge with DuckDB staging for iterating on parsed study data locally before landing it.
Delivered in commits 137066f and 18f13f4 on branch dean/feat/source-semantic-hardening.
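As a rough illustration of the grain question the CNA parser has to settle, here is a minimal sketch that melts a cBioPortal-style discrete CNA matrix (gene rows, sample columns) into one row per gene and sample. It is not the code from the commits above: `CnaCall`, `melt_cna_matrix`, and the sample IDs `S1`/`S2` are hypothetical names, and it assumes a tab-separated file with `Hugo_Symbol` and `Entrez_Gene_Id` columns, integer calls, and `NA` or empty cells for unprofiled samples.

```python
import csv
import io
from typing import Iterator, NamedTuple


class CnaCall(NamedTuple):
    """One long-format row: a single copy-number call for a gene in a sample."""
    hugo_symbol: str
    sample_id: str
    value: int


# Columns that identify the gene rather than a sample (assumed file layout).
GENE_COLUMNS = {"Hugo_Symbol", "Entrez_Gene_Id"}


def melt_cna_matrix(text: str) -> Iterator[CnaCall]:
    """Yield one CnaCall per (gene, sample) cell of a tab-separated CNA matrix."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    sample_ids = [c for c in (reader.fieldnames or []) if c not in GENE_COLUMNS]
    for row in reader:
        for sample_id in sample_ids:
            raw = (row[sample_id] or "").strip()
            if raw in ("", "NA"):
                continue  # skip cells with no call for this sample
            yield CnaCall(row["Hugo_Symbol"], sample_id, int(raw))


# Tiny hypothetical input: two genes, two samples, one missing call.
example = (
    "Hugo_Symbol\tEntrez_Gene_Id\tS1\tS2\n"
    "TP53\t7157\t-2\t0\n"
    "MYC\t4609\t2\tNA\n"
)
rows = list(melt_cna_matrix(example))
```

Rows in this long shape have an explicit (gene, sample) grain, which is easier to stage in a local table and to map onto a target model than the wide matrix.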
Alternatives considered
- Mock cBioPortal fixtures — rejected; wouldn't exercise the Databricks connector path or real grain edge cases.
- Skip OMOP and only evaluate on raw cBioPortal — rejected; target-model mapping is the forcing function for Stage A grain and Stage B semantic typing.
Closed by #63.