Problem
L2 prompts were domain-agnostic, so they never biased the LLM toward healthcare-specific semantic types (e.g., clinical_finding, medication, lab_observation) or vocabulary families (SNOMED, LOINC, RxNorm, ICD). As a result, the LLM had to rediscover the domain for every table. This produced inconsistent semantic typing on cBioPortal/OMOP ingests and missed obvious vocabulary hints.
Proposed solution
- Add DomainContext and DomainCandidate pydantic models that carry the active domain through the pipeline.
- Add a --domain CLI flag with profiler-based auto-detection (falling back to generic when the domain is unknown).
- Inject domain bias headers into Stage A/B/C prompts: healthcare path gets a semantic-type inventory and vocabulary family hints; generic path stays neutral.
- Thread DomainContext into VocabColumnContext so L3 can consume the same signal.
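A minimal sketch of how these pieces could fit together. It is not the actual implementation. Some assumptions to note: stdlib dataclasses stand in for the pydantic models so the sketch has no dependencies. The field names (score, source, candidates), the helpers detect_domain, resolve_domain and domain_bias_header, the hint list and the 0.2 threshold are all hypothetical. The auto-detector also uses plain column-name matching as a stand-in for the real profiler signals. Only the model names, the --domain flag, the healthcare semantic types and vocabulary families, and the generic fallback come from this issue.

```python
import argparse
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence

# Stand-ins for the DomainContext / DomainCandidate pydantic models.
# Field names are hypothetical; the issue names only the models.
@dataclass(frozen=True)
class DomainCandidate:
    name: str     # candidate domain, e.g. "healthcare"
    score: float  # share of workspace columns matching that domain's hints

@dataclass(frozen=True)
class DomainContext:
    domain: str = "generic"  # generic is the fallback
    source: str = "default"  # "cli", "profiler", or "default"
    candidates: tuple = ()

# Hypothetical shape of the L3 context that receives the same domain signal.
@dataclass(frozen=True)
class VocabColumnContext:
    table: str
    column: str
    domain: DomainContext

HEALTHCARE_SEMANTIC_TYPES = ("clinical_finding", "medication", "lab_observation")
HEALTHCARE_VOCAB_FAMILIES = ("SNOMED", "LOINC", "RxNorm", "ICD")
# Hypothetical hints; a real profiler would likely use richer signals.
HEALTHCARE_HINTS = ("icd", "loinc", "snomed", "rxnorm", "diagnosis", "drug", "specimen", "concept_id")

def detect_domain(workspace: Mapping[str, Sequence[str]], threshold: float = 0.2) -> DomainContext:
    """Score the whole workspace once (not per table); fall back to generic."""
    columns = [c.lower() for cols in workspace.values() for c in cols]
    if not columns:
        return DomainContext()
    hits = sum(any(h in c for h in HEALTHCARE_HINTS) for c in columns)
    candidate = DomainCandidate("healthcare", hits / len(columns))
    domain = "healthcare" if candidate.score >= threshold else "generic"
    return DomainContext(domain, "profiler", (candidate,))

def resolve_domain(cli_domain: Optional[str], workspace: Mapping[str, Sequence[str]]) -> DomainContext:
    # An explicit --domain value wins; otherwise auto-detect from the workspace.
    if cli_domain:
        return DomainContext(cli_domain, "cli")
    return detect_domain(workspace)

def domain_bias_header(ctx: DomainContext) -> str:
    """Header prepended to Stage A/B/C prompts; empty (neutral) for generic."""
    if ctx.domain != "healthcare":
        return ""
    return (
        "Domain: healthcare.\n"
        f"Prefer semantic types: {', '.join(HEALTHCARE_SEMANTIC_TYPES)}.\n"
        f"Likely vocabulary families: {', '.join(HEALTHCARE_VOCAB_FAMILIES)}.\n"
    )

parser = argparse.ArgumentParser()
parser.add_argument("--domain", default=None, help="omit to auto-detect")

# Usage with a toy two-table workspace and no --domain flag:
workspace = {
    "condition_occurrence": ["person_id", "condition_concept_id", "icd10_code"],
    "measurement": ["loinc_code", "value_as_number"],
}
ctx = resolve_domain(parser.parse_args([]).domain, workspace)
header = domain_bias_header(ctx)
vocab_ctx = VocabColumnContext("measurement", "loinc_code", ctx)
```

Here 3 of the 5 toy columns match a hint, so the profiler picks healthcare. Scoring the workspace in one pass, rather than table by table, follows this issue's preference for declaring the domain once per workspace.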
Tracked in OpenSpec change source-semantic-hardening, tasks.md §1 and §7.
Alternatives considered
- Auto-detecting the domain per table from column names alone: rejected as too fragile; it is better to declare the domain once per workspace.
- Always hardcoding the healthcare bias: rejected; sema is meant to generalize beyond healthcare.
Closed by #63.