
feat: domain-aware L2 prompts (healthcare + generic) #65

@deanban

Problem

L2 prompts were domain-agnostic, so the LLM was never biased toward healthcare-specific semantic types (e.g., clinical_finding, medication, lab_observation) or vocabulary families (SNOMED, LOINC, RxNorm, ICD) and had to rediscover the domain for every table. This produced inconsistent semantic typing on cBioPortal and OMOP ingests and missed obvious vocabulary hints.

Proposed solution

  • Add DomainContext and DomainCandidate pydantic models carrying the active domain through the pipeline.
  • Add a --domain CLI flag with profiler-based auto-detection (falling back to generic when the domain cannot be determined).
  • Inject domain bias headers into Stage A/B/C prompts: healthcare path gets a semantic-type inventory and vocabulary family hints; generic path stays neutral.
  • Thread DomainContext into VocabColumnContext so L3 can consume the same signal.
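The bullets above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the code merged in #63: the PR uses pydantic models, while this sketch uses stdlib dataclasses to stay dependency-free. The field names, the `auto` choice for `--domain`, the keyword-based `detect_domain` stand-in for the profiler, the 0.2 threshold, the header wording, and the `VocabColumnContext` fields are all assumptions.

```python
import argparse
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Stand-ins for the PR's pydantic DomainCandidate / DomainContext models.
# Field names are assumptions; dataclasses keep the sketch dependency-free.
@dataclass(frozen=True)
class DomainCandidate:
    name: str     # candidate domain, e.g. "healthcare"
    score: float  # share of workspace columns matching the domain's hints

@dataclass(frozen=True)
class DomainContext:
    domain: str   # active domain: "healthcare" or "generic"
    source: str   # "cli" when declared via --domain, "auto" when detected
    candidates: Tuple[DomainCandidate, ...] = ()

# Hypothetical shape: VocabColumnContext carries the same signal on to L3.
@dataclass(frozen=True)
class VocabColumnContext:
    table: str
    column: str
    domain: Optional[DomainContext] = None

# Hypothetical hint list; a keyword match over column names stands in for the profiler.
HEALTHCARE_HINTS = ("snomed", "loinc", "rxnorm", "icd", "diagnos", "medication", "lab_")

def detect_domain(tables: Dict[str, List[str]], threshold: float = 0.2) -> DomainContext:
    """Score the whole workspace once (not per table); fall back to generic."""
    columns = [c.lower() for cols in tables.values() for c in cols]
    hits = sum(any(h in c for h in HEALTHCARE_HINTS) for c in columns)
    score = hits / len(columns) if columns else 0.0
    domain = "healthcare" if score >= threshold else "generic"
    return DomainContext(domain, "auto", (DomainCandidate("healthcare", score),))

def resolve_domain(cli_domain: str, tables: Dict[str, List[str]]) -> DomainContext:
    # An explicit --domain value wins; "auto" defers to detection.
    if cli_domain != "auto":
        return DomainContext(domain=cli_domain, source="cli")
    return detect_domain(tables)

def domain_header(ctx: DomainContext) -> str:
    # Healthcare gets a semantic-type inventory and vocabulary family hints;
    # generic returns an empty header so the prompt stays neutral.
    if ctx.domain != "healthcare":
        return ""
    return (
        "Domain: healthcare\n"
        "Preferred semantic types: clinical_finding, medication, lab_observation\n"
        "Likely vocabulary families: SNOMED, LOINC, RxNorm, ICD\n\n"
    )

def build_stage_prompt(stage: str, body: str, ctx: DomainContext) -> str:
    # The same header is prepended to the Stage A, B and C prompts.
    return f"{domain_header(ctx)}[Stage {stage}]\n{body}"

parser = argparse.ArgumentParser()
parser.add_argument("--domain", choices=["auto", "healthcare", "generic"], default="auto")
```

For example, with `--domain auto` and OMOP-style tables such as `observation` (`loinc_code`) and `drug_exposure` (`rxnorm_concept_id`, `medication_name`), `resolve_domain` returns a healthcare context, and every stage prompt gains the bias header. Detection runs once over all tables in the workspace rather than per table, and the generic path adds nothing, so non-healthcare workspaces see unchanged prompts.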

Tracked in OpenSpec change source-semantic-hardening, tasks.md §1 and §7.

Alternatives considered

  • Auto-detecting the domain per table from column names only: rejected as too fragile; it is better to declare the domain once per workspace.
  • Always hardcoding the healthcare bias: rejected; sema is meant to generalize beyond healthcare.

Closed by #63.

Metadata

Labels: enhancement (New feature or request)
