
fix: Stage B few-shot examples teaching LLM to drop aliases #71

@deanban


Describe the bug

Running the staged L2 pipeline against the 12-table dev slice produced 52 regressions in which previously generated has_alias assertions disappeared. Root cause: all 12 Stage B few-shot examples had an empty synonyms: [] field, which the LLM treated as a positive signal that synonyms should be omitted.

To reproduce

  1. Run the L2 build against the dev slice at a commit before 783266d.
  2. Diff the resulting assertions against a prior run.
  3. Observe 52 missing has_alias assertions, concentrated on columns with obvious synonyms (e.g. PATIENT_ID → patient).
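Step 2 can be sketched as a set difference over has_alias assertions. This is a minimal sketch, assuming each assertion can be represented as a hypothetical (subject, predicate, object) tuple; the pipeline's real assertion format is not described in this issue, and the helper names and sample rows below are illustrative only.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Hypothetical assertion shape: (subject, predicate, object),
# e.g. ("PATIENT_ID", "has_alias", "patient"). The real format may differ.
Assertion = Tuple[str, str, str]


def missing_assertions(
    before: Iterable[Assertion],
    after: Iterable[Assertion],
    predicate: str = "has_alias",
) -> Set[Assertion]:
    """Return assertions with `predicate` present in `before` but absent from `after`."""
    prior = {a for a in before if a[1] == predicate}
    current = {a for a in after if a[1] == predicate}
    return prior - current


def regressions_by_subject(missing: Iterable[Assertion]) -> Counter:
    """Count missing assertions per subject (column) to see where they cluster."""
    return Counter(subject for subject, _, _ in missing)


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    prior_run = [
        ("PATIENT_ID", "has_alias", "patient"),
        ("ENC_DT", "has_alias", "encounter date"),
        ("PATIENT_ID", "has_type", "identifier"),
    ]
    staged_run = [
        ("PATIENT_ID", "has_type", "identifier"),
        ("ENC_DT", "has_alias", "encounter date"),
    ]
    gone = missing_assertions(prior_run, staged_run)
    print(sorted(gone))  # [('PATIENT_ID', 'has_alias', 'patient')]
    print(regressions_by_subject(gone))
```

Grouping by subject is what would surface the pattern described in step 3: the regressions cluster on columns whose synonyms are obvious.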

Expected behavior

Few-shot examples should demonstrate realistic synonym output, not empty arrays. The LLM should emit has_alias whenever a plausible synonym exists.
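One way to keep the example library from drifting back to all-empty synonyms is a simple lint. This is a minimal sketch, assuming a hypothetical example shape (a name plus an output dict holding per-column synonyms lists); the actual Stage B library in src/sema/engine/ may be structured differently, and the threshold parameter is an illustrative assumption.

```python
from typing import Any, Dict, List

# Hypothetical few-shot example shape:
# {"name": ..., "output": {"columns": [{"name": ..., "synonyms": [...]}, ...]}}
Example = Dict[str, Any]


def examples_with_empty_synonyms(examples: List[Example]) -> List[str]:
    """Return names of examples in which every column has an empty synonyms list."""
    flagged = []
    for ex in examples:
        columns = ex["output"]["columns"]
        if columns and all(not col.get("synonyms") for col in columns):
            flagged.append(ex["name"])
    return flagged


def check_library(examples: List[Example], max_empty_fraction: float = 0.0) -> None:
    """Raise ValueError if too many examples demonstrate omitting synonyms."""
    flagged = examples_with_empty_synonyms(examples)
    if examples and len(flagged) / len(examples) > max_empty_fraction:
        raise ValueError(
            f"{len(flagged)}/{len(examples)} few-shot examples have no synonyms: {flagged}"
        )
```

The strict default rejects any all-empty example; raising max_empty_fraction would allow a few examples that legitimately show a column with no plausible synonym.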

Environment

  • Affects staged L2 builds prior to commit 783266d.
  • Component: src/sema/engine/ (Stage B few-shot example library).

Fix: populated all 12 Stage B examples with realistic synonyms and switched the prompt to compact JSON, which saves tokens and makes the synonyms field visually prominent. Fixed in commit 783266d; closed by #63.
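The compact-JSON change can be illustrated with the standard library's json.dumps. This is a minimal sketch, not the code from 783266d; the helper names and the example dict are hypothetical. Character count is used here only as a rough proxy for tokens, since actual savings depend on the model's tokenizer.

```python
import json
from typing import Any, Dict


def render_compact(example: Dict[str, Any]) -> str:
    """Serialize a few-shot example as single-line JSON with no optional whitespace."""
    # separators=(",", ":") drops the spaces json.dumps inserts by default;
    # ensure_ascii=False keeps non-ASCII synonyms readable instead of \u escapes.
    return json.dumps(example, separators=(",", ":"), ensure_ascii=False)


def render_pretty(example: Dict[str, Any]) -> str:
    """Indented JSON, kept only for comparison."""
    return json.dumps(example, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    ex = {"column": "PATIENT_ID", "synonyms": ["patient", "patient id"]}
    print(render_compact(ex))
    # {"column":"PATIENT_ID","synonyms":["patient","patient id"]}
    print(len(render_compact(ex)), len(render_pretty(ex)))
```

Keeping each example on one line also places the populated synonyms list directly next to the column it describes, rather than several indented lines away.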


Labels: bug
