
feat: staged A→B→C L2 pipeline with bounded recovery #64

@deanban

Problem

The single-pass L2 prompt was doing too much at once: entity identification, grain, property classification, and value decoding. This produced low-confidence entity assignments, silently dropped properties whenever a batch failed, and left no path to recovery. There was no clean way to tell which stage of interpretation had failed, or to retry only the failing part.

Proposed solution

Decompose L2 into three staged LLM calls with explicit ownership:

  • Stage A — entity identification and grain hypothesis per table.
  • Stage B — property classification (semantic type, concept role, cardinality), with bounded recovery: single retry, batch-split on schema failure, and Tier-1 heuristic rescue for columns the LLM drops or misclassifies.
  • Stage C — conditional value decoding, only invoked for columns whose Stage B output indicates a finite value set.
  • Merge step — combines stage outputs into the final per-table interpretation with a documented ownership matrix (which stage wins on which field).
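
A minimal sketch of how Stage B's bounded recovery could be ordered, under stated assumptions. The names `SchemaError`, `tier1_heuristic`, and `classify_with_recovery` are hypothetical and not taken from the codebase. The bounds shown here (one retry of the full batch, one level of splitting, no retries on the halves) are one possible reading of "bounded". The heuristic rules are placeholders, and only the dropped-column case is covered, not misclassification.

```python
from typing import Callable, Dict, List


class SchemaError(Exception):
    """Hypothetical error: a Stage B response failed schema validation."""


# A classifier maps a batch of column names to {column: semantic_type}.
Classifier = Callable[[List[str]], Dict[str, str]]


def tier1_heuristic(column: str) -> str:
    """Hypothetical Tier-1 rescue: guess a type from the column name alone."""
    name = column.lower()
    if name.endswith("_id"):
        return "identifier"
    if name.endswith("_at") or "date" in name:
        return "timestamp"
    return "unknown"


def classify_with_recovery(columns: List[str], classify: Classifier) -> Dict[str, str]:
    """Single retry, then one batch split, then heuristic rescue."""

    def attempt(batch: List[str], retries_left: int, may_split: bool) -> Dict[str, str]:
        try:
            return dict(classify(batch))
        except SchemaError:
            if retries_left > 0:
                # Bounded retry: same batch, one fewer retry available.
                return attempt(batch, retries_left - 1, may_split)
            if may_split and len(batch) > 1:
                # Batch split: each half gets exactly one call, no further splits.
                mid = len(batch) // 2
                out = attempt(batch[:mid], 0, False)
                out.update(attempt(batch[mid:], 0, False))
                return out
            # Give up on this batch; the rescue pass below fills the gaps.
            return {}

    result = attempt(columns, retries_left=1, may_split=True)
    # Heuristic rescue for any column the LLM dropped or that never classified.
    for col in columns:
        result.setdefault(col, tier1_heuristic(col))
    return result
```

With these bounds a table costs at most four classifier calls (two on the full batch, one per half), and every input column appears in the result, so nothing is dropped silently.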

Tracked in OpenSpec change source-semantic-hardening, tasks.md §2–4.
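
The Stage C gate and the merge step's ownership matrix could be expressed as plain data plus two small functions. This is only an illustrative sketch: the field names, the stage assignments in `OWNERSHIP`, and the `"finite"` cardinality marker are assumptions, not the documented matrix, and the record is flattened to a single column for brevity.

```python
from typing import Any, Dict

# Hypothetical ownership matrix: field name -> stage whose value wins.
OWNERSHIP: Dict[str, str] = {
    "entity": "A",
    "grain": "A",
    "semantic_type": "B",
    "concept_role": "B",
    "cardinality": "B",
    "decoded_values": "C",
}


def needs_stage_c(stage_b_column: Dict[str, Any]) -> bool:
    """Gate for Stage C; assumes Stage B marks finite value sets as 'finite'."""
    return stage_b_column.get("cardinality") == "finite"


def merge_stages(outputs: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Build the final record, taking each field only from its owning stage."""
    merged: Dict[str, Any] = {}
    for field, owner in OWNERSHIP.items():
        stage_output = outputs.get(owner, {})
        if field in stage_output:
            merged[field] = stage_output[field]
    return merged
```

Keeping the matrix as data means a value emitted by a non-owning stage is ignored rather than resolved ad hoc, and a skipped Stage C simply leaves its fields absent.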

Alternatives considered

  • Keeping single-pass L2 and tuning the prompt — rejected; prompt was already at context-budget ceiling.
  • Two-pass (entity + properties merged) — previous iteration; still couldn't recover from batch-level schema failures cleanly.

Closed by #63.

Metadata

Labels: enhancement (New feature or request)
