feat: staged A→B→C L2 pipeline with bounded recovery

**Problem**

The single-pass L2 prompt was doing too much: entity identification, grain, property classification, and value decoding all at once. This caused low-confidence entity assignments and silently dropped properties on batch failures, and it made recovery impossible, because there was no clean way to tell which stage of interpretation had failed or to retry just the failing part.

**Proposed solution**

Decompose L2 into three staged LLM calls with explicit ownership, followed by a merge step:

- **Stage A** — entity identification and grain hypothesis per table.
- **Stage B** — property classification (semantic type, concept role, cardinality), with bounded recovery: single retry, batch-split on schema failure, and Tier-1 heuristic rescue for columns the LLM drops or misclassifies.
- **Stage C** — conditional value decoding, only invoked for columns whose Stage B output indicates a finite value set.
- **Merge step** — combines stage outputs into the final per-table interpretation with a documented ownership matrix (which stage wins on which field).
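
A minimal sketch of how Stage B's bounded recovery could be ordered, assuming one retry of the full batch, then a split into halves with no further retries, then heuristic rescue for anything still missing. All names here (`SchemaError`, `classify_with_recovery`, `heuristic_rescue`) and the name-suffix heuristics are hypothetical illustrations, not the actual implementation, and the sketch only rescues dropped columns, not misclassified ones.

```python
from typing import Callable, Dict, List


class SchemaError(Exception):
    """Hypothetical error: a Stage B response failed schema validation."""


# Hypothetical signature: takes column names, returns {column: semantic_type}.
Classifier = Callable[[List[str]], Dict[str, str]]


def heuristic_rescue(column: str) -> str:
    """Assumed Tier-1 heuristic: guess a semantic type from the column name."""
    if column.endswith("_id"):
        return "identifier"
    if column.endswith("_at") or column.endswith("_date"):
        return "timestamp"
    return "unknown"


def classify_with_recovery(
    columns: List[str], classify: Classifier, retries_left: int = 1
) -> Dict[str, str]:
    """Classify columns with a single retry, batch-split, and heuristic rescue."""
    if not columns:
        return {}
    try:
        result = classify(columns)
    except SchemaError:
        if retries_left > 0:
            # Bounded retry: the same batch is attempted once more.
            return classify_with_recovery(columns, classify, retries_left - 1)
        if len(columns) > 1:
            # Batch-split: halves get no further retries, so recursion depth
            # is bounded by log2 of the batch size.
            mid = len(columns) // 2
            left = classify_with_recovery(columns[:mid], classify, 0)
            right = classify_with_recovery(columns[mid:], classify, 0)
            return {**left, **right}
        result = {}  # a single column that still fails falls through to rescue
    # Rescue any column the classifier dropped instead of losing it silently.
    return {c: result.get(c) or heuristic_rescue(c) for c in columns}
```

With this ordering, a schema failure never discards a whole batch: every input column appears in the output, either with the classifier's answer or with a heuristic fallback.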

Tracked in OpenSpec change `source-semantic-hardening`, tasks.md §2–4.
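
For illustration, the merge step's ownership matrix could be expressed as a plain mapping from field to owning stage. This is a sketch under assumptions: the field names, the `OWNERSHIP` table, and the flattened single-record shape are hypothetical, and the real per-table interpretation presumably nests properties per column.

```python
from typing import Any, Dict, Mapping

# Hypothetical ownership matrix: field name -> stage whose value wins.
OWNERSHIP: Dict[str, str] = {
    "entity": "A",
    "grain": "A",
    "semantic_type": "B",
    "concept_role": "B",
    "cardinality": "B",
    "decoded_values": "C",
}


def merge_stage_outputs(outputs: Mapping[str, Mapping[str, Any]]) -> Dict[str, Any]:
    """Build the final record, taking each field only from its owning stage."""
    merged: Dict[str, Any] = {}
    for field, owner in OWNERSHIP.items():
        stage_output = outputs.get(owner, {})
        if field in stage_output:
            # Values for this field from non-owning stages are ignored.
            merged[field] = stage_output[field]
    return merged
```

Because Stage C is conditional, its output may be absent; in this sketch the owned field is then simply omitted rather than filled from another stage.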

**Alternatives considered**

- Keeping single-pass L2 and tuning the prompt: rejected, because the prompt was already at its context-budget ceiling.
- Two-pass (entity + properties merged): the previous iteration, which still couldn't recover cleanly from batch-level schema failures.

Closed by #63.