Problem
The single-pass L2 prompt was doing too much — entity identification, grain, property classification, and value decoding all at once — which caused low-confidence entity assignments, silently dropped properties on batch failures, and made recovery impossible. There was no clean way to reason about which stage of interpretation failed or to retry just the failing part.
Proposed solution
Decompose L2 into three staged LLM calls with explicit ownership, followed by a merge step:
- Stage A — entity identification and grain hypothesis per table.
- Stage B — property classification (semantic type, concept role, cardinality), with bounded recovery: single retry, batch-split on schema failure, and Tier-1 heuristic rescue for columns the LLM drops or misclassifies.
- Stage C — conditional value decoding, only invoked for columns whose Stage B output indicates a finite value set.
- Merge step — combines stage outputs into the final per-table interpretation with a documented ownership matrix (which stage wins on which field).
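The bounded recovery described for Stage B could look roughly like the sketch below. It is a minimal illustration, not the implementation from the change: `SchemaError`, `tier1_heuristic`, `classify_with_recovery`, and the `classify` callable are hypothetical names, the heuristic rule is a placeholder, and the ordering (one retry, then a single split into halves, then heuristic rescue) is an assumption about how the three mechanisms compose.

```python
from typing import Callable, Dict, List, Optional


class SchemaError(Exception):
    """Hypothetical error raised when a Stage B response fails schema validation."""


# Hypothetical signature: takes a batch of column names, returns column -> classification.
Classifier = Callable[[List[str]], Dict[str, dict]]


def tier1_heuristic(column: str) -> dict:
    # Placeholder rule; the actual Tier-1 heuristics are not described in this document.
    if column.endswith("_id"):
        return {"semantic_type": "identifier", "source": "tier1"}
    return {"semantic_type": "unknown", "source": "tier1"}


def classify_with_recovery(columns: List[str], classify: Classifier) -> Dict[str, dict]:
    def attempt(batch: List[str], tries: int) -> Optional[Dict[str, dict]]:
        # Call the classifier up to `tries` times; None means every try failed validation.
        for _ in range(tries):
            try:
                return dict(classify(batch))
            except SchemaError:
                continue
        return None

    # First call plus a single retry on the full batch.
    result = attempt(columns, tries=2)
    if result is None:
        # Batch-split: one attempt per half, so the total number of calls stays bounded.
        result = {}
        mid = max(1, len(columns) // 2)
        for half in (columns[:mid], columns[mid:]):
            if half:
                result.update(attempt(half, tries=1) or {})

    # Heuristic rescue for any column still missing, so no property is silently dropped.
    for col in columns:
        result.setdefault(col, tier1_heuristic(col))
    # Keep only requested columns, in their original order.
    return {col: result[col] for col in columns}
```

Under these assumptions a batch costs at most four LLM calls, and every input column appears in the output with a `source` marker showing whether it came from the model or from the fallback.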
Tracked in OpenSpec change source-semantic-hardening, tasks.md §2–4.
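One way to picture the merge step's ownership matrix is as a plain lookup from field to owning stage. The sketch below is illustrative only: the field names and their assignments in `OWNERSHIP` are placeholders rather than the documented matrix, `merge_stages` is a hypothetical helper, and real output would be nested per column rather than flat.

```python
from typing import Dict, Optional

# Hypothetical ownership matrix: field -> stage whose value wins.
# These assignments are placeholders, not the matrix documented in the change.
OWNERSHIP: Dict[str, str] = {
    "entity": "A",
    "grain": "A",
    "semantic_type": "B",
    "concept_role": "B",
    "cardinality": "B",
    "value_labels": "C",
}


def merge_stages(outputs: Dict[str, Optional[dict]]) -> dict:
    """Build the final interpretation, taking each field only from its owning stage."""
    merged = {}
    for field, stage in OWNERSHIP.items():
        # Stage C is conditional, so its output may be absent or None.
        stage_output = outputs.get(stage) or {}
        if field in stage_output:
            merged[field] = stage_output[field]
    return merged


example = merge_stages({
    "A": {"entity": "order", "grain": "one row per order", "semantic_type": "ignored"},
    "B": {"semantic_type": "categorical", "cardinality": "low"},
    # No "C" entry: value decoding was not invoked for this example.
})
```

Here Stage A's stray `semantic_type` is discarded because Stage B owns that field, and `value_labels` is simply absent because Stage C never ran.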
Alternatives considered
- Keeping single-pass L2 and tuning the prompt — rejected; prompt was already at context-budget ceiling.
- Two-pass (entity + properties merged) — previous iteration; still couldn't recover from batch-level schema failures cleanly.
Closed by #63.