feat(build): stage-b failure containment #96
Merged
Restricts the service circuit breaker to service-health failures only, gates the resume wipe correctly, persists forensic artifacts on every failed table, classifies metadata-availability tiers with a B_PARTIAL admission floor for sparse/name_only tables, and adds a run-level quality budget that aborts on graph-health or run-reliability storms.
- circuit-breaker: opt-in service-health classification (5xx, transport, timeout); content failures, rate limits, and unknowns no longer trip the breaker.
- resume: the schema-wipe loop is gated on `not config.resume`. Repaired the latent `_try_resume` materialize call to pass `source_schema`, so resume actually engages without Neo4j `MERGE` rejecting a null `source_schema`.
- diagnostics: `SemanticEngine` carries an opt-in `LLMAttempt` buffer populated around every invoke (stage_a, per-batch stage_b including failed batches, stage_c). `LLMStageError` gains `llm_attempts`; the new `StageBFailureError(LLMStageError)` carries `stage_a`/`stage_b`/`metrics`, so the failure-artifact writer reads off the exception, not the engine.
- failure-diagnostics: `dump_table_failure_artifact` writes `*__failure.json` with prompts, prompt hashes, raw responses, step errors, unresolved columns, retry/split/rescue counters, `failure_classification` (`service_health` | `rate_limit` | `content_failure` | `semantic_coverage` | `circuit_open` | `unknown`), and `metadata_tier`. Best-effort: artifact write errors log WARN and never block the build.
- metadata-availability-policy: pure, source-agnostic classifier (`rich` | `sparse` | `name_only`) based only on the shape of the L1 evidence. `determine_b_status` becomes tier-keyed: `rich` keeps the 0.75 floor, while `sparse`/`name_only` admit `B_PARTIAL` at 0.60 (configurable). `B_PARTIAL` counts as succeeded for the graph-health budget; per-tier counts appear on the report.
- run-quality-budget: stateless check after `_collect_results` with two triggers (`stage_b_failure_rate` over the graph-contributing denominator; `run_non_contributing_rate` over the run-marginal denominator). A single `QualityBudgetExceeded` carries a trigger discriminator, with stable ordering. The CLI gains `--no-quality-budget`; exit code 7 distinguishes a budget abort from other failures.
Smoke-tested against cbioportal_gbm_tcga_pan_can_atlas_2018 with `--resume`: 10 tables skipped via resume, mutation committed B_PARTIAL at raw=0.67 (sparse tier), resource_definition failed below the partial floor, and a complete failure artifact was written with stage_a output, six captured llm_attempts, prompt hashes, and unresolved-column tiers.
Tests: 1351 unit pass; mypy clean; coverage 89%.
Signed-off-by: deanban <3989225+deanban@users.noreply.github.com>
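The circuit-breaker change amounts to a classification step in front of the trip counter. A minimal sketch, assuming a hypothetical `HTTPStatusError` that exposes `status_code` and a simple consecutive-failure threshold (neither is taken from the `sema` code): only `service_health` failures advance the counter, so content failures, rate limits, and unknowns can never open the breaker.

```python
from dataclasses import dataclass


class HTTPStatusError(Exception):
    """Hypothetical provider error carrying an HTTP status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def classify_failure(exc: BaseException) -> str:
    """Map an exception to a failure class; only 'service_health' may trip the breaker."""
    if isinstance(exc, HTTPStatusError):
        if exc.status_code == 429:
            return "rate_limit"
        if exc.status_code >= 500:
            return "service_health"
        return "unknown"
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return "service_health"  # transport failures and timeouts
    if isinstance(exc, ValueError):
        # json.JSONDecodeError and Pydantic v2's ValidationError both subclass ValueError
        return "content_failure"
    return "unknown"


@dataclass
class ServiceHealthBreaker:
    threshold: int = 3  # hypothetical: consecutive service-health failures before opening
    consecutive: int = 0
    is_open: bool = False

    def record_success(self) -> None:
        self.consecutive = 0

    def record_failure(self, exc: BaseException) -> str:
        kind = classify_failure(exc)
        if kind == "service_health":
            self.consecutive += 1
            if self.consecutive >= self.threshold:
                self.is_open = True
        # every other class leaves the counter untouched, so it can never trip the breaker
        return kind
```

Leaving the counter unchanged on content failures (rather than resetting it) is one possible design; the PR text only states that such failures no longer trip the breaker.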
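The opt-in attempt buffer can be pictured as a thin wrapper around each LLM call that records the prompt and either the response or the error before re-raising; that is what lets failed Stage B batches still appear in the artifact. This is a sketch with hypothetical `AttemptBuffer`/`invoke` names; only `LLMAttempt` and the stage labels come from the description.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LLMAttempt:
    stage: str  # e.g. "stage_a", "stage_b[3]", "stage_c"
    prompt: str
    response: Optional[str] = None
    error: Optional[str] = None


@dataclass
class AttemptBuffer:
    enabled: bool = False  # opt-in: e.g. only switched on when a dump dir is configured
    attempts: list[LLMAttempt] = field(default_factory=list)

    def reset(self) -> None:
        """Called at the start of each table so attempts never leak across tables."""
        self.attempts.clear()

    def invoke(self, stage: str, prompt: str, call: Callable[[str], str]) -> str:
        """Wrap one LLM call, recording it whether it returns or raises."""
        try:
            response = call(prompt)
        except Exception as exc:
            if self.enabled:
                self.attempts.append(
                    LLMAttempt(stage, prompt, error=f"{type(exc).__name__}: {exc}")
                )
            raise  # the caller still sees the original failure
        if self.enabled:
            self.attempts.append(LLMAttempt(stage, prompt, response=response))
        return response
```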
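Carrying staged context on the exception means the artifact writer needs nothing but the caught error. A sketch of that shape, assuming plain constructor keywords (the real signatures are not shown in the PR); `artifact_context` is a hypothetical helper whose classification mirrors the verification note that `StageBFailureError` maps to `semantic_coverage` and a plain `LLMStageError` maps to `content_failure`.

```python
from typing import Any, Optional


class LLMStageError(Exception):
    """Base stage error; carries whatever LLM attempts were captured for the table."""

    def __init__(self, message: str, llm_attempts: Optional[list[Any]] = None) -> None:
        super().__init__(message)
        self.llm_attempts: list[Any] = list(llm_attempts or [])


class StageBFailureError(LLMStageError):
    """Stage B failure that also carries the Stage A output, Stage B result, and metrics."""

    def __init__(
        self,
        message: str,
        *,
        stage_a: Any,
        stage_b: Any,
        metrics: dict[str, Any],
        llm_attempts: Optional[list[Any]] = None,
    ) -> None:
        super().__init__(message, llm_attempts)
        self.stage_a = stage_a
        self.stage_b = stage_b
        self.metrics = dict(metrics)


def artifact_context(exc: LLMStageError) -> dict[str, Any]:
    """Hypothetical helper: everything the artifact writer needs, read off the exception."""
    ctx: dict[str, Any] = {
        "error": str(exc),
        "failure_classification": "content_failure",
        "llm_attempts": exc.llm_attempts,
    }
    if isinstance(exc, StageBFailureError):
        ctx["failure_classification"] = "semantic_coverage"
        ctx.update(stage_a=exc.stage_a, stage_b=exc.stage_b, metrics=exc.metrics)
    return ctx
```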
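A best-effort artifact writer in the spirit of `dump_table_failure_artifact` might look like the following. The function name, its parameters, the subset of fields written, and the use of a truncated SHA-256 as the prompt hash are all assumptions for illustration; the file name follows the `<table>__<label>__failure.json` pattern, and any error is logged at WARN and swallowed so the build continues.

```python
import hashlib
import json
import logging
from pathlib import Path
from typing import Any, Optional, Union

log = logging.getLogger(__name__)


def prompt_hash(prompt: str) -> str:
    """Assumed hash scheme: first 16 hex chars of SHA-256 over the UTF-8 prompt."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:16]


def write_failure_artifact(
    out_dir: Union[str, Path],
    table: str,
    label: str,
    failure_classification: str,
    metadata_tier: str,
    attempts: list[dict[str, Any]],
) -> Optional[Path]:
    """Best-effort: returns the artifact path, or None if anything went wrong."""
    path = Path(out_dir) / f"{table}__{label}__failure.json"
    try:
        payload = {
            "table": table,
            "failure_classification": failure_classification,
            "metadata_tier": metadata_tier,
            # each attempt dict is expected to carry at least a "prompt" key
            "llm_attempts": [{**a, "prompt_hash": prompt_hash(a["prompt"])} for a in attempts],
        }
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8")
    except Exception as exc:  # never let diagnostics block the build
        log.warning("failure artifact for %s not written: %s", table, exc)
        return None
    return path
```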
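The tier-keyed admission floor reduces to a lookup keyed by `metadata_tier`. In this sketch the evidence fields (`description`, `data_type`), the 50% described-column cutoff for `rich`, the `B_OK` status for full coverage, and the function names are hypothetical; the floors (0.75 for `rich`, 0.60 for `sparse`/`name_only`) come from the description. With these floors, the smoke-test result for `mutation` (raw=0.67, sparse tier) comes out as `B_PARTIAL`, while the same score on a rich table fails.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# B_PARTIAL admission floors from the PR description (configurable in the real code).
PARTIAL_FLOOR = {"rich": 0.75, "sparse": 0.60, "name_only": 0.60}


@dataclass(frozen=True)
class ColumnEvidence:
    """Hypothetical shape of per-column L1 evidence."""

    name: str
    description: Optional[str] = None
    data_type: Optional[str] = None


def classify_metadata_tier(
    columns: Sequence[ColumnEvidence], rich_fraction: float = 0.5
) -> str:
    """Source-agnostic: inspects only which evidence fields are populated."""
    described = sum(1 for c in columns if c.description)
    typed = sum(1 for c in columns if c.data_type)
    if described == 0 and typed == 0:
        return "name_only"  # also covers an empty column list
    if described / len(columns) >= rich_fraction:
        return "rich"
    return "sparse"


def tiered_b_status(raw_coverage: float, tier: str) -> str:
    """The B_PARTIAL floor depends on how much metadata the source offers."""
    if raw_coverage >= 1.0:
        return "B_OK"  # hypothetical name for a fully resolved table
    if raw_coverage >= PARTIAL_FLOOR[tier]:
        return "B_PARTIAL"
    return "B_FAILED"
```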
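The run-quality budget can be sketched as a stateless function over end-of-run counts. Which outcomes belong in each denominator is partly an assumption: resume-skipped tables join the graph-contributing denominator but not the run-marginal one, as the PR states, while placing Stage A failures and circuit-skips only on the run-marginal side is a guess. `RunCounts`, `check_quality_budget`, and `run_cli` are hypothetical names; the default ceilings (30%, 40%), the trigger names, and exit code 7 come from the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunCounts:
    """Hypothetical end-of-run tallies."""

    succeeded: int = 0  # committed tables, B_PARTIAL included
    b_failed: int = 0
    stage_a_failed: int = 0
    circuit_skipped: int = 0
    resume_skipped: int = 0


class QualityBudgetExceeded(Exception):
    EXIT_CODE = 7

    def __init__(self, trigger: str, rate: float, ceiling: float) -> None:
        super().__init__(f"{trigger}={rate:.2f} exceeds ceiling {ceiling:.2f}")
        self.trigger = trigger
        self.rate = rate
        self.ceiling = ceiling


def _rate(numerator: int, denominator: int) -> float:
    return numerator / denominator if denominator else 0.0


def check_quality_budget(
    c: RunCounts,
    *,
    enabled: bool = True,
    max_stage_b_failure_rate: float = 0.30,
    max_non_contributing_rate: float = 0.40,
) -> None:
    """Stateless: raises QualityBudgetExceeded for the first trigger over its ceiling."""
    if not enabled:
        return
    graph_contributing = c.succeeded + c.b_failed + c.resume_skipped
    non_contributing = c.b_failed + c.stage_a_failed + c.circuit_skipped
    run_marginal = c.succeeded + non_contributing  # resume-skipped tables excluded
    checks = (
        ("stage_b_failure_rate", _rate(c.b_failed, graph_contributing), max_stage_b_failure_rate),
        ("run_non_contributing_rate", _rate(non_contributing, run_marginal), max_non_contributing_rate),
    )
    for trigger, rate, ceiling in checks:  # fixed order keeps the reported trigger stable
        if rate > ceiling:
            raise QualityBudgetExceeded(trigger, rate, ceiling)


def run_cli(counts: RunCounts, no_quality_budget: bool = False) -> int:
    """Hypothetical CLI tail: maps a budget abort to its dedicated exit code."""
    try:
        check_quality_budget(counts, enabled=not no_quality_budget)
    except QualityBudgetExceeded:
        return QualityBudgetExceeded.EXIT_CODE
    return 0
```

Because the triggers are checked in a fixed order, a run that exceeds both ceilings always reports `stage_b_failure_rate`, which is one way to obtain the stable ordering the description mentions.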
Summary
Stage-B failure containment work — six sections that change how the build pipeline classifies, recovers from, and persists evidence about failures during semantic interpretation:
- (§1) The breaker only trips on transport / 5xx failures; rate limits, JSON parse errors, Pydantic validation errors, and universal-parser `ValueError`s no longer trigger cascade-skips of healthy tables.
- (§2) `--resume` now preserves prior assertions instead of running the schema-wipe loop. Verified on the populated graph: 37/37 tables marked `skipped`, assertions 6962→6962.
- (§3) The `LLMAttempt` buffer is opt-in (gated on `eval_dump_dir`), reset per table, and populated for both successful and failed batch invocations.
- `StageBFailureError` carries staged context (§4). New typed exception subclass of `LLMStageError` carrying `stage_a`, `stage_b`, `metrics`, and `llm_attempts`, so the failure-artifact writer reads everything off the exception; no engine reference is required from the worker.
- (§5) `<table>__<label>__failure.json` is written on every failure path (Stage A failure, B_FAILED, circuit-open). It captures every `LLMClient.invoke` call across stages, including failed batches; the classification taxonomy is aligned with the breaker.
- `B_PARTIAL` outcome (§5b). New `metadata_tier` classifier (`rich` / `sparse` / `name_only`) computed before Stage A, so the tier is on every failure artifact. Tier-keyed `B_PARTIAL` admission floor: rich tables stay at 0.75, while sparse / name_only tables get a lowered floor (default 0.60), so partial commits are admitted instead of failing a coverage-floor check the source can't meet.
- (§6) Run-quality budget with two triggers: `stage_b_failure_rate` (default 30%) and `run_non_contributing_rate` (default 40%). Resume-skipped tables count toward the graph-contributing denominator but not toward the non-contributing trigger. `--no-quality-budget` disables both ceilings; `QualityBudgetExceeded` exits with code 7.
Verification
- `uv run pytest -q`: 1351 passed / 1 skipped / 38 deselected
- `uv run mypy src/sema/`: clean across 112 files
- Coverage 89% (… `quality_budget.py` 100%); gate ≥85%.
- Resume smoke run (`runs/build_resume_smoke_20260505_163145.log`): 37/37 skipped, no circuit-breaker skips on a healthy LLM, no quality-budget abort, assertions preserved
- Failure artifacts: `step_errors[].exception_type` discriminates `JSONDecodeError` vs `ValidationError` within the `content_failure` classification; `llm_attempts` carries failed-batch prompt text; both the `StageBFailureError` (→ `semantic_coverage`) and plain `LLMStageError` (→ `content_failure`) paths verified
Related issues
- … `cbioportal_gbm_tcga_pan_can_atlas_2018` + `cbioportal_msk_chord_2024`; bias check + sign-off still owed under that issue.
- …54mini endpoint are reachable from `sema`'s `provider=custom` code path; Measurement A re-run still owed under that issue.
Follow-ups surfaced by this PR
- … `.wolf/anatomy.md` to cover `src/` and `tests/` trees (scanner scope misconfiguration)
- … sparse)
Test plan
- `uv run pytest -q` green locally
- `uv run mypy src/sema/` clean locally