Updated Analysis
The retry cap assumption was wrong: there's no cap in the code. The 400 on attempt 3 was the evidence contract check (lines 285-287 in api.py) rejecting test_run_id because the plan declared required_evidence: ["test_output_hash"].
The real problem: the agent can't discover what evidence keys a verifier expects. The plan declares required_evidence and the verifier checks its own keys — but if they don't match, the agent gets stuck in a loop of:
- Plan says submit test_output_hash → evidence contract passes → verifier fails (wants test_run_id)
- Agent tries test_run_id → evidence contract rejects (not in plan's required_evidence)
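A minimal sketch of why neither attempt can succeed. The names here (PLAN_REQUIRED_EVIDENCE, VERIFIER_EXPECTED_KEYS, submit) are hypothetical stand-ins rather than the actual code in api.py, and the sketch assumes the contract check rejects any key the plan did not declare:

```python
# Hypothetical stand-ins; the real checks live in api.py and in the verifier.
PLAN_REQUIRED_EVIDENCE = {"test_output_hash"}  # declared by the plan
VERIFIER_EXPECTED_KEYS = {"test_run_id"}       # assumed: what verify_tests_green reads


def submit(evidence: dict) -> str:
    """Return the outcome of one submission attempt under the current behavior."""
    # Evidence contract: reject any key the plan did not declare (assumption).
    undeclared = set(evidence) - PLAN_REQUIRED_EVIDENCE
    if undeclared:
        return "400: evidence contract rejected " + ", ".join(sorted(undeclared))
    # Verifier: fail when a key it needs is absent.
    missing = VERIFIER_EXPECTED_KEYS - set(evidence)
    if missing:
        return "verifier failed: missing " + ", ".join(sorted(missing))
    return "ok"
```

Under these assumptions, submitting test_output_hash alone fails at the verifier, while submitting test_run_id (alone or alongside test_output_hash) fails at the contract, so no evidence payload reaches "ok".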
Proposed Fixes
- Verifier introspection: stepproof_runbook_get or a new tool should expose what evidence keys each verifier expects, so agents can declare the right required_evidence at plan time.
- Evidence contract should be union, not intersection: Accept keys that are in required_evidence OR that the verifier recognizes. Extra keys shouldn't cause a 400.
- Better error message: When a verifier fails with "missing key X", the error should say "Hint: your plan's required_evidence should include 'test_run_id' for the verify_tests_green verifier."
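A rough sketch of how the union check and the hint message might fit together. The function check_evidence and its parameters are hypothetical, not the existing api.py code, and dropping unrecognized keys instead of rejecting them is only one reading of "extra keys shouldn't cause a 400":

```python
def check_evidence(evidence, required_evidence, verifier_name, verifier_keys):
    """Return (ok, message) for a submission under the proposed union rule."""
    # Union rule: a key is acceptable if the plan declared it OR the verifier knows it.
    accepted = set(required_evidence) | set(verifier_keys)
    # Assumption: keys outside the union are ignored rather than causing a 400.
    usable = {k: v for k, v in evidence.items() if k in accepted}
    missing = sorted(set(verifier_keys) - set(usable))
    if missing:
        key = missing[0]
        return False, (
            f"missing key {key}. Hint: your plan's required_evidence should "
            f"include '{key}' for the {verifier_name} verifier."
        )
    return True, "ok"
```

With this rule, the scenario above no longer deadlocks: the agent can submit test_run_id even though the plan only declared test_output_hash, and a submission without it gets a message naming the key to add.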