Benchmark: governed loop stopped at $2.30, ungoverned burned $5.20 #45
Keesan12
started this conversation in
Show and tell
One benchmark changed how I think about unattended coding loops.
Same flaky-CI-gate task:
- Ungoverned loop: $5.20 and still failed
- Governed loop: $2.30 with a clean stop and a replayable run record

The interesting part is not just the lower cost. It is having an actual stop contract around the loop: remaining budget, verifier result, failure class, explicit halt reason, and a JSONL record you can inspect later.
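To make the stop contract concrete, here is a minimal Python sketch of the idea. It is not martin-loop's actual API or record schema: `StepRecord`, `run_governed_loop`, `to_jsonl`, the halt-reason strings, and the `max_repeats` threshold are all hypothetical names chosen for illustration. The sketch assumes each attempt reports its cost in cents, a verifier pass/fail, and a failure class.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable, List, Optional, Tuple

# Hypothetical record shape: one entry per attempt, carrying the five
# fields named in the post (budget, verifier, failure class, halt reason).
@dataclass
class StepRecord:
    attempt: int
    cost_cents: int
    remaining_budget_cents: int
    verifier_passed: bool
    failure_class: Optional[str]
    halt_reason: Optional[str]  # None while the loop is still running

# An attempt returns (cost in cents, verifier passed?, failure class or None).
AttemptFn = Callable[[int], Tuple[int, bool, Optional[str]]]

def run_governed_loop(attempt_fn: AttemptFn, budget_cents: int,
                      max_repeats: int = 3) -> List[StepRecord]:
    """Run attempts until the verifier passes, the budget is spent,
    or the same failure class repeats max_repeats times in a row."""
    records: List[StepRecord] = []
    remaining = budget_cents
    last_failure: Optional[str] = None
    streak = 0
    attempt = 0
    while True:
        attempt += 1
        cost, passed, failure_class = attempt_fn(attempt)
        remaining -= cost
        if passed:
            failure_class, streak = None, 0
        else:
            streak = streak + 1 if failure_class == last_failure else 1
        last_failure = failure_class

        # Every exit path gets an explicit, named halt reason.
        halt: Optional[str] = None
        if passed:
            halt = "verified"
        elif remaining <= 0:
            halt = "budget_exhausted"
        elif streak >= max_repeats:
            halt = "repeated_failure"

        records.append(StepRecord(attempt, cost, remaining,
                                  passed, failure_class, halt))
        if halt is not None:
            return records

def to_jsonl(records: List[StepRecord]) -> str:
    """Serialize the run as one JSON object per line."""
    return "\n".join(json.dumps(asdict(r)) for r in records)

if __name__ == "__main__":
    # Illustrative stand-in for an agent that keeps hitting the same flaky gate.
    def always_flaky(attempt: int) -> Tuple[int, bool, Optional[str]]:
        return 50, False, "flaky_ci"

    print(to_jsonl(run_governed_loop(always_flaky, budget_cents=500)))
```

The design point is that the loop can never end silently: the last line of the record always carries a non-null `halt_reason`, so a later reader can tell a verified success apart from a budget stop or a repeated-failure stop without rerunning anything.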
If you are already running Claude Code, Codex, OpenCode, or your own harness unattended, two concrete questions:
If you want to pressure-test it, the repo is here: https://github.com/Keesan12/martin-loop and npm is here: https://www.npmjs.com/package/martin-loop
If it is useful, star the repo. If you try it, I want the failure cases and rough edges, not polite feedback.