What belongs in a real failure taxonomy for coding-agent runs? #49

Keesan12 (Maintainer) · 2026-05-20T15:14:57Z

Most agent runs do not fail in one generic way. The splits we keep finding useful look more like this:

budget exhaustion versus budget stop with a useful receipt
verifier stalled versus verifier improved but blocked on approval
tool success versus task success
unsafe side effect blocked versus harmless retryable error
context drift versus adapter or secret boundary mismatch
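One way to make these splits concrete is an explicit enum plus a classifier that checks the most operationally important signals first. This is a minimal Python sketch, not MartinLoop's API: `FailureBucket`, `RunSignals`, `classify`, and every signal field are hypothetical names, and the sketch assumes the control layer can already observe each signal as a boolean.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureBucket(Enum):
    # Hypothetical bucket names (not MartinLoop's API); pairs mirror the splits above.
    BUDGET_EXHAUSTED = "budget_exhausted"
    BUDGET_STOP_WITH_RECEIPT = "budget_stop_with_receipt"
    VERIFIER_STALLED = "verifier_stalled"
    VERIFIER_BLOCKED_ON_APPROVAL = "verifier_blocked_on_approval"
    TOOL_OK_TASK_FAILED = "tool_ok_task_failed"
    UNSAFE_SIDE_EFFECT_BLOCKED = "unsafe_side_effect_blocked"
    RETRYABLE_ERROR = "retryable_error"
    CONTEXT_DRIFT = "context_drift"
    BOUNDARY_MISMATCH = "boundary_mismatch"  # adapter or secret boundary
    UNCLASSIFIED = "unclassified"  # keep unknowns visible instead of guessing


@dataclass
class RunSignals:
    # Hypothetical boolean signals a control layer might record per run.
    task_passed: bool = False
    tools_succeeded: bool = True
    budget_exhausted: bool = False
    receipt_written: bool = False
    verifier_stalled: bool = False
    verifier_improved: bool = False
    awaiting_approval: bool = False
    side_effect_blocked: bool = False
    error_retryable: bool = False
    context_drift: bool = False
    boundary_error: bool = False


def classify(s: RunSignals) -> Optional[FailureBucket]:
    """Return the failure bucket for a run, or None if the task succeeded."""
    # Safety and boundary events are reported even when the task passed.
    if s.side_effect_blocked:
        return FailureBucket.UNSAFE_SIDE_EFFECT_BLOCKED
    if s.boundary_error:
        return FailureBucket.BOUNDARY_MISMATCH
    if s.task_passed:
        return None
    if s.budget_exhausted:
        # Same stop condition, very different value to the operator.
        if s.receipt_written:
            return FailureBucket.BUDGET_STOP_WITH_RECEIPT
        return FailureBucket.BUDGET_EXHAUSTED
    if s.awaiting_approval and s.verifier_improved:
        return FailureBucket.VERIFIER_BLOCKED_ON_APPROVAL
    if s.error_retryable:
        return FailureBucket.RETRYABLE_ERROR
    if s.context_drift:
        return FailureBucket.CONTEXT_DRIFT
    if s.verifier_stalled:
        return FailureBucket.VERIFIER_STALLED
    if s.tools_succeeded:
        # Every tool call returned OK, but the task itself did not succeed.
        return FailureBucket.TOOL_OK_TASK_FAILED
    return FailureBucket.UNCLASSIFIED
```

For example, `classify(RunSignals(budget_exhausted=True, receipt_written=True))` returns `FailureBucket.BUDGET_STOP_WITH_RECEIPT`. The ordering is the main design choice: safety and boundary events win over everything else, and the explicit `UNCLASSIFIED` bucket keeps runs that match no rule visible instead of silently folding them into a neighbouring bucket.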

If you are building or operating coding agents, what buckets actually matter in practice?

I am trying to make MartinLoop more useful as an open-source control layer here, so I care more about the taxonomy being operationally honest than theoretically neat. If you have a better split, edge case, or field you would want captured in the stop receipt, I would really like to see it.
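On the stop-receipt side, a receipt is easiest to compare across runs when it is a flat, serializable record. The Python sketch below is hypothetical: `StopReceipt` and all of its fields are illustrative guesses at what an operator might want, not MartinLoop's actual schema, and the budget unit is left abstract.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class StopReceipt:
    # Hypothetical receipt; field names are illustrative, not MartinLoop's schema.
    run_id: str
    bucket: str                  # e.g. "verifier_blocked_on_approval"
    reason: str                  # one-line human-readable cause
    iterations: int
    budget_used: float           # unit is an assumption (tokens, USD, ...)
    budget_limit: float
    verifier_score_start: Optional[float] = None
    verifier_score_end: Optional[float] = None
    awaiting_approval: bool = False
    retryable: bool = False
    blocked_actions: list[str] = field(default_factory=list)

    def verifier_delta(self) -> Optional[float]:
        """Change in verifier score over the run, or None if either end is missing."""
        if self.verifier_score_start is None or self.verifier_score_end is None:
            return None
        return self.verifier_score_end - self.verifier_score_start

    def to_json(self) -> str:
        """Serialize all fields plus the derived verifier delta."""
        data = asdict(self)
        data["verifier_delta"] = self.verifier_delta()
        return json.dumps(data, sort_keys=True)
```

Recording the verifier score at both ends is what lets a reader tell "verifier stalled" from "verifier improved but blocked on approval" without rerunning anything.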
