Recover transport-exhausted re-reviews with a bounded lower-effort fallback (#282)#283
Recover transport-exhausted re-reviews with a bounded lower-effort fallback (#282)#283yetval wants to merge 2 commits into
Conversation
…llback When Codex review retries exhaust the transient-transport budget, runCodex now runs one bounded lower-reasoning-effort fallback pass (high/medium -> low, low -> minimal) while enough of the timeout budget remains, instead of always returning the off-meta tidepool placeholder. The fallback review is returned with degraded mode disclosed in the summary, the original transport diagnostic preserved in evidence, and high confidence capped to medium so a degraded review cannot trigger an auto-close. If the fallback also fails, the thrown CodexReviewError preserves the original transport classification so the placeholder still reads as a transport failure. The minimum remaining-time budget is configurable via CLAWSWEEPER_CODEX_FALLBACK_MIN_BUDGET_MS (default 120000ms). Fixes openclaw#282.
|
Codex review: needs maintainer review before merge. Reviewed June 13, 2026, 10:38 AM ET / 14:38 UTC. Summary Reproducibility: yes. The linked issue and PR discussion provide repeated workflow logs for Review metrics: 3 noteworthy metrics.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Land this shape only if maintainers accept in-process degraded review routing with the confidence cap and disclosure; otherwise move the downgrade decision into an explicit recovery-lane or dispatch-policy gate. Do we have a high-confidence way to reproduce the issue? Yes. The linked issue and PR discussion provide repeated workflow logs for Is this the best way to solve the issue? Unclear. The code path is narrow and well covered, but whether the downgrade should happen in-process rather than only through the retry-failed-reviews lane is a maintainer automation-policy choice. AGENTS.md: found and applied where relevant. Codex review notes: model internal, reasoning high; reviewed against f398bbc24221. Label changesLabel justifications:
Evidence reviewedAcceptance criteria:
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
Adding two pieces of supplementary evidence I gathered while investigating #282 from the affected PR's side, in case they're useful for the maintainer review of this fallback approach. 1. The failure was Codex-classified as
|
| Time (UTC) | Run | Result |
|---|---|---|
| 14:48:25 | Review event item openclaw/openclaw#92181 |
failed (this issue) |
| 14:50:47 | Review event item steipete/CodexBar#1442 |
success |
| 14:53:31 | Review event item steipete/CodexBar#1436 |
success |
| 14:57:34 | Review event item steipete/CodexBar#1433 |
success |
| 14:57:41 | Review event item openclaw/openclaw#91632 |
success (same target_repo) |
| 14:58:52 | Review event item openclaw/openclaw#92196 |
success (same target_repo) |
| 14:59:49 | Review event item steipete/CodexBar#1434 |
success |
So the rejection wasn't a transient global capacity dip — steipete/CodexBar#1442 succeeded ~2 minutes later, and openclaw/openclaw#91632 (same target repo) succeeded ~9 minutes later. Each #92181 retry sent the same large prompt and got rejected at admission, which means a longer in-process backoff alone would not have helped; only reducing per-request token weight does. That's a positive corroboration that this PR's lower-effort fallback (rather than just longer backoff) is the right minimal fix.
Timing detail
For completeness, the three exhausted attempts on run 27355240940 died this fast:
attempt 1: 14:48:25 → 14:48:37 12s
attempt 2: 14:48:52 → 14:49:23 31s
attempt 3: 14:49:53 → 14:50:04 11s
Each died well below the 600s --codex-timeout-ms budget and before model thinking — so the original timeout/large-reasoning hypothesis is ruled out, and the budget-share-with-fallback approach in this PR fits cleanly within the unused remainder.
(Posted #282 evidence at #282 (comment 4692572669) before noticing this PR was already up — flagging here so maintainers don't have to cross-reference. Happy to help test once merged on openclaw/openclaw#92181.)
|
Context from #282: the OP narrowed the root cause to a capacity (TPM admission) rejection, not a network flake or timeout. The three retries each died in 11-31s, far under the 600s budget, and other reviews on the same backend in the same window succeeded, so it is a per-request token-cost problem specific to that PR, not a fleet-wide dip. Their conclusion: backoff alone cannot help, since each retry resends the same oversized request; only reducing per-request token cost does. That validates the mechanism in this PR. The fallback here is exactly a per-request token-cost reduction ( One design choice worth confirming, since #282 also floats putting the downgrade in the
The in-process path is strictly faster for the contributor, and the two are not mutually exclusive (the cron lane can still re-attempt at lower effort if an in-process fallback also fails). If maintainers prefer to scope the downgrade to the recovery lane only, this is a small change to make. Flagging here to avoid a duplicate PR for the same fix. |
|
Adding another affected-case data point: I'm seeing the same failure mode on openclaw/openclaw#92251 and openclaw/openclaw#90610. The ClawSweeper re-review path hit transient_transport / Codex transport exhaustion, then the durable review/labels ended up as off-meta rather than reflecting the PR proof/status. So this does not seem isolated to #92181. #283's bounded lower-effort fallback looks like the right shape to unblock these re-reviews without adding unbounded retry behavior. |
Resolve conflicts in runCodex and its tests: - Keep the bounded lower-effort fallback (runReviewPass closure + degraded annotation) on top of main's runCodexProcess invocation, --json output, and richer error classification (terminal vs retryable, errorCode/signal). - CodexReviewError now carries retryable alongside errorCode/signal. - Update the fallback-failure test stub to surface the rate-limit reason via the JSONL turn.failed channel that main's classifier now reads.
Problem
@clawsweeper re-reviewon PR openclaw/openclaw#92181 consistently exhausts the Codex transient-transport retry budget (attempt 2/3,3/3,reason=transient_transport) and then falls back to a low-confidence placeholder:decision=keep_open confidence=low action=kept_openoff-meta tidepool / [P1] Review did not complete (retryable codex transport failure)The high-reasoning-effort path produces long thinking streams; when the backend/proxy closes the stream early, every same-effort retry hits the same failure and the contributor has no path forward. See #282.
Fix
runCodexnow, after the transient-transport retries are exhausted, runs one bounded lower-reasoning-effort fallback pass when enough of the timeout budget remains:high/medium->low,low->minimal(minimal/unknown -> no fallback).--codex-timeout-msbudget, so a re-review never becomes a stalled job.degraded review mode,original codex transport failure),highconfidence capped tomedium, so a degraded review can never trigger an auto-close (close actions requireconfidence === "high").CodexReviewErrorpreserves the original transport classification, so the placeholder review still reads as a transport failure (capacity/network) rather than whatever the fallback surfaced last.The minimum remaining-time budget for attempting the fallback is configurable via
CLAWSWEEPER_CODEX_FALLBACK_MIN_BUDGET_MS(default120000).Real behavior proof
Driving the real
runCodexcode path end-to-end through the builtdist/with a stubbedcodexthat reproduces #282 (stream closes athigheffort, succeeds at lower effort),CLAWSWEEPER_CODEX_REVIEW_ATTEMPTS=3:BEFORE (origin/main @ 2e4282b, unpatched) — reproduces #282
AFTER (patched) — bounded lower-effort fallback
The unpatched build dead-ends at the placeholder; the patched build runs exactly one extra
low-effort pass, returns a real verdict, discloses degraded mode, and caps the fixture'shighconfidence tomedium.Tests
Added to
test/clawsweeper.test.ts(all pass undernode --test ... --test-name-pattern 'Codex|transient|retry|fallback|failure'):lowerCodexReasoningEffort steps down one tier and stops at minimalrunCodex completes via a lower-effort fallback after transport exhaustionrunCodex keeps the transport classification when the fallback also failsrunCodex skips the lower-effort fallback when the time budget is too smallVerified locally:
pnpm run build,pnpm run lint:src,pnpm run lint:scripts,pnpm run check:active-surface,pnpm run check:limits,pnpm run format:check, and the targeted test pattern (399 pass / 0 fail).Related issues (triage cross-reference)
transient_transporthas no earlier hits in the tracker, so this failure mode is new. Distinct-but-adjacent issues:--codex-timeout-mson large/video PRs. Here every attempt returnstransient_transportafter roughly 2:46 of wall-clock and never reaches the timeout; the PR is moderate (~21 KB diff, no video proof), so Adaptive codex timeout based on PR size and media proofs — large PRs with videos consistently time out #272's size-based mitigations would not trigger. This fix is self-contained inrunCodexand does not depend on Adaptive codex timeout based on PR size and media proofs — large PRs with videos consistently time out #272's dispatch-payload work; if Adaptive codex timeout based on PR size and media proofs — large PRs with videos consistently time out #272 surfacescodex_*knobs indispatchItemReview(), an opt-in flag for this fallback could live there later, but it is not required.re-reviewworkflow surface, different leg (verdict-sync path vs. this review-execution path).masterdefault branch): sibling re-review failure category, unrelated to Codex transport (repo fetch).