
Recover transport-exhausted re-reviews with a bounded lower-effort fallback (#282) #283

Open
yetval wants to merge 2 commits into openclaw:main from yetval:fix/codex-transport-fallback-282

Conversation

yetval commented Jun 12, 2026

Problem

@clawsweeper re-review on PR openclaw/openclaw#92181 consistently exhausts the Codex transient-transport retry budget (attempt 2/3, 3/3, reason=transient_transport) and then falls back to a low-confidence placeholder:

  • decision=keep_open confidence=low action=kept_open
  • routed verdict: off-meta tidepool / [P1] Review did not complete (retryable codex transport failure)

The high-reasoning-effort path produces long thinking streams; when the backend/proxy closes the stream early, every same-effort retry hits the same failure and the contributor has no path forward. See #282.

Fix

After the transient-transport retries are exhausted, runCodex now runs one bounded lower-reasoning-effort fallback pass, provided enough of the timeout budget remains:

  • Effort step-down: high/medium -> low, low -> minimal (minimal/unknown -> no fallback).
  • The fallback runs a single attempt and shares the original --codex-timeout-ms budget, so a re-review never becomes a stalled job.
  • A successful fallback returns a real review, with:
    • degraded mode disclosed in the summary,
    • the original transport diagnostic preserved as evidence (degraded review mode, original codex transport failure),
    • high confidence capped to medium, so a degraded review can never trigger an auto-close (close actions require confidence === "high").
  • If the fallback also fails, the thrown CodexReviewError preserves the original transport classification, so the placeholder review still reads as a transport failure (capacity/network) rather than whatever the fallback surfaced last.
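A minimal sketch of the flow above, assuming a synchronous `runPass(effort)` callback that throws on failure. `TransportFailure`, `stepDownEffort`, `canAutoClose`, and `reviewWithFallback` are hypothetical names, and the retry delays and timeout-budget gate are left out; this is not the PR's runCodex code. It shows the effort step-down, the single fallback attempt, the confidence cap, and re-throwing the original classification.

```typescript
// Hypothetical sketch; names are illustrative, not the PR's actual code.
type Effort = "minimal" | "low" | "medium" | "high";
type Confidence = "low" | "medium" | "high";

interface Review {
  decision: "close" | "keep_open";
  confidence: Confidence;
  summary: string;
}

class TransportFailure extends Error {
  readonly reason: string;
  readonly retryable: boolean;
  constructor(message: string, reason: string, retryable: boolean) {
    super(message);
    this.name = "TransportFailure";
    this.reason = reason;
    this.retryable = retryable;
  }
}

// high/medium -> low, low -> minimal; minimal has no lower tier, so no fallback.
function stepDownEffort(effort: Effort): Effort | undefined {
  if (effort === "high" || effort === "medium") return "low";
  if (effort === "low") return "minimal";
  return undefined;
}

// Close actions require confidence === "high".
function canAutoClose(review: Review): boolean {
  return review.decision === "close" && review.confidence === "high";
}

function reviewWithFallback(runPass: (effort: Effort) => Review, effort: Effort, attempts: number): Review {
  let original: TransportFailure | undefined;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return runPass(effort);
    } catch (error) {
      // Only retryable transport failures are retried; anything else propagates.
      if (!(error instanceof TransportFailure) || !error.retryable) throw error;
      original = error;
    }
  }
  if (original === undefined) throw new RangeError("attempts must be at least 1");
  const failure = original;
  const lower = stepDownEffort(effort);
  if (lower === undefined) throw failure;
  try {
    // Exactly one fallback attempt at the lower effort.
    const review = runPass(lower);
    return {
      ...review,
      // Degraded reviews never keep high confidence, so they cannot auto-close.
      confidence: review.confidence === "high" ? "medium" : review.confidence,
      summary: `Degraded review (${effort} -> ${lower}) after a ${failure.reason} transport failure. ${review.summary}`,
    };
  } catch {
    // Surface the original transport classification, not the fallback's error.
    throw failure;
  }
}

// Stub shaped like #282: high-effort passes fail, the low-effort pass succeeds.
const effortsSeen: Effort[] = [];
const review = reviewWithFallback(
  (effort) => {
    effortsSeen.push(effort);
    if (effort === "high") throw new TransportFailure("stream closed", "network", true);
    return { decision: "close", confidence: "high", summary: "Adds the retry-classification helper." };
  },
  "high",
  3,
);
console.log(effortsSeen.join(", "), review.confidence, canAutoClose(review));
```

With three attempts the stub sees `high, high, high, low`, and the returned review carries `medium` confidence, so it cannot trigger an auto-close.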

The minimum remaining-time budget for attempting the fallback is configurable via CLAWSWEEPER_CODEX_FALLBACK_MIN_BUDGET_MS (default 120000).
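A sketch of the budget gate, assuming the fallback runs only when the failure was retryable, a lower effort exists, and the remaining share of the original timeout meets the minimum. `fallbackMinBudgetMs` and `shouldAttemptFallback` are hypothetical names, and treating empty or invalid values as "use the default" is an assumption about parsing, not confirmed PR behavior.

```typescript
// Hypothetical sketch of the fallback gate; names are illustrative.
const DEFAULT_FALLBACK_MIN_BUDGET_MS = 120_000;

// Assumption: unset, empty, or invalid values fall back to the documented default.
function fallbackMinBudgetMs(env: Record<string, string | undefined>): number {
  const raw = env["CLAWSWEEPER_CODEX_FALLBACK_MIN_BUDGET_MS"]?.trim();
  if (!raw) return DEFAULT_FALLBACK_MIN_BUDGET_MS;
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : DEFAULT_FALLBACK_MIN_BUDGET_MS;
}

interface GateInput {
  retryable: boolean; // the exhausted failure was a retryable transport error
  hasLowerEffort: boolean; // false for minimal or unknown efforts
  timeoutMs: number; // the original --codex-timeout-ms budget
  elapsedMs: number; // time already spent on the same-effort retries
  minBudgetMs: number;
}

// The fallback shares the original budget, so it runs only if enough remains.
function shouldAttemptFallback(gate: GateInput): boolean {
  const remainingMs = gate.timeoutMs - gate.elapsedMs;
  return gate.retryable && gate.hasLowerEffort && remainingMs >= gate.minBudgetMs;
}

const minBudgetMs = fallbackMinBudgetMs({});
const early = shouldAttemptFallback({ retryable: true, hasLowerEffort: true, timeoutMs: 600_000, elapsedMs: 99_000, minBudgetMs });
const late = shouldAttemptFallback({ retryable: true, hasLowerEffort: true, timeoutMs: 600_000, elapsedMs: 500_000, minBudgetMs });
console.log(minBudgetMs, early, late); // 120000 true false
```

Because the fallback draws on the same budget instead of getting a fresh one, a run that exhausts its retries late simply skips the fallback and keeps the placeholder, which keeps the whole re-review bounded by `--codex-timeout-ms`.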

Real behavior proof

Driving the real runCodex code path end-to-end through the built dist/ with a stubbed codex that reproduces #282 (stream closes at high effort, succeeds at lower effort), CLAWSWEEPER_CODEX_REVIEW_ATTEMPTS=3:

BEFORE (origin/main @ 2e4282b, unpatched) — reproduces #282

[review] codex-retry #92181 attempt=2/3 delay_ms=5 reason=transient_transport
[review] codex-retry #92181 attempt=3/3 delay_ms=10 reason=transient_transport
reasoning-efforts-seen: high, high, high
codex-invocations: 3
{
  "threw": true,
  "errorHead": "Codex review failed for #92181 with exit 1.",
  "placeholderDecision": "keep_open",
  "placeholderConfidence": "low",
  "placeholderSummary": "Codex review failed: retryable codex transport failure (network) (exit 1).",
  "placeholderChangeSummary": "Review failed before ClawSweeper could summarize the requested change."
}

AFTER (patched) — bounded lower-effort fallback

[review] codex-retry #92181 attempt=2/3 delay_ms=5 reason=transient_transport
[review] codex-retry #92181 attempt=3/3 delay_ms=10 reason=transient_transport
[review] codex-fallback #92181 reason=transient_transport from_effort=high to_effort=low remaining_ms=599702
reasoning-efforts-seen: high, high, high, low
codex-invocations: 4
{
  "threw": false,
  "decision": "close",
  "confidence": "medium",
  "summaryHead": "Degraded review: Codex hit a retryable codex transport failure (network) at high reasoning effort, so ClawSweeper completed this review with...",
  "changeSummary": "Adds the retry-classification helper plus regression coverage.",
  "evidenceLabels": ["degraded review mode", "original codex transport failure", "implementation"]
}

The unpatched build dead-ends at the placeholder; the patched build runs exactly one extra low-effort pass, returns a real verdict, discloses degraded mode, and caps the fixture's high confidence to medium.

Tests

Added to test/clawsweeper.test.ts (all pass under node --test ... --test-name-pattern 'Codex|transient|retry|fallback|failure'):

  • lowerCodexReasoningEffort steps down one tier and stops at minimal
  • runCodex completes via a lower-effort fallback after transport exhaustion
  • runCodex keeps the transport classification when the fallback also fails
  • runCodex skips the lower-effort fallback when the time budget is too small
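Node's `--test-name-pattern` option is interpreted as a regular expression against test names, so the targeted run above can be checked against the four added test names. This small sketch only mirrors that matching; it does not invoke the test runner.

```typescript
// Mirrors the --test-name-pattern value used above as a JavaScript RegExp.
const pattern = new RegExp("Codex|transient|retry|fallback|failure");

const addedTests = [
  "lowerCodexReasoningEffort steps down one tier and stops at minimal",
  "runCodex completes via a lower-effort fallback after transport exhaustion",
  "runCodex keeps the transport classification when the fallback also fails",
  "runCodex skips the lower-effort fallback when the time budget is too small",
];

// Every added test name contains "Codex", so all four run under the pattern.
const unmatched = addedTests.filter((name) => !pattern.test(name));
console.log(unmatched.length); // 0
```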

Verified locally: pnpm run build, pnpm run lint:src, pnpm run lint:scripts, pnpm run check:active-surface, pnpm run check:limits, pnpm run format:check, and the targeted test pattern (399 pass / 0 fail).

Related issues (triage cross-reference)

transient_transport has no earlier hits in the tracker, so this failure mode is new. Distinct-but-adjacent issues:

…llback

When Codex review retries exhaust the transient-transport budget, runCodex
now runs one bounded lower-reasoning-effort fallback pass (high/medium -> low,
low -> minimal) while enough of the timeout budget remains, instead of always
returning the off-meta tidepool placeholder.

The fallback review is returned with degraded mode disclosed in the summary,
the original transport diagnostic preserved in evidence, and high confidence
capped to medium so a degraded review cannot trigger an auto-close. If the
fallback also fails, the thrown CodexReviewError preserves the original
transport classification so the placeholder still reads as a transport failure.

The minimum remaining-time budget is configurable via
CLAWSWEEPER_CODEX_FALLBACK_MIN_BUDGET_MS (default 120000ms).

Fixes openclaw#282.

clawsweeper Bot commented Jun 12, 2026

Contributor

Codex review: needs maintainer review before merge. Reviewed June 13, 2026, 10:38 AM ET / 14:38 UTC.

Summary
Adds a configurable runCodex lower-reasoning-effort fallback after retryable Codex transport exhaustion, degraded-review annotation/confidence capping, and regression tests.

Reproducibility: yes. The linked issue and PR discussion provide repeated workflow logs for transient_transport exhaustion, and current main source shows same-effort retry behavior without a fallback; I did not replay the live Codex capacity failure locally.

Review metrics: 3 noteworthy metrics.

  • Files changed: 2 modified (1 runtime file, 1 test file). The change is focused on the Codex review execution path and its regression coverage.
  • Fallback attempts: 1 additional lower-effort attempt. The retry expansion is bounded, which matters for automation availability and cost before merge.
  • Regression tests: 4 added. The tests cover the new step-down helper and the main success/failure/budget branches.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
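The "weaker of proof and patch quality" rule can be sketched as a minimum over the rank ladder, ordered weakest first as in the crustacean rank legend later in this comment. `RANK_LADDER` and `overallRank` are hypothetical names; "off-meta tidepool" is left out because it means the rating does not apply.

```typescript
// Rank ladder, weakest first, taken from the rank legend in this comment.
const RANK_LADDER = [
  "unranked krab",
  "silver shellfish",
  "gold shrimp",
  "platinum hermit",
  "diamond lobster",
  "challenger crab",
] as const;
type Rank = (typeof RANK_LADDER)[number];

// Overall readiness is capped by whichever of the two ranks is weaker.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  return RANK_LADDER.indexOf(proof) <= RANK_LADDER.indexOf(patchQuality) ? proof : patchQuality;
}

console.log(overallRank("diamond lobster", "platinum hermit")); // platinum hermit
```

This reproduces the verdict above: a diamond lobster proof with platinum hermit patch quality yields a platinum hermit overall rating.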

Rank-up moves:

  • none.

Risk before merge

  • [P2] A successful fallback lets a lower-reasoning-effort review influence review labels and maintainer routing after high-effort transport exhaustion; the visible degraded disclosure and medium-confidence cap reduce but do not remove that automation-policy risk.
  • [P2] The fallback minimum budget is an environment-only operator setting, so maintainers need to decide whether CLAWSWEEPER_CODEX_FALLBACK_MIN_BUDGET_MS is enough control or whether dispatch/workflow-level policy should gate it.

Maintainer options:

  1. Land With Current Safeguards
    Maintainers can accept the one-attempt lower-effort fallback because the PR discloses degraded mode and caps high confidence to medium.
  2. Gate The Fallback Policy
    Maintainers can require a workflow or dispatch-level opt-in before lower-effort fallback results affect labels and routing.
  3. Defer To Recovery Lane
    Maintainers can pause or close this PR if they prefer the existing retry-failed-reviews recovery lane to own lower-effort retries later.

Next step before merge

  • [P2] Maintainers should decide whether lower-effort in-process reviews may route labels after transport exhaustion; there is no narrow automated repair needed from this review.

Security
Cleared: No concrete security or supply-chain concern was found; the diff changes local Codex subprocess retry behavior and tests without adding dependencies, permissions, secrets, or external downloads.

Review details

Best possible solution:

Land this shape only if maintainers accept in-process degraded review routing with the confidence cap and disclosure; otherwise move the downgrade decision into an explicit recovery-lane or dispatch-policy gate.

Do we have a high-confidence way to reproduce the issue?

Yes. The linked issue and PR discussion provide repeated workflow logs for transient_transport exhaustion, and current main source shows same-effort retry behavior without a fallback; I did not replay the live Codex capacity failure locally.

Is this the best way to solve the issue?

Unclear. The code path is narrow and well covered, but whether the downgrade should happen in-process rather than only through the retry-failed-reviews lane is a maintainer automation-policy choice.

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against f398bbc24221.

Label changes

Label justifications:

  • P1: The PR targets a broken re-review workflow that leaves real contributor PRs with failed/off-meta ClawSweeper routing after repeated Codex transport exhaustion.
  • merge-risk: 🚨 automation: Merging changes review execution and label-routing behavior by allowing a degraded lower-effort Codex pass to produce the durable review verdict.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Proof (terminal) is sufficient: the PR body includes copied terminal output from the built runtime path showing before/after behavior for transport exhaustion and the bounded lower-effort fallback.
  • proof: sufficient: Contributor real behavior proof is sufficient. Sufficient: the PR body includes copied terminal output from the built runtime path showing before/after behavior for transport exhaustion and the bounded lower-effort fallback.
Evidence reviewed

Acceptance criteria:

  • [P1] pnpm run check.
  • [P2] node --test test/clawsweeper.test.ts --test-name-pattern 'Codex|transient|retry|fallback|failure'.

What I checked:

  • repository policy applied: AGENTS.md was read fully and points review of this PR at src/clawsweeper.ts, tests, and conservative automation-safe handling. (AGENTS.md:3, f398bbc24221)
  • current main lacks fallback symbols: Current main has no lowerCodexReasoningEffort, codex-fallback, or CLAWSWEEPER_CODEX_FALLBACK_MIN_BUDGET_MS hits, so the central change is not already implemented. (src/clawsweeper.ts, f398bbc24221)
  • fallback implementation: The PR head adds the fallback budget helper, effort step-down helper, degraded annotation, and one-attempt fallback call after retryable CodexReviewError exhaustion. (src/clawsweeper.ts:6384, ae0f78a86883)
  • bounded fallback call site: The PR head only attempts fallback when the original error is retryable, a lower effort exists, and remaining timeout budget meets the configured minimum. (src/clawsweeper.ts:6856, ae0f78a86883)
  • regression coverage: The PR head adds tests for effort step-down, successful lower-effort fallback, preserving transport classification when fallback also fails, and skipping fallback when budget is too small. (test/clawsweeper.test.ts:14977, ae0f78a86883)
  • real behavior proof in PR body: The PR body includes before/after terminal output from built dist/ driving the real runCodex path with a stubbed codex process: unpatched exhausts same-effort retries and patched performs exactly one low-effort fallback with medium confidence. (ae0f78a86883)

Likely related people:

  • Peter Steinberger: History and blame show major current-main ownership of runCodex, transient retry handling, file-backed Codex process output, and hardening around this exact code path. (role: recent area contributor; confidence: high; commits: 9209fa90, 76b696750, 6c695eadf; files: src/clawsweeper.ts, src/codex-process.ts, src/codex-transient.ts)
  • Momo: Blame attributes the capacity-classification branch in codexFailureReason to 429a73b, which is the classification this PR uses to trigger the fallback path. (role: adjacent contributor; confidence: medium; commits: 429a73be; files: src/clawsweeper.ts)
  • joshavant: Commit 6d78394 refined the user-visible Codex transport failure wording that this PR preserves and annotates during degraded reviews. (role: adjacent contributor; confidence: medium; commits: 6d78394e; files: src/clawsweeper.ts)
  • RomneyDa: The related merged retry-failed-reviews lane in feat: Retry failed ClawSweeper reviews after Codex infrastructure errors #260 covers the same failed-review recovery area, though by redispatch rather than in-process fallback. (role: adjacent feature owner; confidence: medium; commits: 23cf21c4; files: src/clawsweeper.ts, .github/workflows/sweep.yml, test/clawsweeper.test.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.
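The command rules above can be summarized as a small permission table. This is a hypothetical sketch, not ClawSweeper's parser: the regex, the `Role` values, and the assumption that maintainers also count as write-access users are illustrative.

```typescript
// Hypothetical sketch of the command rules listed above; not ClawSweeper's actual parser.
type Role = "author" | "writer" | "maintainer" | "other";

const COMMAND_RE =
  /@clawsweeper\s+(re-review|re-run|review|autofix|automerge|fix ci|address review|explain|stop)\b/i;
const REPAIR_COMMANDS = new Set(["autofix", "automerge", "fix ci", "address review"]);

function parseCommand(body: string): string | undefined {
  return COMMAND_RE.exec(body)?.[1]?.toLowerCase();
}

function isAllowed(command: string, role: Role, itemOpen: boolean): boolean {
  if (command === "re-review" || command === "re-run") {
    // Authors and write-access users on open items (maintainers assumed to have write access).
    return itemOpen && role !== "other";
  }
  // "review", repair/merge commands, "explain", and "stop" are maintainer-only.
  return role === "maintainer";
}

// Fresh-review commands (re-review, re-run, review) never start repair or merge flows.
function startsRepair(command: string): boolean {
  return REPAIR_COMMANDS.has(command);
}

const command = parseCommand("@clawsweeper re-review please");
console.log(command, isAllowed(command ?? "", "author", true), startsRepair(command ?? ""));
```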

@clawsweeper clawsweeper Bot added proof: sufficient Contributor real behavior proof is sufficient. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P1 Urgent regression or broken agent/channel workflow affecting real users now. merge-risk: 🚨 automation 🚨 Merging this PR could break CI, automerge, proof capture, label sync, or automation. labels Jun 12, 2026
@wangwllu


Adding two pieces of supplementary evidence I gathered while investigating #282 from the affected PR's side, in case they're useful for the maintainer review of this fallback approach.

1. The failure was Codex-classified as capacity, not a generic transport flake

The clawsweeper review comment on openclaw/openclaw#92181 already shows:

failure reason: retryable codex transport failure (capacity)

This is the capacity branch in codexFailureReason() at src/clawsweeper.ts:6357 (matches rate limit reached / TPM / RPM / 429 / overloaded / temporarily unavailable / try again in Ns). Step-down to lower reasoning effort is the right shape for that class — capacity rejection is about per-request token weight, which is exactly what high → low reduces.
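A sketch approximating the capacity match list quoted above. `CAPACITY_PATTERNS` and `isCapacityFailure` are hypothetical names, and the actual regexes in codexFailureReason() may differ in wording and anchoring.

```typescript
// Hypothetical approximation of the capacity patterns listed above.
const CAPACITY_PATTERNS: readonly RegExp[] = [
  /rate limit reached/i,
  /\bTPM\b/,
  /\bRPM\b/,
  /\b429\b/,
  /overloaded/i,
  /temporarily unavailable/i,
  /try again in \d+(?:\.\d+)?\s*m?s\b/i, // "try again in Ns" (or ms)
];

function isCapacityFailure(output: string): boolean {
  return CAPACITY_PATTERNS.some((pattern) => pattern.test(output));
}

console.log(isCapacityFailure("Rate limit reached on tokens per min (TPM). Please try again in 12s."));
console.log(isCapacityFailure("stream disconnected before completion"));
```

A capacity match here is what distinguishes an admission rejection (where lowering per-request token weight can help) from a generic network flake.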

2. The rejection was PR-specific, not a fleet-wide capacity event

I checked the 14:35Z-15:00Z window for Review event item runs in clawsweeper. Aside from #92181, every other event-item run in that window completed successfully on the same backend:

Time (UTC) | Run | Result
14:48:25 | Review event item openclaw/openclaw#92181 | failed (this issue)
14:50:47 | Review event item steipete/CodexBar#1442 | success
14:53:31 | Review event item steipete/CodexBar#1436 | success
14:57:34 | Review event item steipete/CodexBar#1433 | success
14:57:41 | Review event item openclaw/openclaw#91632 | success (same target_repo)
14:58:52 | Review event item openclaw/openclaw#92196 | success (same target_repo)
14:59:49 | Review event item steipete/CodexBar#1434 | success

So the rejection wasn't a transient global capacity dip — steipete/CodexBar#1442 succeeded ~2 minutes later, and openclaw/openclaw#91632 (same target repo) succeeded ~9 minutes later. Each #92181 retry sent the same large prompt and got rejected at admission, which means a longer in-process backoff alone would not have helped; only reducing per-request token weight does. That's a positive corroboration that this PR's lower-effort fallback (rather than just longer backoff) is the right minimal fix.

Timing detail

For completeness, the three exhausted attempts on run 27355240940 died this fast:

attempt 1: 14:48:25 → 14:48:37   12s
attempt 2: 14:48:52 → 14:49:23   31s
attempt 3: 14:49:53 → 14:50:04   11s

Each died well below the 600s --codex-timeout-ms budget and before model thinking — so the original timeout/large-reasoning hypothesis is ruled out, and the budget-share-with-fallback approach in this PR fits cleanly within the unused remainder.
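Worked arithmetic for run 27355240940 using the timestamps above, assuming the 600s budget clock starts at the first attempt (the real clock may start earlier, so the true remainder could be somewhat smaller). The 15s and 30s idle gaps between attempts are consistent with a doubling backoff, although the comment does not state the retry schedule.

```typescript
// Worked arithmetic; assumption: the 600 s budget clock starts at attempt 1.
function toSeconds(hms: string): number {
  const [h = 0, m = 0, s = 0] = hms.split(":").map(Number);
  return h * 3600 + m * 60 + s;
}

const attempts: Array<[string, string]> = [
  ["14:48:25", "14:48:37"],
  ["14:48:52", "14:49:23"],
  ["14:49:53", "14:50:04"],
];

const durations = attempts.map(([start, end]) => toSeconds(end) - toSeconds(start));
// Idle gap between one attempt ending and the next starting.
const gaps = attempts.slice(1).map(([start], i) => toSeconds(start) - toSeconds(attempts[i]![1]));
const elapsedS = toSeconds(attempts[attempts.length - 1]![1]) - toSeconds(attempts[0]![0]);

const budgetS = 600; // --codex-timeout-ms = 600000
const minFallbackBudgetS = 120; // CLAWSWEEPER_CODEX_FALLBACK_MIN_BUDGET_MS default
const remainingS = budgetS - elapsedS;

// durations 12/31/11 s, gaps 15/30 s, 99 s elapsed, 501 s remaining
console.log(durations, gaps, elapsedS, remainingS, remainingS >= minFallbackBudgetS);
```

Under that assumption roughly 501s of the budget would remain, comfortably above the 120s default minimum, so the fallback gate would have allowed one lower-effort pass.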

(Posted #282 evidence at #282 (comment 4692572669) before noticing this PR was already up — flagging here so maintainers don't have to cross-reference. Happy to help test once merged on openclaw/openclaw#92181.)

yetval commented Jun 12, 2026

Author

Context from #282: the OP narrowed the root cause to a capacity (TPM admission) rejection, not a network flake or timeout. The three retries each died in 11-31s, far under the 600s budget, and other reviews on the same backend in the same window succeeded, so it is a per-request token-cost problem specific to that PR, not a fleet-wide dip. Their conclusion: backoff alone cannot help, since each retry resends the same oversized request; only reducing per-request token cost does.

That validates the mechanism in this PR. The fallback here is exactly a per-request token-cost reduction (high -> low reasoning effort), and it already triggers on the classified capacity case (isRetryableCodexTransportError matches rate limit reached / TPM / RPM / 429 / overloaded).

One design choice worth confirming, since #282 also floats putting the downgrade in the retry-failed-reviews cron lane (#260):

  • This PR does the downgrade in-process, right after the retries exhaust, so the same run recovers and the contributor is unblocked immediately.
  • The cron-lane variant keeps first-pass at high and only the daily recovery lane pays the lower cost, so the PR stays blocked until the next tick (08:13Z / 20:13Z).
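To put a number on the cron-lane latency, this sketch computes the next recovery tick from the 08:13Z / 20:13Z schedule quoted above. `nextRecoveryTick` is a hypothetical helper, and the calendar date used for the failed run is assumed for illustration; only the time of day (the last attempt ending at 14:50:04Z) comes from the earlier comment.

```typescript
// Recovery-lane schedule from the comment above; helper name is illustrative.
const RECOVERY_TICKS_UTC: ReadonlyArray<{ hour: number; minute: number }> = [
  { hour: 8, minute: 13 },
  { hour: 20, minute: 13 },
];

// First tick strictly after `now`, checking today then tomorrow (UTC).
function nextRecoveryTick(now: Date): Date {
  for (let dayOffset = 0; dayOffset <= 1; dayOffset++) {
    for (const { hour, minute } of RECOVERY_TICKS_UTC) {
      const tick = new Date(
        Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + dayOffset, hour, minute),
      );
      if (tick.getTime() > now.getTime()) return tick;
    }
  }
  throw new Error("unreachable: a tick always falls within the next day");
}

// Date assumed for illustration; 14:50:04Z is when the last exhausted attempt ended.
const failedAt = new Date("2026-06-12T14:50:04Z");
const tick = nextRecoveryTick(failedAt);
const waitMinutes = Math.round((tick.getTime() - failedAt.getTime()) / 60_000);
console.log(tick.toISOString(), waitMinutes); // 2026-06-12T20:13:00.000Z 323
```

For the #92181 timeline that is a wait of about 5h23m, versus an in-process fallback that can start as soon as the retries exhaust.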

The in-process path is strictly faster for the contributor, and the two are not mutually exclusive (the cron lane can still re-attempt at lower effort if an in-process fallback also fails). If maintainers prefer to scope the downgrade to the recovery lane only, this is a small change to make. Flagging here to avoid a duplicate PR for the same fix.

ooiuuii commented Jun 13, 2026

Adding another affected-case data point: I'm seeing the same failure mode on openclaw/openclaw#92251 and openclaw/openclaw#90610. The ClawSweeper re-review path hit transient_transport / Codex transport exhaustion, then the durable review/labels ended up as off-meta rather than reflecting the PR proof/status. So this does not seem isolated to #92181.

#283's bounded lower-effort fallback looks like the right shape to unblock these re-reviews without adding unbounded retry behavior.

Resolve conflicts in runCodex and its tests:
- Keep the bounded lower-effort fallback (runReviewPass closure + degraded
  annotation) on top of main's runCodexProcess invocation, --json output,
  and richer error classification (terminal vs retryable, errorCode/signal).
- CodexReviewError now carries retryable alongside errorCode/signal.
- Update the fallback-failure test stub to surface the rate-limit reason via
  the JSONL turn.failed channel that main's classifier now reads.
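A minimal sketch of reading the failure reason from the JSONL `turn.failed` channel, assuming each failed turn is one line shaped like `{"type":"turn.failed","error":{"message":"..."}}`. That shape and the `lastTurnFailedMessage` helper are assumptions for illustration; the real event schema should be checked against the Codex CLI docs and main's classifier.

```typescript
// Hypothetical sketch; assumes turn.failed events carry error.message.
function lastTurnFailedMessage(jsonl: string): string | undefined {
  let message: string | undefined;
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let event: unknown;
    try {
      event = JSON.parse(trimmed);
    } catch {
      continue; // skip non-JSON noise mixed into the stream
    }
    if (typeof event !== "object" || event === null) continue;
    const record = event as { type?: unknown; error?: { message?: unknown } };
    const text = record.error?.message;
    if (record.type === "turn.failed" && typeof text === "string") {
      message = text; // keep the latest failure reason
    }
  }
  return message;
}

// Synthetic sample output for illustration only.
const output = [
  '{"type":"turn.started"}',
  "WARN some stderr noise",
  '{"type":"turn.failed","error":{"message":"Rate limit reached for requests. Please try again in 12s."}}',
].join("\n");
console.log(lastTurnFailedMessage(output)); // Rate limit reached for requests. Please try again in 12s.
```

The extracted message is what a capacity classifier would then match against, which is why the fallback-failure test stub needs to surface its rate-limit reason through this channel.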