Skip to content

fix(review): gate exact reviews by provider lease#281

Draft
hxy91819 wants to merge 1 commit into
mainfrom
codex/provider-lease-throttle
Draft

fix(review): gate exact reviews by provider lease#281
hxy91819 wants to merge 1 commit into
mainfrom
codex/provider-lease-throttle

Conversation

@hxy91819

@hxy91819 hxy91819 commented Jun 12, 2026

Copy link
Copy Markdown
Member

Why

ClawSweeper exact reviews can run from many GitHub workflow instances at once. Worker sizing only controls local/shard concurrency; it does not protect the shared Codex provider/API-key capacity across independent runs. The important behavior change in this PR is to move overload handling from "start Codex, hit provider capacity, then retry/fail" to "reserve a global provider slot before starting Codex."

That gives us a controlled queue/backpressure point inside ClawSweeper: while global provider capacity is full, an exact review waits for a lease and reports Waiting for Codex capacity instead of stampeding the provider and amplifying transient transport/capacity failures. If capacity remains full until the configured wait deadline, the workflow now writes a terminal Capacity timeout command status instead of leaving the command looking like it is still waiting. If the provider-lease state update itself fails, the run fails explicitly rather than pretending capacity is merely full.

The lease is renewed while the exact review is running. Review runtime is not bounded by the original lease TTL in practice. A long-running review keeps its slot; if renewal stops or the lease disappears, the review step fails closed instead of continuing after its slot has expired.

Summary

  • add a state-repo-backed Codex provider lease tool with weighted capacity, TTL pruning, and acquire/release/renew commands
  • gate exact event reviews on a global provider lease before invoking Codex, renew while Codex is running, then release before publishing state
  • fail the review step if lease renewal stops before the review completes, avoiding silent overbooking after TTL expiry
  • make current workflow capacity configuration authoritative over persisted lease state so emergency capacity changes take effect
  • distinguish provider capacity waits, terminal capacity timeouts, and provider-lease state update failures in command status
  • add a real GitHub Actions live proof mode that exercises the provider lease gate without calling Codex

Testing

  • pnpm run build:repair
  • node --test test/repair/provider-lease.test.ts
  • node --test --test-name-pattern "provider lease|exact event reviews|live proof mode" test/clawsweeper.test.ts
  • pnpm run check on Node v24.15.0
  • earlier on this branch: pnpm run format:check, node --test --test-name-pattern "event re-review status|provider capacity|provider lease" test/clawsweeper.test.ts, and autoreview --mode local clean after fixes

Observed local results on head 64760bcb6afa74b427c716f9aecb7eab87df73ea:

  • focused workflow/provider test passed: 3 tests, 3 pass.
  • full pnpm run check passed: 454 unit tests, 504 repair tests, coverage checks, and format check.

Code locations

Head: 64760bcb6afa74b427c716f9aecb7eab87df73ea

Automated proof

The local provider-lease proof is CI-safe and deterministic. It uses a temporary local bare git repository as the shared state remote, then clones it into multiple independent checkouts to model multiple workflow runners racing on the same state branch.

What the shared-state proof executes:

  • creates a temporary bare git origin and seeds its state branch
  • clones that origin into separate working copies to simulate independent workflow runners
  • acquires capacity=1 from the first checkout and verifies acquired=true with activeWeight=1
  • attempts a second acquisition from another checkout and verifies acquired=false, activeWeight=1, and a provider capacity full reason
  • releases the first lease and verifies a later acquisition can succeed
  • creates an unreleased short-TTL lease, waits past TTL, then verifies the next acquisition succeeds and prunes the stale lease
  • renews a running holder before its original TTL expires, waits past the original TTL, and verifies another checkout is still blocked by provider capacity full
  • waits past the renewed TTL after renewal stops, then verifies another checkout can acquire and the stale renewed holder is pruned
  • installs a rejecting pre-receive hook on the bare origin and verifies a failed shared-state push throws shared state push failed instead of returning a false successful acquisition

Live Actions proof

Run: https://github.com/openclaw/clawsweeper/actions/runs/27425442802

How it was triggered:

gh workflow run sweep.yml \
  --repo openclaw/clawsweeper \
  --ref codex/provider-lease-throttle \
  -f target_repo=openclaw/clawsweeper \
  -f additional_prompt='[provider-lease-live-proof]'

The marker [provider-lease-live-proof] skips normal review/planning/publish/shard jobs and runs only the proof jobs. This proof uses real GitHub Actions runners and the real openclaw/clawsweeper-state repository, but an isolated provider namespace codex-live-proof-27425442802; it does not invoke Codex.

Observed workflow metadata:

  • event: workflow_dispatch
  • head SHA: 64760bcb6afa74b427c716f9aecb7eab87df73ea
  • conclusion: success
  • skipped Codex/review jobs: Plan review candidates, Review shard ${{ matrix.shard }}, Review, comment, and apply event item, Publish review artifacts, Recover failed review shards, Apply close proposals, Audit state

Observed real concurrency:

  • Provider lease live proof concurrent waiter started at 2026-06-12T15:27:05Z
  • Provider lease live proof holder started at 2026-06-12T15:27:06Z
  • both jobs used the same real state repo provider file: provider-leases/codex-live-proof-27425442802.json

Observed holder/acquire/renew/release behavior:

  • holder acquired proof-holder-27425442802-1 at 2026-06-12T15:27:41Z with JSON acquired: true, activeWeight: 1, capacity: 1
  • holder renewed successfully four times at 15:27:52Z, 15:28:04Z, 15:28:16Z, and 15:28:28Z, each returning JSON renewed: true
  • holder stayed active for more than the original 30s TTL and then released at 15:28:29Z with JSON released: true

Observed concurrent capacity timeout:

  • waiter saw the holder lease in the real state repo at 2026-06-12T15:27:42Z
  • waiter then attempted to acquire with capacity=1, weight=1, wait-seconds=20
  • waiter completed at 2026-06-12T15:28:09Z with JSON acquired: false, activeWeight: 1, capacity: 1, and reason provider capacity full: active weight 1/1, requested 1

Observed release recovery:

  • after holder release, Provider lease live proof release check acquired proof-release-check-27425442802-1 at 2026-06-12T15:29:08Z with JSON acquired: true, activeWeight: 1, capacity: 1
  • it released that follow-up lease at 15:29:10Z with JSON released: true

Observed cancellation/TTL recovery:

  • TTL job acquired abandoned lease proof-abandoned-27425442802-1 at 2026-06-12T15:29:53Z with ttl-seconds=15, JSON acquired: true, activeWeight: 1, capacity: 1
  • the job intentionally did not release it, then waited 20s to simulate cancellation recovery
  • recovered acquire proof-recovered-27425442802-1 succeeded at 2026-06-12T15:30:15Z with JSON acquired: true, activeWeight: 1, capacity: 1
  • recovered lease released at 15:30:17Z with JSON released: true
  • isolated proof provider state was cleaned up at 15:30:19Z

What this proof does not claim:

  • it does not call the real Codex provider/API key; it proves ClawSweeper queues before invoking Codex when the configured shared lease capacity is full
  • it does not prove the provider can never return 429; it reduces stampedes by adding a shared pre-Codex capacity gate
  • if a workflow is cancelled after acquisition, recovery still depends on TTL expiry; the live proof demonstrates that TTL recovery works in the real Actions/state-repo path for the configured proof TTL

@hxy91819 hxy91819 requested a review from a team as a code owner June 12, 2026 03:31
@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@clawsweeper

clawsweeper Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge. Reviewed June 12, 2026, 11:47 AM ET / 15:47 UTC.

Summary
Adds a state-repository-backed provider lease gate for exact reviews, with weighted capacity, renewal, terminal timeout handling, live Actions proof, and focused tests.

Reproducibility: yes. at source level: current main reduces background capacity when exact runs are active but has no shared gate before independent exact-review workflows invoke Codex. The live run reproduces the proposed contention model, although it intentionally does not reproduce an actual Codex-provider overload.

Review metrics: 3 noteworthy metrics.

  • Changed surface: 5 files, 1,587 additions, 20 deletions. A targeted admission-control problem receives a substantial new stateful workflow subsystem.
  • Lease coverage: 454 dedicated test lines. Focused tests cover weighted contention, expiry, renewal, rejected pushes, independent checkouts, release, and recovery.
  • Live proof phases: 4 successful lifecycle phases. The real Actions run demonstrates concurrent blocking, renewal beyond the original TTL, post-release acquisition, and abandoned-lease recovery.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster ✨ media proof bonus
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Choose whether exact-review admission belongs in the existing worker limiter or a separate provider lease.
  • If retaining leases, approve the state-repository outage policy and initial capacity, wait, renewal, polling, and TTL settings.

Risk before merge

  • [P1] Exact reviews would fail before or during Codex execution when the state repository cannot be checked out, pushed, or renewed, even if provider capacity is available.
  • [P1] A cancelled workflow cannot guarantee immediate release, so its lease temporarily consumes capacity until TTL expiry.
  • [P1] The branch introduces a second capacity-control model alongside worker accounting, increasing operational complexity and the chance of divergent tuning or observability.
  • [P1] Capacity, wait, polling, renewal, and TTL values become production policy; the isolated proof establishes mechanism behavior but not optimal settings under normal exact-review traffic.

Maintainer options:

  1. Integrate exact-review admission
    Extend the existing worker-capacity system to count, queue, and time out exact reviews if it can enforce the required cross-workflow provider limit.
  2. Approve the provider lease boundary
    Retain this implementation after maintainers explicitly accept its state-repository dependency, cancellation recovery window, and production lease settings.
  3. Continue the draft pause
    Observe the reduced worker defaults and gather exact-review contention data before adopting another production control plane.

Next step before merge

  • [P0] The draft needs an explicit maintainer decision on capacity architecture, state-repository outage behavior, and production lease settings rather than an automated code repair.

Security
Cleared: No concrete credential, permission, dependency-source, or third-party action regression was found; the branch reuses existing local token and checkout patterns.

Review details

Best possible solution:

Use one capacity model where practical: first determine whether the existing worker limiter can provide cross-workflow exact-review admission and terminal waiting semantics; retain the provider lease only if stronger provider-key protection requires it, with the state-outage policy and defaults explicitly approved.

Do we have a high-confidence way to reproduce the issue?

Yes at source level: current main reduces background capacity when exact runs are active but has no shared gate before independent exact-review workflows invoke Codex. The live run reproduces the proposed contention model, although it intentionally does not reproduce an actual Codex-provider overload.

Is this the best way to solve the issue?

Unclear. The lease is a coherent solution with strong lifecycle proof, but the repository-member author has appropriately paused the branch because extending the existing limiter may provide a simpler single source of capacity policy.

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against be65dd0db8a3.

Label changes

Label justifications:

  • P2: This is normal-priority reliability work for review automation with a bounded operational blast radius.
  • merge-risk: 🚨 automation: The branch changes exact-review admission, status reporting, lease renewal monitoring, cleanup, and proof-only workflow routing.
  • merge-risk: 🚨 availability: Merging makes exact reviews dependent on successful shared-state acquisition and renewal and can temporarily reduce capacity after cancellation.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (linked_artifact): The linked GitHub Actions run on the reviewed head convincingly demonstrates the after-change shared-state lease lifecycle on real runners, while clearly disclosing that it does not invoke Codex end to end.
  • proof: sufficient: Contributor real behavior proof is sufficient. The linked GitHub Actions run on the reviewed head convincingly demonstrates the after-change shared-state lease lifecycle on real runners, while clearly disclosing that it does not invoke Codex end to end.
Evidence reviewed

What I checked:

  • Current-main capacity gap: Current main counts active exact-item runs when reducing background capacity, but the exact event-review job still invokes Codex without a shared admission gate across independent workflow runs. (.github/workflows/sweep.yml:823, be65dd0db8a3)
  • Provider lease implementation: The branch serializes acquisition through state-branch synchronization and push, retries contention, distinguishes a full provider from a failed shared-state update, and provides renewal and release operations. (src/repair/provider-lease.ts:237, 64760bcb6afa)
  • Fail-closed workflow behavior: The exact-review job continuously renews its lease, monitors the renewal process, terminates the review if renewal stops, and releases an acquired lease before publishing state. (.github/workflows/sweep.yml:424, 64760bcb6afa)
  • Real Actions proof: The linked workflow_dispatch run succeeded on the reviewed head and ran concurrent holder/waiter, post-release acquisition, and abandoned-lease TTL recovery jobs against the real state repository. (64760bcb6afa)
  • Exact-review history: Git blame and show tie the original exact-review workflow to Dallin Romney, later model routing to Peter Steinberger, proof-token handling and current worker-default tuning to joshavant, and broader concurrency limiting to Vincent Koc. (.github/workflows/sweep.yml:346, be65dd0db8a3)
  • Repository remained clean: Read-only inspection left the target checkout unchanged. (be65dd0db8a3)

Likely related people:

  • Dallin Romney: The current exact event-review workflow and its direct Codex invocation appear to date to this contributor's merged workflow commit. (role: introduced behavior; confidence: high; commits: f93ca39cf3ec; files: .github/workflows/sweep.yml)
  • joshavant: Recent merged work added exact-review proof-token wiring and adjusted the worker defaults that currently mitigate overload symptoms. (role: recent area contributor; confidence: high; commits: 3fa24744ef26, be65dd0db8a3; files: .github/workflows/sweep.yml, config/automation-limits.json)
  • Peter Steinberger: Recent merged work changed model routing and viable-implementation behavior in the same exact-review workflow area. (role: adjacent owner; confidence: medium; commits: 9209fa90cb3c; files: .github/workflows/sweep.yml)
  • Vincent Koc: A recent merged automation change introduced broader Codex concurrency caps, making this contributor relevant to whether exact reviews should join the existing limiter. (role: capacity-system contributor; confidence: medium; commits: e8e8446c1b2d; files: .github/workflows/sweep.yml, src/repair/workflow-utils.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added the rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. label Jun 12, 2026
@hxy91819 hxy91819 force-pushed the codex/provider-lease-throttle branch from 9aa183f to 071ccec Compare June 12, 2026 03:58
@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal priority bug or improvement with limited blast radius. merge-risk: 🚨 availability 🚨 Merging this PR could cause crashes, hangs, restart loops, stalls, or process outages. merge-risk: 🚨 automation 🚨 Merging this PR could break CI, automerge, proof capture, label sync, or automation. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels Jun 12, 2026
@hxy91819 hxy91819 force-pushed the codex/provider-lease-throttle branch from 071ccec to c562d12 Compare June 12, 2026 05:13
@hxy91819

Copy link
Copy Markdown
Member Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. labels Jun 12, 2026
@hxy91819 hxy91819 force-pushed the codex/provider-lease-throttle branch from c562d12 to 2e5a31d Compare June 12, 2026 06:25
@hxy91819

Copy link
Copy Markdown
Member Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 12, 2026
@hxy91819

Copy link
Copy Markdown
Member Author

@clawsweeper re-review

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels Jun 12, 2026
@hxy91819 hxy91819 force-pushed the codex/provider-lease-throttle branch from 2e5a31d to 14eed4e Compare June 12, 2026 08:10
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. labels Jun 12, 2026
@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 12, 2026
@hxy91819 hxy91819 force-pushed the codex/provider-lease-throttle branch from 14eed4e to fe01297 Compare June 12, 2026 10:06
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels Jun 12, 2026
@hxy91819 hxy91819 force-pushed the codex/provider-lease-throttle branch from fe01297 to aca476a Compare June 12, 2026 15:18
@hxy91819 hxy91819 force-pushed the codex/provider-lease-throttle branch from aca476a to 64760bc Compare June 12, 2026 15:26
@clawsweeper clawsweeper Bot added proof: sufficient Contributor real behavior proof is sufficient. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 12, 2026
@hxy91819 hxy91819 marked this pull request as draft June 12, 2026 15:42
@hxy91819

Copy link
Copy Markdown
Member Author

Putting this back to draft for now.

The immediate overload symptoms appear to have been mitigated by lowering the ClawSweeper worker/shard defaults on main. This PR still addresses a real remaining concern: exact re-review workflow runs are single-shard internally, but many independent exact-review workflow runs can still start Codex at the same time.

That said, this implementation adds a separate provider-lease system backed by clawsweeper-state, with acquire/release/renew, TTL recovery, push retry behavior, and live-proof workflow jobs. That is a meaningful amount of new operational complexity.

Before treating this as merge-ready, I think we should first decide whether exact re-review should be folded into the existing worker/capacity limiter instead of introducing a parallel provider-level limiter. If the existing limiter can count active exact-review runs as workers and make new exact-review runs wait or time out consistently, that would keep the capacity model easier to reason about. A provider lease may still be the right second-stage hardening if we need stronger cross-workflow/provider-key protection than GitHub Actions active-run accounting can provide.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-risk: 🚨 automation 🚨 Merging this PR could break CI, automerge, proof capture, label sync, or automation. merge-risk: 🚨 availability 🚨 Merging this PR could cause crashes, hangs, restart loops, stalls, or process outages. P2 Normal priority bug or improvement with limited blast radius. proof: sufficient Contributor real behavior proof is sufficient. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant