Skip to content

Treat failed Codex reviews as infra-only#286

Open
SYU8384 wants to merge 1 commit into
openclaw:mainfrom
SYU8384:codex/failed-review-infra-labels
Open

Treat failed Codex reviews as infra-only#286
SYU8384 wants to merge 1 commit into
openclaw:mainfrom
SYU8384:codex/failed-review-infra-labels

Conversation

@SYU8384

@SYU8384 SYU8384 commented Jun 13, 2026

Copy link
Copy Markdown

Summary

  • Render failed Codex/ClawSweeper review artifacts as infrastructure failures instead of PR readiness verdicts.
  • Suppress stale PR readiness rating labels and rating justifications when review_status: failed.
  • Keep the existing real-behavior-proof blocker behavior for complete and legacy reports, while bypassing it for failed-review artifacts.

Verification

  • pnpm run build:all && node --test --test-name-pattern 'failed Codex review comments suppress PR readiness ratings|mock-only real behavior proof blocks repair markers|proof-blocked PR comments show proof cap while preserving patch quality|public PR review details justify derived rating label changes' test/clawsweeper.test.ts
  • pnpm run test:repair && pnpm run test:coverage:changed && pnpm run format:check
  • pnpm run check was also attempted. It stopped in test:unit on the existing local filesystem-mode assertion read-only checkout mode restores file modes and leaves git metadata writable, where stat(...).mode & 0777 returned 0777 instead of the expected 0555. The changed failed-review tests passed before that failure, and the independent downstream repair/coverage/format gates above passed.

Notes

This is the backend side of the #91210/#92499 re-review failure cleanup: exhausted Codex transport failures should remain infra-only and should not leave patch-quality or readiness labels behind.

@clawsweeper

clawsweeper Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge. Reviewed June 13, 2026, 4:44 PM ET / 20:44 UTC.

Summary
The branch changes ClawSweeper PR review rendering, label derivation, and tests so review_status: failed reports appear as infrastructure failures without PR readiness ratings or proof blockers.

Reproducibility: Source-reproducible. Current main unconditionally renders PR readiness and rating label justifications for pull-request reports, while failed review reports still carry PR rating fields; I did not execute tests because this review is read-only.

Review metrics: 3 noteworthy metrics.

  • Files changed: 2 files, +125/-16. The branch is scoped to ClawSweeper renderer/label logic and its unit coverage.
  • Regression coverage: 1 failed-review test added. The new test exercises the infra-only comment path and stale rating-label removal that the PR is meant to change.
  • Validation status: targeted gates claimed; full check stopped. The PR body reports focused tests passing but no complete pnpm run check result yet.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🐚 platinum hermit
Result: blocked until real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P1] Add redacted terminal output, copied live output, screenshot, recording, or linked artifact showing a failed-review artifact rendered as infrastructure-only with stale readiness labels suppressed.

Proof guidance:

  • [P1] Needs real behavior proof before merge: The PR body lists test commands but does not include after-fix output, screenshot, recording, log, or linked artifact showing the failed-review comment or label behavior in a real setup. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

  • [P1] The PR changes ClawSweeper review-comment and label automation; a bad edge case could leave failed review artifacts with stale readiness state or remove a useful label incorrectly.
  • [P1] The PR body reports targeted tests passing, but no real behavior proof shows a failed review artifact being rendered or synced through the actual path.
  • [P1] The body says pnpm run check was attempted but stopped in test:unit on an existing filesystem-mode assertion, so maintainers still need a complete green gate or explicit acceptance of that known failure.

Maintainer options:

  1. Add failed-review render proof (recommended)
    Ask for redacted terminal output, a screenshot, or a linked artifact showing a failed PR report rendering as infra-only and removing the stale rating label.
  2. Accept test-only coverage
    Maintainers can intentionally merge with the targeted regression tests if they decide the renderer and label paths are adequately covered without live proof.
  3. Pause if label policy is unsettled
    If failed-review artifacts should also change non-rating proof or status labels, pause this PR for an explicit label-policy decision before merge.

Next step before merge

  • [P1] This PR should stay on the human review path until the contributor adds real behavior proof and maintainers accept the incomplete full-check status; there is no narrow automated code repair to request.

Security
Cleared: The diff changes TypeScript rendering and tests only; it does not add dependencies, workflows, secrets handling, downloaded code, or supply-chain execution paths.

Review details

Best possible solution:

Land the narrow failed-review infra-only handling after real rendered-comment or label-sync proof is added and the normal merge gates are acceptable, while keeping complete-review proof blocking unchanged.

Do we have a high-confidence way to reproduce the issue?

Source-reproducible. Current main unconditionally renders PR readiness and rating label justifications for pull-request reports, while failed review reports still carry PR rating fields; I did not execute tests because this review is read-only.

Is this the best way to solve the issue?

Yes. The PR uses narrow review_status: failed guards at the rendering, proof-blocking, and rating-label points, which matches the intended infra-only behavior without changing complete-review handling.

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against 0c3e8876ae04.

Label changes

Label changes:

  • add P2: This is a normal-priority automation correctness fix for failed ClawSweeper review artifacts with limited blast radius.
  • add merge-risk: 🚨 automation: The diff changes review comment rendering, proof-blocking behavior, and ClawSweeper-owned label derivation for failed review artifacts.
  • add rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🐚 platinum hermit.
  • add status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body lists test commands but does not include after-fix output, screenshot, recording, log, or linked artifact showing the failed-review comment or label behavior in a real setup. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Label justifications:

  • P2: This is a normal-priority automation correctness fix for failed ClawSweeper review artifacts with limited blast radius.
  • merge-risk: 🚨 automation: The diff changes review comment rendering, proof-blocking behavior, and ClawSweeper-owned label derivation for failed review artifacts.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🐚 platinum hermit.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body lists test commands but does not include after-fix output, screenshot, recording, log, or linked artifact showing the failed-review comment or label behavior in a real setup. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
Evidence reviewed

What I checked:

  • Repository policy read: AGENTS.md was read fully; its conservative automation guidance and source/test locations were applied to this read-only review. (AGENTS.md:1, 0c3e8876ae04)
  • Current main readiness rendering: Current main unconditionally renders publicMergeReadinessBlock(prRating, realBehaviorProof) for pull-request reports, which is the source behavior this PR changes for failed reviews. (src/clawsweeper.ts:13549, 0c3e8876ae04)
  • Current main label derivation: Current main enters PR label justification logic for every pull request and adds the rating label from reportPrRating(markdown) without a failed-review guard. (src/clawsweeper.ts:12736, 0c3e8876ae04)
  • PR implementation: The PR adds publicFailedReviewReadinessBlock and uses it for failed reports, while also bypassing real-behavior-proof merge blocking when review_status is failed. (src/clawsweeper.ts:8096, 7b8cd3716899)
  • PR label handling: The PR filters PR rating labels for failed reports and skips derived rating justifications when review_status is failed. (src/clawsweeper.ts:12541, 7b8cd3716899)
  • PR regression test: The added test builds a failed PR report, verifies the infra-only readiness wording, and checks that a stale rating label is removed without a rating justification. (test/clawsweeper.test.ts:4333, 7b8cd3716899)

Likely related people:

  • Peter Steinberger: Blame and git log -S tie the PR readiness, proof, label-rendering, and failed-Codex handling surfaces to recent commits by this author. (role: feature-history owner; confidence: high; commits: 7ad3bd8f972e, 499e86783b2a; files: src/clawsweeper.ts, test/clawsweeper.test.ts)
  • brokemac79: The latest main commit before this review changed adjacent ClawSweeper label-sync behavior and tests in the same central files. (role: recent adjacent contributor; confidence: medium; commits: 0c3e8876ae04; files: src/clawsweeper.ts, test/clawsweeper.test.ts, docs/work-lane.md)
  • joshavant: Recent history includes a Codex failure-reason clarification commit, which is adjacent to the failure reason text this PR now surfaces in readiness rendering. (role: adjacent failed-review contributor; confidence: medium; commits: 6d78394e75bc; files: src/clawsweeper.ts, test/clawsweeper.test.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal priority bug or improvement with limited blast radius. merge-risk: 🚨 automation 🚨 Merging this PR could break CI, automerge, proof capture, label sync, or automation. labels Jun 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-risk: 🚨 automation 🚨 Merging this PR could break CI, automerge, proof capture, label sync, or automation. P2 Normal priority bug or improvement with limited blast radius. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant