Add approval-gated agent failure diagnosis#158
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| const diagnosisFile = join(artifactDir, "diagnosis.json"); | ||
| const pendingBodyFile = join(artifactDir, "pending-report.md"); | ||
| writeFileSync(diagnosisFile, `${JSON.stringify(report.diagnosis, null, 2)}\n`, "utf8"); | ||
| if (mode === "approval" || mode === "true") { |
There was a problem hiding this comment.
BLOCKING: approval mode writes a pending report artifact, but nothing in this PR consumes that artifact after human approval. Issue #156 and the linked PR #117 plan call for failure -> self-diagnosis -> pending report -> human approval -> central Discussion. Please add an approval publish route/CLI/workflow that re-checks requester authorization and creates/comments the central Discussion, or explicitly scope the PR/docs to a manual draft artifact instead of closing the approval-gated flow.
| } | ||
|
|
||
| if (matchesAny(evidence, [ | ||
| /typeerror/, |
There was a problem hiding this comment.
WARNING: This flat match treats generic TypeError, ReferenceError, and SyntaxError as high-confidence Sepo product bugs even without .agent/ stack evidence. In explicit true mode, a user-code TypeError from a script or test could become eligible for central auto-publish. Require an .agent frame for high confidence, or classify generic JavaScript errors without agent stack evidence at a lower confidence.
| assert.match(publication.url, /#comment-1$/); | ||
| assert.equal(calls.length, 2); | ||
| assert.match(calls[1]?.query || "", /addDiscussionComment/); | ||
| }); |
There was a problem hiding this comment.
WARNING: Please add a regression test that publishFailureReport makes zero GraphQL calls for approval mode. Issue #156 called out this guard, and without it a future refactor could accidentally turn pending approval into direct Discussion publication.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Sepo pushed fixes for this PR. Branch:
|
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| try { | ||
| const requestArgs = parseRequestArgs(env("REQUEST_TEXT")); | ||
| requestArgs.runId = env("FAILURE_REPORT_RUN_ID", requestArgs.runId); | ||
| requestArgs.runAttempt = env("FAILURE_REPORT_RUN_ATTEMPT", requestArgs.runAttempt); |
There was a problem hiding this comment.
WARNING: For router-triggered /publish-failure-report comments, FAILURE_REPORT_RUN_ATTEMPT is always 1 because the reusable workflow input defaults to 1, so this overwrites any run_attempt=2 parsed from REQUEST_TEXT. Please only override parsed args when the workflow input was explicitly supplied, or pass an empty workflow_call default and apply the 1 fallback after parsing.
| REQUESTED_BY: ${{ inputs.requested_by || github.actor }} | ||
| REQUESTER_ASSOCIATION: ${{ inputs.requester_association || '' }} | ||
| REQUEST_TEXT: ${{ inputs.request_text || '' }} | ||
| run: node .agent/dist/cli/publish-approved-failure-report.js |
There was a problem hiding this comment.
WARNING: The publish step writes the result only to the Actions step summary, so a requester who used /publish-failure-report gets a receipt reaction but no GitHub-visible completion status, Discussion URL, or error reason. Please add a compact response comment after this step, or otherwise route the publication status back to the triggering thread.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Sepo pushed fixes for this PR. Branch:
|
This comment has been minimized.
This comment has been minimized.
|
The current PR head has resolved the earlier approval-route and
Notes
Findings
Final Rubric VerdictPARTIAL |
| } | ||
|
|
||
| if ( | ||
| hasAgentStackEvidence || |
There was a problem hiding this comment.
WARNING: This product-bug branch runs before provider/runtime patterns, so provider failures that include an .agent stack frame, such as OpenAI 429/rate-limit errors from the adapter, become high-confidence product-bug candidates and can auto-publish in explicit true mode. Please classify provider/runtime markers before generic agent-stack evidence, or require an agent-internal exception shape in addition to the stack frame.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Sepo pushed fixes for this PR. Branch:
|
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| ); | ||
| result = result.replace(/\bgithub_pat_[A-Za-z0-9_]{20,}\b/g, "[REDACTED_GITHUB_TOKEN]"); | ||
| result = result.replace(/\bgh[opsru]_[A-Za-z0-9_]{20,}\b/g, "[REDACTED_GITHUB_TOKEN]"); | ||
| result = result.replace(/\bsk-[A-Za-z0-9_-]{20,}\b/g, "[REDACTED_PROVIDER_KEY]"); |
There was a problem hiding this comment.
WARNING: The sanitizer does not cover common CI credential shapes such as AWS access key IDs or JWTs. Because diagnosis artifacts are uploaded and may later be published, please redact at least AKIA-style keys and JWT-shaped tokens here, with focused tests.
| client?: GraphQLClient, | ||
| ): FailurePublication { | ||
| const { owner, repo } = parseRepoSlug(diagnosis.proposedDiscussion.repository); | ||
| const existing = findRepositoryDiscussionByTitle( |
There was a problem hiding this comment.
WARNING: This lookup is keyed by the generated title, not the stable fingerprint marker. The fingerprint normalizes dynamic values, but the title keeps the raw headline, so the same failure fingerprint can still create duplicate Discussions when the headline changes. Please search by the hidden fingerprint marker or otherwise match on fingerprint rather than exact title.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Sepo pushed fixes for this PR. Branch:
|
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| client, | ||
| ); | ||
| if (existing) { | ||
| const occurrenceMarker = failureReportOccurrenceMarker( |
There was a problem hiding this comment.
WARNING: This occurrence marker only keys on fingerprint + runId. GitHub reruns share the same run_id but have distinct run_attempt values and distinct diagnosis artifacts, so publishing attempt 2 after attempt 1 will be treated as already recorded. Please include diagnosis.source.runAttempt || "1" in the occurrence marker and add a regression test for two attempts of the same run.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Sepo pushed fixes for this PR. Branch:
|
|
Sepo is dispatching follow-up automation.
Reason: agent planner selected review: fix-pr completed successfully for open PR #158 after addressing the latest idempotency and redaction findings, so one automated re-review is warranted before stopping. |
Rubrics Review
Notes
Findings
Final Rubric VerdictPARTIAL |
AI Review Synthesis
Summary of PR/Issue
ReviewThe PR is directionally aligned with the safer diagnosis-first model, but it does not yet complete the accepted approval-gated workflow. The main disagreement is that the rubrics review considered pending artifacts sufficient for the core UX, while the code review and linked #156/#117 context treat the missing approved publish path as blocking.
Progress
Issue DetailsApproval publish path is missingCause: Candidate solutions: Add an approval publish path, for example a route or workflow dispatch that accepts a fingerprint/artifact reference, validates authorization, and calls the existing Discussion helper. If maintainers intentionally want manual copy-paste only, re-scope the docs and PR closure so it no longer claims the full approval-gated central-report flow. Comments: This is the main reviewer disagreement. Based on #156 and the linked PR #117 plan, the synthesis treats the approved publish step as required. Pending report destination is not validated in previewCause: Candidate solutions: Validate the Product-bug classifier over-promotes generic JavaScript errorsCause: Candidate solutions: Require Approval publish guard lacks regression coverageCause: Candidate solutions: Add a unit test that builds an Optional hardening remainsCause: Lower-priority reviewer notes include broad Candidate solutions: Tighten provider patterns to require error context, dedupe repeat comments by hidden occurrence marker, revisit defaults for external deployments, and add classifier coverage for remaining categories. Comments: These are useful follow-ups but are secondary to completing the approval flow and preview/test guards. Recommended Next Step
Final VerdictNEEDS_REWORK Action Items
|
|
@sepo-agent /answer ugh ok this turned into a gigantic pr -> can you summarize how this evolves into this current situation and what are the core function components here? |
|
This got big because the scope moved from “auto-post failures” to a safer end-to-end reporting system. The evolution:
Core functional components:
Current behavior in one line: failed agent runs are diagnosed locally by default; public repos get a pending sanitized report for approval, private repos stay local unless configured, and central posting only happens after explicit approval or explicit |
Summary
run-agent-taskto diagnose failed runs before rethrowing the original exit code, upload diagnosis artifacts, and supportfalse,diagnose,approval, and policy-gatedtruemodes.Tests
npm --prefix .agent test.github/**/*.ymland.agent/action-templates/*.ymlgit diff --checkCloses #156