Add approval-gated agent failure diagnosis#158

Draft
sepo-agent-app[bot] wants to merge 6 commits into main from agent/implement-issue-156/codex-25363409055

Conversation

@sepo-agent-app

sepo-agent-app Bot commented May 5, 2026

Summary

  • Add a typed failure-report helper and CLI that classify failed agent runs, redact evidence, generate fingerprints, and prepare local JSON plus pending Discussion drafts.
  • Wire run-agent-task to diagnose failed runs before rethrowing the original exit code, upload diagnosis artifacts, and support false, diagnose, approval, and policy-gated true modes.
  • Forward failure-report configuration through bundled workflows and update docs for the new defaults and optional publish token.

Tests

  • npm --prefix .agent test
  • YAML parse for .github/**/*.yml and .agent/action-templates/*.yml
  • CLI smoke check for diagnosis artifacts and redaction
  • git diff --check

Closes #156

const diagnosisFile = join(artifactDir, "diagnosis.json");
const pendingBodyFile = join(artifactDir, "pending-report.md");
writeFileSync(diagnosisFile, `${JSON.stringify(report.diagnosis, null, 2)}\n`, "utf8");
if (mode === "approval" || mode === "true") {

Author

BLOCKING: approval mode writes a pending report artifact, but nothing in this PR consumes that artifact after human approval. Issue #156 and the linked PR #117 plan call for failure -> self-diagnosis -> pending report -> human approval -> central Discussion. Please add an approval publish route/CLI/workflow that re-checks requester authorization and creates/comments the central Discussion, or explicitly scope the PR/docs to a manual draft artifact instead of closing the approval-gated flow.

Comment thread .agent/src/failure-report.ts Outdated
}

if (matchesAny(evidence, [
/typeerror/,

Author

WARNING: This flat match treats generic TypeError, ReferenceError, and SyntaxError as high-confidence Sepo product bugs even without .agent/ stack evidence. In explicit true mode, a user-code TypeError from a script or test could become eligible for central auto-publish. Require an .agent frame for high confidence, or classify generic JavaScript errors without agent stack evidence at a lower confidence.
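
One way to read this suggestion, as a minimal sketch: split the flat match into two checks so that a generic JavaScript error name alone only yields low confidence. The function name, the regexes, and the `"high"`/`"low"` labels below are illustrative assumptions, not the classifier in `failure-report.ts`.

```typescript
type Confidence = "high" | "low";

// Hypothetical patterns for illustration only.
const GENERIC_JS_ERROR = /\b(typeerror|referenceerror|syntaxerror)\b/i;
// Matches a path segment named ".agent", e.g. "(/repo/.agent/src/cli.ts:1:1)".
const AGENT_STACK_FRAME = /(^|[\s(\\/])\.agent[\\/]/m;

// Returns null when the evidence contains no generic JS error name at all.
function productBugConfidence(evidence: string): Confidence | null {
  if (!GENERIC_JS_ERROR.test(evidence)) return null;
  // Only an .agent stack frame promotes the error to high confidence.
  return AGENT_STACK_FRAME.test(evidence) ? "high" : "low";
}
```

Under this shape, a `TypeError` thrown by a user script stays low confidence and would not qualify for auto-publish in `true` mode.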

assert.match(publication.url, /#comment-1$/);
assert.equal(calls.length, 2);
assert.match(calls[1]?.query || "", /addDiscussionComment/);
});

Author

WARNING: Please add a regression test that publishFailureReport makes zero GraphQL calls for approval mode. Issue #156 called out this guard, and without it a future refactor could accidentally turn pending approval into direct Discussion publication.

@sepo-agent-app

sepo-agent-app Bot commented May 5, 2026

Author

Sepo pushed fixes for this PR. Branch: agent/implement-issue-156/codex-25363409055. Requested by @sepo-agent.

  • Added /publish-failure-report plus agent-publish-failure-report.yml and a CLI that consumes pending diagnosis artifacts, rechecks AGENT_ACCESS_POLICY, and publishes/comments the central Discussion.
  • Added pending destination publishability validation/warnings and separated approved publishing from explicit auto-publish.
  • Tightened generic JS error classification and added regression coverage, including approval-mode zero GraphQL calls.
  • Updated docs for the new approval publish path and token behavior.
  • Checks: npm --prefix .agent test; git diff --check.

try {
const requestArgs = parseRequestArgs(env("REQUEST_TEXT"));
requestArgs.runId = env("FAILURE_REPORT_RUN_ID", requestArgs.runId);
requestArgs.runAttempt = env("FAILURE_REPORT_RUN_ATTEMPT", requestArgs.runAttempt);

Author

WARNING: For router-triggered /publish-failure-report comments, FAILURE_REPORT_RUN_ATTEMPT is always 1 because the reusable workflow input defaults to 1, so this overwrites any run_attempt=2 parsed from REQUEST_TEXT. Please only override parsed args when the workflow input was explicitly supplied, or pass an empty workflow_call default and apply the 1 fallback after parsing.
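
A sketch of the second option (empty `workflow_call` default, fallback applied after parsing). The helper name and its parameters are hypothetical; the real CLI reads these values through `env(...)` and `parseRequestArgs(...)`.

```typescript
// `parsed` is the run_attempt value parsed from REQUEST_TEXT (if any);
// `workflowInput` is the workflow input, which is assumed to default to "".
function resolveRunAttempt(
  parsed: string | undefined,
  workflowInput: string | undefined,
): string {
  const explicit = (workflowInput ?? "").trim();
  if (explicit !== "") return explicit; // an explicitly supplied input wins
  const fromRequest = (parsed ?? "").trim();
  // The "1" fallback is applied last, so it can no longer mask a parsed value.
  return fromRequest !== "" ? fromRequest : "1";
}
```

With this ordering, a routed comment carrying `run_attempt=2` keeps attempt 2 as long as the workflow input is left blank.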

REQUESTED_BY: ${{ inputs.requested_by || github.actor }}
REQUESTER_ASSOCIATION: ${{ inputs.requester_association || '' }}
REQUEST_TEXT: ${{ inputs.request_text || '' }}
run: node .agent/dist/cli/publish-approved-failure-report.js

Author

WARNING: The publish step writes the result only to the Actions step summary, so a requester who used /publish-failure-report gets a receipt reaction but no GitHub-visible completion status, Discussion URL, or error reason. Please add a compact response comment after this step, or otherwise route the publication status back to the triggering thread.

@sepo-agent-app

sepo-agent-app Bot commented May 5, 2026

Author

Sepo pushed fixes for this PR. Branch: agent/implement-issue-156/codex-25363409055. Requested by @sepo-agent.

  • Preserved routed /publish-failure-report run_attempt=... values by leaving reusable workflow run_attempt blank unless explicitly supplied.
  • Added routed completion feedback: the publish CLI now emits a response body for success/failure, and the workflow posts it back to the original thread with status plus Discussion URL or reason.
  • Updated regression coverage and docs for the routed publish behavior.
  • Checks passed: npm --prefix .agent run build, node --test .agent/dist/__tests__/envelope.test.js, git diff --check.

@sepo-agent-app

sepo-agent-app Bot commented May 5, 2026

Author

The current PR head has resolved the earlier approval-route and run_attempt review items. The remaining rubric-specific concerns are narrower: repeated publication comments are not deduped, and diagnosis/artifact persistence is still best-effort without a dedicated visible warning if that persistence fails.

Rubrics Review

| Total Score | Verdict | Rubrics Scored |
| --- | --- | --- |
| 93 | PARTIAL | 13 |

| Dimension | Rubric | Result | Score | Evidence |
| --- | --- | --- | --- | --- |
| coding_workflow / generic | Validate delegated route authorization | pass | 8/8 | /publish-failure-report is routed through agent-router.yml, passes requester association, and the CLI rechecks isAssociationAllowedForRoute before publishing. |
| coding_workflow / generic | Keep docs in sync | pass | 7/7 | Docs cover failure modes, /publish-failure-report, token behavior, pending previews, and routed completion status. |
| coding_workflow / generic | Validate preview contracts | pass | 7/7 | validateFailureReportDestination validates pending report destination and tests cover unpublishable previews. |
| coding_workflow / generic | Make user-facing automation idempotent | partial | 5/7 | Existing Discussions are reused by title/fingerprint, but repeat publication always appends another occurrence comment. |
| coding_workflow / generic | Reuse existing code | pass | 8/8 | Reuses shared run-agent-task, access-policy helpers, raw stdout/stderr outputs, and Discussion helpers. |
| communication / generic | Keep status comments compact | pass | 6/6 | Publication responses and diagnosis summaries use compact field tables instead of raw logs. |
| coding_workflow / generic | Make surgical changes | pass | 7/8 | Broad workflow wiring is necessary for shared failure reporting; changes remain scoped to diagnosis/reporting, docs, and tests. |
| coding_workflow / generic | Prefer explicit structured inputs | pass | 7/7 | Adds separate typed inputs for mode, repository, category, token, run ID, attempt, artifact, and fingerprint. |
| coding_workflow / generic | Read docs and linked context first | pass | 6/6 | PR #158 follows issue #156’s accepted approval-gated diagnosis plan rather than the older PR #117 direct-posting UX. |
| coding_workflow / generic | Surface stateful failures | partial | 5/7 | Publish failures are surfaced, but diagnosis and artifact upload in run-agent-task remain continue-on-error without a dedicated visible persistence-failure status. |
| coding_style / generic | Prefer simple implementations | pass | 6/7 | The resolver, classifier, validation, and publishing flow are deterministic; complexity is proportional to the new route surface. |
| coding_workflow / generic | No per-file inventory tables in docs | pass | 6/6 | Documentation describes behavior and configuration, not a stale file inventory. |
| coding_workflow / generic | Comment on accepted workflow stops | pass | 6/6 | Routed publish requests post a compact completion/failure response with status, Discussion URL, or reason. |

Notes

Findings

  • WARNING: idempotent-user-facing-automation - publishFailureReportToDiscussion finds an existing Discussion, then always calls addDiscussionComment with a hidden occurrence marker. Re-running the same approved artifact can duplicate the same repeat occurrence comment.
  • INFO: surface-stateful-failures - run-agent-task keeps diagnosis and artifact upload best-effort. That preserves the original failure, but artifact persistence failure can leave no publishable approval artifact without a focused visible warning.

Final Rubric Verdict

PARTIAL

sepo-agent-app Bot added the priority/p1 (Project management: high priority) and effort/high (Project management: high effort) labels on May 5, 2026
Comment thread .agent/src/failure-report.ts Outdated
}

if (
hasAgentStackEvidence ||

Author

WARNING: This product-bug branch runs before provider/runtime patterns, so provider failures that include an .agent stack frame, such as OpenAI 429/rate-limit errors from the adapter, become high-confidence product-bug candidates and can auto-publish in explicit true mode. Please classify provider/runtime markers before generic agent-stack evidence, or require an agent-internal exception shape in addition to the stack frame.
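
The first suggested fix is purely about check order. A minimal sketch, assuming illustrative marker patterns and an `"unknown"` fallback category that this page does not confirm (the `provider_or_runtime` and `agent_product_bug_candidate` names do appear in this thread):

```typescript
type Category = "provider_or_runtime" | "agent_product_bug_candidate" | "unknown";

// Hypothetical provider/runtime markers; the real pattern list is not shown in this thread.
const PROVIDER_MARKERS: RegExp[] = [/\b429\b/, /rate.?limit/i, /\boverloaded\b/i];
const AGENT_FRAME = /(^|[\s(\\/])\.agent[\\/]/m;

function classify(evidence: string): Category {
  // Provider/runtime markers are checked first, so an adapter stack frame
  // under .agent/ cannot promote a rate-limit error to a product bug.
  if (PROVIDER_MARKERS.some((pattern) => pattern.test(evidence))) {
    return "provider_or_runtime";
  }
  if (AGENT_FRAME.test(evidence)) return "agent_product_bug_candidate";
  return "unknown";
}
```

The regression case described in this comment (an OpenAI 429 with an `.agent` adapter frame) then resolves to `provider_or_runtime`.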

@sepo-agent-app

sepo-agent-app Bot commented May 5, 2026

Author

Sepo pushed fixes for this PR. Branch: agent/implement-issue-156/codex-25363409055. Requested by @sepo-agent.

  • Reordered failure classification so provider/runtime markers win before generic .agent stack-frame product-bug classification.
  • Added regression coverage for an OpenAI 429/rate-limit adapter stack staying provider_or_runtime with low product-bug likelihood.
  • Checks passed: npm --prefix .agent run build, node --test .agent/dist/__tests__/failure-report.test.js, git diff --check.

);
result = result.replace(/\bgithub_pat_[A-Za-z0-9_]{20,}\b/g, "[REDACTED_GITHUB_TOKEN]");
result = result.replace(/\bgh[opsru]_[A-Za-z0-9_]{20,}\b/g, "[REDACTED_GITHUB_TOKEN]");
result = result.replace(/\bsk-[A-Za-z0-9_-]{20,}\b/g, "[REDACTED_PROVIDER_KEY]");

Author

WARNING: The sanitizer does not cover common CI credential shapes such as AWS access key IDs or JWTs. Because diagnosis artifacts are uploaded and may later be published, please redact at least AKIA-style keys and JWT-shaped tokens here, with focused tests.
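
A sketch of the two extra redactions, written in the same `replace` style as the lines above. The function name and the placeholder strings are assumptions. The patterns rely on two well-known shapes: AWS access key IDs of the `AKIA` kind are 20 uppercase alphanumeric characters, and JWTs are three dot-separated base64url segments whose header begins with `eyJ` (the encoding of `{"`).

```typescript
function redactCiCredentials(input: string): string {
  let result = input;
  // AWS access key ID: "AKIA" followed by 16 uppercase letters or digits.
  result = result.replace(/\bAKIA[0-9A-Z]{16}\b/g, "[REDACTED_AWS_ACCESS_KEY]");
  // JWT-shaped token: header.payload.signature, each a base64url segment.
  // No trailing \b, because a signature may legitimately end in "-".
  result = result.replace(
    /\beyJ[A-Za-z0-9_-]{5,}\.[A-Za-z0-9_-]{5,}\.[A-Za-z0-9_-]{5,}/g,
    "[REDACTED_JWT]",
  );
  return result;
}
```

The JWT pattern is intentionally shape-based, so it may also redact non-secret strings that happen to look like a JWT. For uploaded artifacts that is probably the safer trade-off.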

Comment thread .agent/src/failure-report.ts Outdated
client?: GraphQLClient,
): FailurePublication {
const { owner, repo } = parseRepoSlug(diagnosis.proposedDiscussion.repository);
const existing = findRepositoryDiscussionByTitle(

Author

WARNING: This lookup is keyed by the generated title, not the stable fingerprint marker. The fingerprint normalizes dynamic values, but the title keeps the raw headline, so the same failure fingerprint can still create duplicate Discussions when the headline changes. Please search by the hidden fingerprint marker or otherwise match on fingerprint rather than exact title.
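
To illustrate matching on the fingerprint rather than the title, here is a small sketch over already-fetched Discussions. The marker format, the `DiscussionSummary` shape, and both function names are assumptions made for the example.

```typescript
interface DiscussionSummary {
  title: string;
  url: string;
  body: string;
}

// Hypothetical hidden marker. The closing " -->" keeps fingerprint "abc"
// from matching a body that carries the longer fingerprint "abcd".
function fingerprintMarker(fingerprint: string): string {
  return `<!-- failure-report-fingerprint:${fingerprint} -->`;
}

function findByFingerprint(
  discussions: DiscussionSummary[],
  fingerprint: string,
): DiscussionSummary | undefined {
  const marker = fingerprintMarker(fingerprint);
  // The title is never consulted, so a changed headline still reuses the thread.
  return discussions.find((discussion) => discussion.body.includes(marker));
}
```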

@sepo-agent-app

sepo-agent-app Bot commented May 5, 2026

Author

Sepo pushed fixes for this PR. Branch: agent/implement-issue-156/codex-25363409055. Requested by @sepo-agent.

  • Reused central failure-report Discussions by the hidden fingerprint marker instead of generated titles.
  • Deduped repeat occurrence comments by their hidden occurrence marker and returned the existing comment URL when rerun.
  • Expanded failure-evidence redaction for AWS access key IDs and JWT-shaped tokens.
  • Added regression coverage for fingerprint reuse, occurrence dedupe, and new redactions; updated docs for the idempotency behavior.
  • Checks passed: npm --prefix .agent run build, node --test .agent/dist/__tests__/failure-report.test.js, git diff --check.

client,
);
if (existing) {
const occurrenceMarker = failureReportOccurrenceMarker(

Author

WARNING: This occurrence marker only keys on fingerprint + runId. GitHub reruns share the same run_id but have distinct run_attempt values and distinct diagnosis artifacts, so publishing attempt 2 after attempt 1 will be treated as already recorded. Please include diagnosis.source.runAttempt || "1" in the occurrence marker and add a regression test for two attempts of the same run.
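
A sketch of an attempt-aware marker, assuming a hypothetical marker format and a standalone signature (the real `failureReportOccurrenceMarker` arguments are truncated in the excerpt above):

```typescript
function occurrenceMarker(
  fingerprint: string,
  runId: string,
  runAttempt?: string,
): string {
  // Missing or blank attempts default to "1", mirroring `runAttempt || "1"`.
  const attempt = runAttempt && runAttempt.trim() !== "" ? runAttempt.trim() : "1";
  return `<!-- failure-report-occurrence:${fingerprint}:${runId}:${attempt} -->`;
}
```

Two attempts of the same run then produce different markers, while an artifact without a recorded attempt still dedupes against attempt 1.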

@sepo-agent-app

sepo-agent-app Bot commented May 5, 2026

Author

Sepo pushed fixes for this PR. Branch: agent/implement-issue-156/codex-25363409055. Requested by @sepo-agent.

  • Included run_attempt in failure-report occurrence markers and added regression coverage for two attempts of the same run.
  • Added a non-failing persistence status/warning path for diagnosis or artifact upload failures while preserving the original agent exit code.
  • Updated failure-report and internal action docs for attempt-aware dedupe and persistence warnings.
  • Checks passed: npm --prefix .agent run build, node --test .agent/dist/__tests__/failure-report.test.js, node --test .agent/dist/__tests__/envelope.test.js, git diff --check.
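
The persistence-warning behavior described above can be pictured roughly like this. It is a sketch under assumptions: `finishFailedRun`, `RunOutcome`, and the warning text are invented for illustration, and the real logic lives in the `run-agent-task` action rather than in a single function.

```typescript
interface RunOutcome {
  exitCode: number;
  warnings: string[];
}

// `persist` stands in for "write diagnosis and upload the artifact".
function finishFailedRun(exitCode: number, persist: () => void): RunOutcome {
  const warnings: string[] = [];
  if (exitCode !== 0) {
    try {
      persist();
    } catch (err) {
      // Persistence is best-effort: record a visible warning, never mask the failure.
      const reason = err instanceof Error ? err.message : String(err);
      warnings.push(`failure diagnosis was not persisted: ${reason}`);
    }
  }
  // The original agent exit code is always what the caller sees.
  return { exitCode, warnings };
}
```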

@sepo-agent-app

sepo-agent-app Bot commented May 5, 2026

Author

Sepo is dispatching follow-up automation.

| Source | Next | Target | Round | Status |
| --- | --- | --- | --- | --- |
| fix-pr | review | PR #158 | 13 / 40 | Dispatched |

Reason: agent planner selected review: fix-pr completed successfully for open PR #158 after addressing the latest idempotency and redaction findings, so one automated re-review is warranted before stopping.

@sepo-agent-app

sepo-agent-app Bot commented May 5, 2026

Author

Rubrics Review

| Total Score | Verdict | Rubrics Scored |
| --- | --- | --- |
| 97 | PARTIAL | 13 |

| Dimension | Rubric | Result | Score | Evidence |
| --- | --- | --- | --- | --- |
| coding_workflow / generic | Validate delegated route authorization | pass | 8/8 | /publish-failure-report is routed through agent-router.yml, and the publish CLI rechecks isAssociationAllowedForRoute(policy, ROUTE, association, isPublicRepo) before publishing: `function assertRequesterAuthorized(requester: string, associationInput: string): string {` |
| coding_workflow / generic | Keep docs in sync | pass | 7/7 | Docs were updated for diagnosis modes, approval publishing, persistence warnings, idempotent reuse, and token behavior: `Failed agent runs go through local diagnosis before the shared action rethrows` |
| coding_workflow / generic | Validate preview contracts | pass | 7/7 | Pending destinations are validated and unpublishable previews are surfaced in output; tests cover malformed repository warnings: `test("buildFailureReport surfaces unpublishable pending destination warnings", () => {` |
| coding_workflow / generic | Make user-facing automation idempotent | partial | 6/7 | Discussions are reused by hidden fingerprint marker and repeat comments are deduped by run attempt, but Discussion lookup scans only the latest 50 Discussions: `export function findRepositoryDiscussionByBodyMarker(` |
| coding_workflow / generic | Reuse existing code | pass | 8/8 | Reuses shared run-agent-task, access-policy helpers, captured stdout/stderr, and Discussion GraphQL helpers rather than embedding API logic in workflow glue. |
| communication / generic | Keep status comments compact | pass | 6/6 | The publish route writes compact completion/status output and uses post-response.js for thread-visible replies: `- name: Post failure report publication status` |
| coding_workflow / generic | Make surgical changes | pass | 7/8 | The diff is broad because shared workflows forward new failure-report inputs, but the changes stay scoped to failure diagnosis/reporting, docs, and targeted tests. |
| coding_style / generic | Prefer simple implementations | pass | 6/7 | Mode resolution, classification, destination validation, and publication paths are deterministic and direct; added complexity is proportional to the approval route. |
| coding_workflow / generic | Prefer explicit structured inputs | pass | 7/7 | Adds separate mode, repository, category, token, run ID, run attempt, artifact, and fingerprint inputs instead of overloading a single string. |
| coding_workflow / generic | Read docs and linked context first | pass | 6/6 | PR #158 closes #156 and implements the accepted plan from that issue: local diagnosis, pending approval artifact, approval publish path, and explicit auto-publish. |
| coding_workflow / generic | Surface stateful failures | pass | 7/7 | Diagnosis/artifact persistence failures now emit a dedicated warning/status while preserving the original agent exit code. |
| coding_workflow / generic | No per-file inventory tables in docs | pass | 6/6 | Documentation describes behavior and configuration; it does not add a per-file inventory table. |
| coding_workflow / generic | Comment on accepted workflow stops | pass | 6/6 | Routed publish requests post success/failure status back to the trigger surface when response context is available: `- name: Post failure report publication status` |

Notes

  • The current PR head satisfies the core rubric-sensitive design from issue #156 (Rethink bug resolution, automatic diagnosis, and automatic reporting): public repos default to approval-gated diagnosis, private repos default to local diagnosis, approved publishing rechecks access policy, and reporting is deduped by fingerprint plus run attempt.
  • Smallest useful follow-up: consider paginating findRepositoryDiscussionByBodyMarker so older matching failure-report Discussions are still reused after more than 50 newer Discussions.

Findings

  • INFO: idempotent-user-facing-automation - fingerprint-based reuse is implemented, but findRepositoryDiscussionByBodyMarker only scans discussions(first: 50). High-volume intake repositories can still create a duplicate central Discussion if the matching fingerprint is older than that first page.
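
A sketch of the pagination follow-up, kept independent of the GraphQL client. `fetchPage` is a hypothetical callback that would wrap one `discussions(first: 50, after: $cursor)` query and map GitHub's `pageInfo.endCursor` and `hasNextPage` fields to `nextCursor`. The `maxPages` cap is an added assumption to bound API usage.

```typescript
interface DiscussionPage {
  items: { url: string; body: string }[];
  nextCursor: string | null; // null when there are no further pages
}

type FetchPage = (cursor: string | null) => DiscussionPage;

function findDiscussionUrlByMarker(
  fetchPage: FetchPage,
  marker: string,
  maxPages = 20,
): string | null {
  let cursor: string | null = null;
  for (let page = 0; page < maxPages; page++) {
    const result: DiscussionPage = fetchPage(cursor);
    const hit = result.items.find((item) => item.body.includes(marker));
    if (hit) return hit.url; // stop at the first match
    if (result.nextCursor === null) return null; // scanned everything
    cursor = result.nextCursor;
  }
  return null; // page budget exhausted without a match
}
```

A bounded scan still leaves a small duplicate risk beyond `maxPages`, but it moves the cutoff from 50 Discussions to a configurable limit.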

Final Rubric Verdict

PARTIAL

@sepo-agent-app

sepo-agent-app Bot commented May 5, 2026

Author

AI Review Synthesis

Dual-agent review by Claude and Codex.
Requested by @lolipopshock.

Summary of PR/Issue

  • PR #158 (Add approval-gated agent failure diagnosis) replaces the older blind failure-reporting approach with local failed-agent-run diagnosis.
  • It adds typed failure classification, redaction, fingerprinting, JSON and pending-report artifacts, workflow wiring, docs, and four modes: false, diagnose, approval, and explicit policy-gated true.
  • The PR closes issue #156 (Rethink bug resolution, automatic diagnosis, and automatic reporting), whose accepted context points to the flow failure -> self-diagnosis -> pending report -> human approval -> central Discussion.
  • The implementation covers the diagnosis and pending-draft pieces, but reviewers disagree on whether the human-approved publish path is required in this PR.

Review

The PR is directionally aligned with the safer diagnosis-first model, but it does not yet complete the accepted approval-gated workflow. The main disagreement is that the rubrics review considered pending artifacts sufficient for the core UX, while the code review and linked #156/#117 context treat the missing approved publish path as blocking.

| Issue | Severity | Description |
| --- | --- | --- |
| Approval publish path is missing | BLOCKING | approval mode writes a pending artifact, but no route, CLI, or workflow consumes it after human approval and publishes/comments to the central Discussion. |
| Pending report destination is not validated in preview | WARNING | approval mode can emit a clean pending artifact with an invalid report repository/category, while publish validation only happens later in explicit true mode. |
| Product-bug classifier over-promotes generic JavaScript errors | WARNING | Generic TypeError, ReferenceError, and SyntaxError logs become high-confidence product-bug candidates even without .agent stack evidence. |
| Approval publish guard lacks regression coverage | WARNING | Tests cover explicit true publishing, but not that approval mode makes zero GraphQL calls. |
| Optional hardening remains | INFO | Reviewers noted provider-name overmatching, repeat-comment idempotency, cross-tenant default concerns, and partial classifier coverage as secondary follow-ups. |

Progress

  • PR #158 (Add approval-gated agent failure diagnosis) already implements local diagnosis, redaction, fingerprinting, pending draft artifacts, workflow forwarding, and docs.
  • The existing rubrics review already posted the pending-preview validation concern and repeat-comment idempotency note.
  • Inline comments were posted for the missing approval publish path, the generic JavaScript error classifier, and the missing approval-mode no-GraphQL test.
  • No duplicate inline comment was posted for preview validation because it is already clearly covered in the PR discussion.

Issue Details

Approval publish path is missing

Cause: diagnose-agent-failure.ts writes pending-report.md for approval mode, but the PR adds no approved publish route/CLI/workflow that reads the pending report, re-checks requester authorization, and posts it to Discussions.

Candidate solutions: Add an approval publish path, for example a route or workflow dispatch that accepts a fingerprint/artifact reference, validates authorization, and calls the existing Discussion helper. If maintainers intentionally want manual copy-paste only, re-scope the docs and PR closure so it no longer claims the full approval-gated central-report flow.

Comments: This is the main reviewer disagreement. Based on #156 and the linked PR #117 plan, the synthesis treats the approved publish step as required.

Pending report destination is not validated in preview

Cause: buildFailureReport builds the proposed Discussion from raw repository/category strings, while parseRepoSlug is only reached in publishFailureReport.

Candidate solutions: Validate the owner/repo shape and non-empty category while building the proposed report, or surface an explicit warning/status in the diagnosis artifact when the pending report cannot be published.
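
A sketch of the first candidate: validate the destination while building the preview and return warnings instead of throwing. The function name, the slug pattern, and the warning strings are assumptions. The PR's own helper is referred to elsewhere in this thread as `validateFailureReportDestination`, and its signature is not shown here.

```typescript
// Returns human-readable warnings; an empty array means the preview is publishable.
function validateDestination(repository: string, category: string): string[] {
  const warnings: string[] = [];
  // Expect exactly "owner/repo" with conservative name characters.
  if (!/^[A-Za-z0-9_.-]+\/[A-Za-z0-9_.-]+$/.test(repository.trim())) {
    warnings.push(`report repository "${repository}" is not in owner/repo form`);
  }
  if (category.trim() === "") {
    warnings.push("report Discussion category is empty");
  }
  return warnings;
}
```

Returning warnings keeps the diagnosis artifact intact while still marking the pending report as unpublishable.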

Product-bug classifier over-promotes generic JavaScript errors

Cause: classifyFailure uses one flat OR block for generic JavaScript error names and .agent/ stack evidence.

Candidate solutions: Require .agent stack-frame evidence for high-confidence agent_product_bug_candidate, or classify generic JavaScript errors without agent frames at lower confidence so explicit true mode does not auto-publish user-code failures.

Approval publish guard lacks regression coverage

Cause: failure-report.test.ts covers true publish/create/comment paths, but not the issue-planned guard that approval mode performs no GraphQL calls.

Candidate solutions: Add a unit test that builds an approval report, calls publishFailureReport with a queued GraphQL client, and asserts status skipped plus zero calls.
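
The shape of such a test could look like the sketch below. `publishIfAllowed` is a stand-in written for this example, not the real `publishFailureReport`, and the recording client only mimics a queued GraphQL client by counting calls.

```typescript
type Mode = "false" | "diagnose" | "approval" | "true";

interface RecordingClient {
  calls: { query: string }[];
  request(query: string): unknown;
}

function createRecordingClient(): RecordingClient {
  const calls: { query: string }[] = [];
  return {
    calls,
    request(query: string): unknown {
      calls.push({ query }); // record every GraphQL call for later assertions
      return {};
    },
  };
}

// Stand-in publisher: only explicit "true" mode is allowed to touch the client.
function publishIfAllowed(mode: Mode, client: RecordingClient): { status: "published" | "skipped" } {
  if (mode !== "true") return { status: "skipped" };
  client.request("mutation { createDiscussion }");
  return { status: "published" };
}

// The regression assertion: approval mode must be skipped with zero calls.
const approvalClient = createRecordingClient();
const approvalResult = publishIfAllowed("approval", approvalClient);
if (approvalResult.status !== "skipped" || approvalClient.calls.length !== 0) {
  throw new Error("approval mode must not call GraphQL");
}
```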

Optional hardening remains

Cause: Lower-priority reviewer notes include broad /openai/ and /anthropic/ provider matches, repeat occurrences always adding comments, the default central report repository, and limited classifier category tests.

Candidate solutions: Tighten provider patterns to require error context, dedupe repeat comments by hidden occurrence marker, revisit defaults for external deployments, and add classifier coverage for remaining categories.

Comments: These are useful follow-ups but are secondary to completing the approval flow and preview/test guards.

Recommended Next Step

FIX_PR: the unresolved findings require concrete branch changes and are safe for an automated fix-pr pass guided by the accepted #156 flow.

Final Verdict

NEEDS_REWORK

Action Items

  • Add an approval publish route/CLI/workflow that consumes pending failure reports, re-checks requester authorization, and creates/comments the central Discussion.
  • Validate report repository/category for pending approval output, or surface an explicit unpublishable-preview warning.
  • Tighten product-bug classification so generic JavaScript errors require .agent stack evidence for high-confidence reportability.
  • Add a regression test proving approval mode makes zero GraphQL calls.

@lolipopshock

Contributor

@sepo-agent /answer ugh ok this turned into a gigantic pr -> can you summarize how this evolves into this current situation and what are the core function components here?

@sepo-agent-app

sepo-agent-app Bot commented May 8, 2026

Author

This got big because the scope moved from “auto-post failures” to a safer end-to-end reporting system.

The evolution:

  1. PR #117 (Report failed agent runs to Discussions) had useful pieces, but the UX was too aggressive: public repos could auto-report centrally. In issue #156 (Rethink bug resolution, automatic diagnosis, and automatic reporting), the accepted direction changed to: failure -> self-diagnosis -> pending report -> human approval -> central Discussion.
  2. PR #158 (Add approval-gated agent failure diagnosis) was created from scratch for that new model. The first pass added local diagnosis, redaction, classification, fingerprints, pending artifacts, workflow wiring, and docs.
  3. Review then found the approval loop was incomplete: approval mode made a pending artifact, but nothing consumed it. Fixes added /publish-failure-report, an approval workflow, auth re-checking, destination validation, and safer classifier behavior.
  4. Later review found rerun/idempotency gaps. Fixes added run_attempt to occurrence markers and warnings when diagnosis/artifact persistence fails.
  5. The PR is now large mainly because run-agent-task is shared by nearly every agent workflow, so every route had to forward the new failure_report_* inputs, plus docs/tests had to track the new behavior.

Core functional components:

  • failure-report.ts: core library. Resolves modes, redacts evidence, classifies failures, builds fingerprints, renders the pending Discussion body, validates destination, and handles publish/dedupe.
  • diagnose-agent-failure.ts: failure-time CLI. Reads captured stdout/stderr, builds diagnosis.json and pending-report.md, writes step summary/outputs, and optionally auto-publishes only in explicit true mode.
  • run-agent-task/action.yml: shared hook point. Runs diagnosis after an agent exits nonzero, uploads the artifact, warns if persistence fails, then still preserves the original agent failure.
  • publish-approved-failure-report.ts: approval CLI. Parses /publish-failure-report, rechecks AGENT_ACCESS_POLICY, downloads the artifact, validates schema/mode/fingerprint, then publishes.
  • agent-publish-failure-report.yml: workflow wrapper for manual dispatch or routed approval comments.
  • agent-router.yml: adds /publish-failure-report as a real route.
  • discussion.ts: shared Discussion lookup/create/comment helpers, extended for fingerprint marker lookup and repeat-occurrence dedupe.

Current behavior in one line: failed agent runs are diagnosed locally by default; public repos get a pending sanitized report for approval, private repos stay local unless configured, and central posting only happens after explicit approval or explicit AGENT_FAILURE_REPORT_MODE=true.
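
The defaults in that one-line summary can be sketched as a small resolver. The function name and the handling of unrecognized values (fall back to the visibility default) are assumptions, and the additional policy gate on `true` mode mentioned in the PR summary is deliberately left out.

```typescript
type FailureReportMode = "false" | "diagnose" | "approval" | "true";

const MODES: readonly FailureReportMode[] = ["false", "diagnose", "approval", "true"];

// `configured` would come from AGENT_FAILURE_REPORT_MODE or a workflow input.
function resolveFailureReportMode(
  configured: string | undefined,
  isPublicRepo: boolean,
): FailureReportMode {
  const value = (configured ?? "").trim().toLowerCase();
  const explicit = MODES.find((mode) => mode === value);
  if (explicit !== undefined) return explicit; // explicit configuration wins
  // Defaults: public repos get approval-gated reports, private repos stay local.
  return isPublicRepo ? "approval" : "diagnose";
}
```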

Labels

  • agent: Handled by the agent
  • effort/high: Project management: high effort
  • priority/p1: Project management: high priority

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rethink bug resolution, automatic diagnosis, and automatic reporting

2 participants