Download GitHub attachments before agent runs #428

Draft
sepo-agent-app[bot] wants to merge 1 commit into main from agent/implement-issue-426/codex-27851649879

@sepo-agent-app

Summary

  • add a shared attachment scanner/downloader and CLI that writes a manifest under $RUNNER_TEMP/agent-attachments
  • wire run-agent-task to run it before the agent and inject the manifest into the base prompt
  • document attachment handling and add runtime/workflow coverage

Tests

  • npm run build
  • node --test dist/__tests__/attachments.test.js dist/__tests__/envelope.test.js dist/__tests__/workflow-action-validation.test.js dist/__tests__/docs-validation.test.js
  • git diff --check

Closes #426

sepo-agent-app bot added the agent (Handled by the agent) label on Jun 19, 2026
@sepo-agent-app

Author

Rubrics Review

| Total Score | Verdict | Rubrics Scored |
| --- | --- | --- |
| 100 | PASS | 9 |

| Dimension | Rubric | Result | Score | Evidence |
| --- | --- | --- | --- | --- |
| coding_workflow/generic | Keep docs in sync | pass | 7/7 | Docs now describe attachment handling in supported workflows and run-agent-task: supported-workflows.md#L210-L212, internal-actions.md#L14. |
| coding_workflow/generic | Reuse existing code | pass | 8/8 | The implementation reuses the shared run-agent-task action, existing gh wrapper, discussion transcript helper, GraphQL client, and setOutput helper rather than adding per-route workflow glue: attachments.ts#L1-L10, download-attachments.ts#L8-L14. |
| coding_workflow/generic | Prefer explicit structured inputs | pass | 7/7 | Attachment data flows through explicit env fields and a JSON manifest path, with prompt injection gated by the runtime allowlist: action.yml#L342-L376, run.ts#L71-L80, run.ts#L414-L420. |
| coding_workflow/generic | Surface stateful failures | pass | 7/7 | Collection and download failures are preserved in the manifest instead of being hidden, including missing-token, HTTP, and thrown-fetch paths: attachments.ts#L411-L416, attachments.ts#L517-L579, attachments.test.ts#L187-L251. |
| coding_workflow/generic | Minimal defaults, native escape hatches | pass | 8/8 | The feature is a default shared runtime capability under run-agent-task, with no new public settings or route-specific knobs; advanced behavior remains customizable in repo-owned workflow/runtime code: action.yml#L342-L354. |
| coding_workflow/generic | Name config by capability | pass | 6/6 | New env/output names describe the general attachment capability, not a narrow first use case: ATTACHMENTS_DIR, ATTACHMENTS_MANIFEST_FILE, and manifest_file: download-attachments.ts#L21-L41, download-attachments.ts#L67-L70. |
| coding/coding_style | Prefer simple implementations | pass | 7/7 | The implementation uses one typed module, one CLI, one shared action step, and base-prompt guidance; it avoids extra routes, settings, or duplicated per-workflow scanners. |
| coding/coding | Make surgical changes | pass | 8/8 | The diff is limited to the attachment module/CLI, shared runtime wiring, prompt contract, focused docs, and targeted tests; no unrelated workflow or route refactors were introduced. |
| coding_workflow/generic | Read docs and linked context first | pass | 6/6 | The PR closes issue #426 and implements the exact linked design: scan issue/PR/discussion/request text, download with the resolved token, write a manifest, and instruct agents to use local paths. The current diff matches that scope. |

Notes

  • The strongest score drivers are the shared run-agent-task integration, explicit manifest contract, focused docs update, and regression coverage for extraction, dedupe, safe filenames, authenticated downloads, per-file errors, and workflow/prompt wiring.
  • I verified the PR head locally with npm ci, npm run build, node --test dist/__tests__/attachments.test.js dist/__tests__/envelope.test.js dist/__tests__/workflow-action-validation.test.js dist/__tests__/docs-validation.test.js (91 passing subtests), and git diff --check.
  • Not scored: selected authorization, orchestration, rubric-lifecycle, merge, preview, and status-comment rubrics that do not apply to this attachment-handling implementation.
  • No rubric-specific follow-up needed.

Findings

  • INFO: No blocking or warning findings against the applicable selected rubrics.

Final Rubric Verdict

PASS


codex | gpt-5.5/xhigh | zucchini-m4-3

--limit "${RUBRICS_LIMIT}" \
--output-file "${RUBRICS_CONTEXT_FILE}"

- name: Download GitHub attachments

Author

This step is prompt enrichment, but it lacks continue-on-error: true while adjacent best-effort enrichment steps use it. If the CLI rejects before writing outputs (mkdirSync, writeFileSync, or setOutput), the composite action fails and the agent never runs on any route. Please make the step non-fatal and let run.ts fall back when the manifest is missing.

Comment thread .agent/src/run.ts
if (promptVars.RUBRICS_CONTEXT_FILE && existsSync(promptVars.RUBRICS_CONTEXT_FILE)) {
promptVars.RUBRICS_CONTEXT = readFileSync(promptVars.RUBRICS_CONTEXT_FILE, "utf8");
}
if (promptVars.ATTACHMENTS_MANIFEST_FILE && existsSync(promptVars.ATTACHMENTS_MANIFEST_FILE)) {

Author

This manifest is only injected into the full/base prompt, but resumed sessions pass buildContinuationPrompt(...) to runAcpx; that continuation prompt does not include ATTACHMENTS_MANIFEST or the localPath instructions. A follow-up /answer or /fix-pr with a private attachment can download the file but still leave the agent with only the original github.com/user-attachments URL. Please include the manifest in continuation prompts, or force full prompt replay when a manifest is present, and add coverage.

Comment thread .agent/src/attachments.ts
continue;
}

const bytes = Buffer.from(await response.arrayBuffer());

Author

await response.arrayBuffer() buffers the whole attachment with no size limit, and this loop has no count cap or fetch timeout. Since URLs are collected from issue/PR/discussion text, a large file, many links, or a slow response can exhaust runner resources or stall this shared pre-agent step. Please add size/count/time limits before downloading arbitrary attachments.

@sepo-agent-app

Author

AI Review Synthesis

Dual-agent review by Claude and Codex.
Requested by @lolipopshock.

Summary of PR/Issue

PR #428 implements issue #426 by downloading GitHub user-attachments before the agent runs, using the resolved GitHub token, and passing a local manifest into the prompt. The goal is to make private-repo attachments readable without exposing tokens to the model or relying on unauthenticated fetches. The implementation adds a shared scanner/downloader, wires it into run-agent-task, updates the base prompt, and adds focused docs/tests. The current design is well placed on the shared runtime path, but the current diff still has one correctness gap for resumed sessions and two robustness issues on the shared pre-agent path.

Review

Overall, the PR is directionally solid and matches the linked issue plan, but it should not ship until resumed runs receive the attachment manifest. Reviewers disagreed on severity: the rubrics review passed with no warnings, while the code reviews identified concrete runtime issues; the resumed-session gap is supported by the current PR head.

| Issue | Severity | Description |
| --- | --- | --- |
| Resumed sessions lose attachment manifests | BLOCKING | The manifest is injected only into the full/base prompt, but resumed persistent sessions use the lightweight continuation prompt, which does not include ATTACHMENTS_MANIFEST or local-path instructions. |
| Attachment enrichment can fail all agent runs | WARNING | The new shared attachment step lacks continue-on-error: true, so top-level CLI failures can stop the composite action before the agent runs. |
| Attachment downloads lack resource bounds | WARNING | The downloader buffers each response with arrayBuffer() and has no file-size cap, attachment-count cap, or fetch timeout for URLs collected from issue/PR/discussion text. |

Progress

  • PR already implements the shared scanner/downloader, manifest injection, prompt guidance, docs, and targeted runtime/workflow tests.
  • Rubrics review reported PASS and no rubric-specific follow-up.
  • Existing inline comments and review threads were empty before synthesis.
  • Posted three inline comments for the concrete changed-line issues: .agent/src/run.ts#L414, .github/actions/run-agent-task/action.yml#L342, and .agent/src/attachments.ts#L558.

Issue Details

Resumed sessions lose attachment manifests

Cause: run.ts reads ATTACHMENTS_MANIFEST into promptVars, then renders the full prompt, but resumed sessions can pass buildContinuationPrompt(...) to runAcpx. buildContinuationPrompt only includes trigger metadata, optional answer-review context, and REQUEST_TEXT, so follow-up /answer or /fix-pr runs can download an attachment without telling the agent the localPath.

Candidate solutions: Include the attachment manifest section in continuation prompts, or force full prompt replay when a manifest exists. Add prompt-continuation.test.ts coverage for an attachment-bearing resumed run.

Comments: This is the main reviewer disagreement: rubrics passed, but the current runtime code confirms the resumed prompt path drops the new manifest context.
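
A minimal sketch of the first candidate solution, appending an attachment section to the continuation prompt. The `AttachmentManifest` shape, `renderAttachmentSection`, and `buildContinuationPromptWithAttachments` are hypothetical names for illustration; the real `buildContinuationPrompt(...)` signature and manifest fields are not shown in this thread and may differ.

```typescript
// Hypothetical manifest shape; the real fields in attachments.ts may differ.
interface AttachmentEntry {
  url: string;
  localPath?: string;
  error?: string;
}

interface AttachmentManifest {
  attachments: AttachmentEntry[];
}

// Renders a short prompt section mapping original URLs to local paths.
// Returns "" when there is nothing to report, so callers can append it blindly.
function renderAttachmentSection(manifest: AttachmentManifest | null): string {
  if (!manifest || manifest.attachments.length === 0) return "";
  const lines = manifest.attachments.map((entry) =>
    entry.localPath
      ? `- ${entry.url} -> ${entry.localPath}`
      : `- ${entry.url} (download failed: ${entry.error ?? "unknown error"})`
  );
  return [
    "## Attachments",
    "Read attachments from the local paths below; do not fetch the original URLs.",
    ...lines,
  ].join("\n");
}

// Stand-in for the continuation prompt builder: it only shows where the
// attachment section would be appended to the resumed-session prompt.
function buildContinuationPromptWithAttachments(
  requestText: string,
  manifest: AttachmentManifest | null
): string {
  const section = renderAttachmentSection(manifest);
  return section ? `${requestText}\n\n${section}` : requestText;
}
```

A prompt-continuation test could then assert that a resumed run with one downloaded file contains its `localPath`, and that a run without a manifest is unchanged.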

Attachment enrichment can fail all agent runs

Cause: The Download GitHub attachments step is on the shared run-agent-task path and lacks continue-on-error: true. Per-file download errors are captured in the manifest, but top-level failures such as output directory creation, manifest writing, or setOutput can still reject and fail the action.

Candidate solutions: Make the step non-fatal with continue-on-error: true, matching nearby best-effort enrichment steps, and rely on the existing run.ts missing-manifest fallback.
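
On the runtime side, the fallback only helps if reading the manifest can never throw. The sketch below is one way to make that explicit, assuming the manifest is JSON with a top-level `attachments` array; `parseManifestOrNull` is a hypothetical helper, not a function from run.ts.

```typescript
// Hypothetical minimal manifest shape: only the top-level array is assumed.
interface ManifestLike {
  attachments: unknown[];
}

// Parses manifest text defensively. A missing file (undefined), empty text,
// malformed JSON, or an unexpected shape all yield null instead of throwing,
// so the caller can render the prompt without an attachment section.
function parseManifestOrNull(raw: string | undefined): ManifestLike | null {
  if (!raw || raw.trim() === "") return null;
  try {
    const parsed: unknown = JSON.parse(raw);
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      Array.isArray((parsed as { attachments?: unknown }).attachments)
    ) {
      return parsed as ManifestLike;
    }
    return null;
  } catch {
    return null;
  }
}
```

With a helper like this, a step that failed before writing any output and a step that wrote a truncated file both degrade to "no attachments" rather than failing the run.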

Attachment downloads lack resource bounds

Cause: downloadGitHubAttachments fetches every collected URL sequentially and buffers the full body with await response.arrayBuffer() before writing to disk. There is no cap on response size, number of references, or request duration.

Candidate solutions: Add a maximum attachment count, a per-file size cap using Content-Length plus a guarded read, and a fetch timeout such as AbortSignal.timeout(...). Record over-limit and timeout failures as manifest errors instead of failing the run.
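
The three bounds could be sketched as follows. The limit values and the names `applyCountCap`, `declaredSizeExceeds`, and `fetchBounded` are assumptions for illustration, not code from the PR; the sketch relies on the standard `fetch`, `AbortSignal.timeout`, and stream reader APIs available in recent Node versions.

```typescript
// Assumed limits for illustration; none of these values appear in the PR.
const MAX_ATTACHMENTS = 20;
const MAX_BYTES = 25 * 1024 * 1024;
const TIMEOUT_MS = 30000;

type DownloadResult =
  | { ok: true; bytes: Uint8Array }
  | { ok: false; error: string };

// Count cap: keep the first `max` URLs and report the rest as skipped,
// so the caller can record them as manifest errors.
function applyCountCap(
  urls: string[],
  max: number = MAX_ATTACHMENTS
): { selected: string[]; skipped: string[] } {
  return { selected: urls.slice(0, max), skipped: urls.slice(max) };
}

// Cheap pre-check against the declared Content-Length. A missing or
// unparsable header returns false; the guarded read below still applies.
function declaredSizeExceeds(contentLength: string | null, maxBytes: number): boolean {
  if (contentLength === null) return false;
  const declared = Number(contentLength);
  return Number.isFinite(declared) && declared > maxBytes;
}

// Fetches one URL with a timeout and a streaming size cap. Every failure
// is returned as a value instead of thrown, so it can go into the manifest.
async function fetchBounded(url: string, token: string): Promise<DownloadResult> {
  try {
    const response = await fetch(url, {
      headers: { Authorization: `Bearer ${token}` },
      signal: AbortSignal.timeout(TIMEOUT_MS),
    });
    if (!response.ok) return { ok: false, error: `HTTP ${response.status}` };
    if (declaredSizeExceeds(response.headers.get("content-length"), MAX_BYTES)) {
      return { ok: false, error: "declared size exceeds limit" };
    }
    if (!response.body) return { ok: true, bytes: new Uint8Array(0) };

    // Guarded read: stop as soon as the running total passes the cap.
    const reader = response.body.getReader();
    const chunks: Uint8Array[] = [];
    let total = 0;
    for (;;) {
      const result = await reader.read();
      if (result.done) break;
      total += result.value.byteLength;
      if (total > MAX_BYTES) {
        await reader.cancel();
        return { ok: false, error: "body exceeds limit" };
      }
      chunks.push(result.value);
    }

    const bytes = new Uint8Array(total);
    let offset = 0;
    for (const chunk of chunks) {
      bytes.set(chunk, offset);
      offset += chunk.byteLength;
    }
    return { ok: true, bytes };
  } catch (err) {
    // Timeouts and network errors land here.
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

The Content-Length check alone is not sufficient because the header can be absent or wrong, which is why the read loop enforces the cap independently.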

Recommended Next Step

FIX_PR: The unresolved issues are concrete branch changes suitable for an automated fix-pr pass.

Final Verdict

NEEDS_REWORK

Action Items

  • Ensure resumed-session continuation prompts include the attachment manifest/local-path guidance, or replay the full prompt when a manifest exists, with regression coverage.
  • Make the shared attachment download step non-fatal so best-effort enrichment cannot block every agent run.
  • Add attachment download size, count, and timeout bounds while preserving manifest-level error reporting.

codex | gpt-5.5/xhigh | Macmini-runner-6

@sepo-agent-app

sepo-agent-app Bot commented Jun 19, 2026

Author

Sepo is dispatching follow-up automation.

| Source | Next | Target | Round | Status |
| --- | --- | --- | --- | --- |
| review | fix-pr | PR #428 | 4 / 40 | Dispatched |

Reason: agent planner selected fix-pr: Latest review synthesis for open PR #428 is NEEDS_REWORK with Recommended Next Step FIX_PR; the unresolved items are concrete branch changes and round budget remains available.

Task for fix-pr:
Address only the latest review synthesis action items:

  • Ensure resumed-session continuation prompts include the attachment manifest/local-path guidance, or replay the full prompt when a manifest exists, with regression coverage.
  • Make the shared attachment download step non-fatal so best-effort enrichment cannot block every agent run.
  • Add attachment download size, count, and timeout bounds while preserving manifest-level error reporting.

Constraints: Ignore optional INFO notes, metadata-only polish, already-fixed findings, and human-judgment nits unless required by those action items.

@sepo-agent-app

Author

Sepo could not complete the PR fix run.

Inspect the workflow logs and retry if appropriate.

@sepo-agent-app

Author

Sepo orchestration needs clarification before it can continue.

Sepo cannot continue the automatic PR-fix chain safely. The fix-pr run failed before the agent started because the PR branch's shared action calls .agent/dist/cli/download-attachments.js, but that built file was not present in the runtime checkout. The review findings on PR #428 remain unresolved; a human should inspect the workflow logs for run 27852387063 and either repair the runtime/build artifact issue or rerun fix-pr after the branch can execute run-agent-task successfully.

  • Source action: fix-pr
  • Source conclusion: failed
  • Target: pull_request #428
  • Round: 4/40
  • Reason: agent planner blocked: The latest fix-pr run failed before the agent could run, due to a deterministic runtime error: PR #428's shared action invokes .agent/dist/cli/download-attachments.js, but that built CLI is absent in the checked-out runtime. Policy forbids handing a failed fix-pr directly to review or another automatic fix-pr pass.
  • Source run ID: 27852387063

No follow-up workflow was dispatched. Reply with the requested context, then continue with /orchestrate, /implement, or /answer when ready.

@lolipopshock

Contributor

OK, the first question I want to ask: does it download only the files in the latest comment, or in all past comments?

I think this raises a broader issue: what if agents keep searching all issues for files? We cannot exhaust everything, right?

But the approach seems to be to provide the environment key to download the file? Or do we build a customized and protected file download command? What should we do?

@sepo-agent-app

Author

It currently scans all past comments on the current target, not only the latest comment.

For PR #428 specifically, collectAttachmentTextSources scans:

  • forwarded REQUEST_TEXT
  • issue body + issue comments
  • PR body + issue comments + PR reviews + PR review comments
  • discussion body + discussion comments/replies

See the current PR code around attachments.ts#L317-L395. It dedupes attachment URLs by exact URL before downloading: attachments.ts#L161-L188.
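
As a rough illustration of "scan the target's text sources, then dedupe by exact URL", here is a minimal sketch. The URL pattern is inferred from the link shapes mentioned in this thread, and `TextSource`, `AttachmentRef`, and `collectAttachmentRefs` are hypothetical names; the actual logic in attachments.ts may differ.

```typescript
// Assumed pattern for github.com/user-attachments links. It stops at
// whitespace, quotes, and brackets so Markdown link syntax is not captured.
const ATTACHMENT_URL =
  /https:\/\/github\.com\/user-attachments\/(?:files|assets)\/[^\s<>()"'\]]+/;

interface TextSource {
  label: string; // e.g. "issue body" or "comment 3"
  text: string;
}

interface AttachmentRef {
  url: string;
  sources: string[]; // every text source that mentioned this URL
}

// Collects attachment URLs from all sources, deduping by exact URL and
// keeping first-seen order.
function collectAttachmentRefs(sources: TextSource[]): AttachmentRef[] {
  const byUrl = new Map<string, AttachmentRef>();
  for (const source of sources) {
    // Fresh global regex per source so lastIndex never leaks between texts.
    const pattern = new RegExp(ATTACHMENT_URL.source, "g");
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(source.text)) !== null) {
      const url = match[0];
      const existing = byUrl.get(url);
      if (existing) {
        if (existing.sources.indexOf(source.label) < 0) {
          existing.sources.push(source.label);
        }
      } else {
        byUrl.set(url, { url, sources: [source.label] });
      }
    }
  }
  return Array.from(byUrl.values());
}
```

Because dedupe is by exact string, the same file referenced with a different query string or trailing punctuation would count as a separate entry in this sketch.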

It does not search all issues in the repo. It only queries the current issue/PR/discussion target number passed into run-agent-task. So the broader “agents searching every issue for files” concern is real in principle, but this PR does not implement that behavior.

On auth: yes, this is meant to be a customized protected pre-agent download path. The action passes the resolved GitHub token only to the downloader step: run-agent-task/action.yml#L342-L354. The downloader uses it as an Authorization: Bearer ... header: attachments.ts#L529-L536. The prompt only receives the manifest/local paths, not the token: _base.md#L12-L19.

What I’d do before merging:

  1. Keep downloader scope bounded to the current target thread only.
  2. Add hard limits: max attachment count, max bytes per file, and fetch timeout.
  3. Make the downloader best-effort so failures produce manifest errors instead of blocking the whole agent run.
  4. Optionally narrow policy further: for issue_comment triggers, download only REQUEST_TEXT plus target body by default, and scan older comments only if we decide that cross-comment attachments are important enough.

My recommendation: keep current-target scanning, but add the resource limits. Do not add repo-wide issue search for attachments.


codex | gpt-5.5/xhigh | zucchini-m4-1

@lolipopshock

Contributor

@sepo-agent /answer

On auth: yes, this is meant to be a customized protected pre-agent download path. The action passes the resolved GitHub token only to the downloader step: [run-agent-task/action.yml#L342-L354](https://github.com/self-evolving/repo/blob/487b2b404cbb9c881252ec7f3155b91d6bba4a88/.github/actions/run-agent-task/action.yml#L342-L354). The downloader uses it as an Authorization: Bearer ... header: [attachments.ts#L529-L536](https://github.com/self-evolving/repo/blob/487b2b404cbb9c881252ec7f3155b91d6bba4a88/.agent/src/attachments.ts#L529-L536). The prompt only receives the manifest/local paths, not the token: [_base.md#L12-L19](https://github.com/self-evolving/repo/blob/487b2b404cbb9c881252ec7f3155b91d6bba4a88/.github/prompts/_base.md#L12-L19).

Given this, can we go deeper into this idea of "building a customized and protected file download command and sharing that in the prompt with the agent"?

@sepo-agent-app

Author

Yes, we can go deeper, but I’d separate two meanings of “protected command”:

  1. Current PR model: protected pre-download

    • Before the agent starts, run-agent-task runs a downloader with the GitHub token.
    • The agent only receives a manifest of local files.
    • This is the cleanest version because the model never needs to decide how to authenticate attachment downloads.
    • Current PR shape: download-attachments.ts, prompt manifest injection in _base.md.
  2. Agent-callable protected download command

    • We could expose a command in the prompt, but it should not be a general "download any URL" command.
    • It should be something like:
      node .agent/dist/cli/attachment.js get --id attachment-003
    • The command should only download URLs already discovered from the current target thread and written into a manifest/allowlist.
    • It should reject arbitrary URLs, cross-repo URLs, excessive size, too many files, and timeouts.

The important caveat: a CLI alone is not a hard security boundary if the agent process also has the raw GitHub token in its environment. In the current runtime, run-agent-task passes GitHub auth into the agent step for normal repo operations. So a “protected command” is useful for policy, auditability, limits, and better behavior, but it does not fully prevent a sufficiently capable shell agent from using credentials elsewhere.

So my recommendation is:

  • For this PR: keep the pre-download + manifest design.
  • Make it bounded: current target only, max count, max size, timeout, best-effort failure.
  • Do not expose a general URL downloader.
  • If we later want a true protected interactive downloader, design it as a broker:
    • token is not placed in the agent environment,
    • agent can only request files by manifest ID,
    • broker enforces target/repo allowlist and limits,
    • all downloads are logged into the manifest.

So the short answer is: yes, “customized protected file download command” is a good direction, but for now I’d treat this PR’s pre-agent downloader as the safer first version. A command shared in the prompt should only operate on pre-approved manifest entries, not arbitrary attachment URLs.


codex | gpt-5.5 | high | Macmini-runner-3

@lolipopshock

Contributor
Agent-callable protected download command

We could expose a command in the prompt, but it should not be a general "download any URL" command.
It should be something like:
node .agent/dist/cli/attachment.js get --id attachment-003
The command should only download URLs already discovered from the current target thread and written into a manifest/allowlist.
It should reject arbitrary URLs, cross-repo URLs, excessive size, too many files, and timeouts.

@sepo-agent /answer can we implement this version directly, without the preloading? (We should explore this as a completely separate direction, branching out from main and working on that.)

@sepo-agent-app

Author

Yes, but “directly without preload” should mean no pre-downloading file bytes, not “no pre-run setup at all”.

We still need a pre-agent index/allowlist step, because the protected command needs to know which attachment IDs are allowed. The better branch direction from main would be:

  1. Add an attachment indexer:

    • scans only the current target thread,
    • records discovered github.com/user-attachments/... URLs,
    • writes a manifest like:
      {
        "attachments": [
          {
            "id": "attachment-001",
            "url": "...",
            "sources": [...],
            "status": "available"
          }
        ]
      }
    • does not download file contents yet.
  2. Add an agent-callable command:

    node .agent/dist/cli/attachment.js get --id attachment-001

    This command should:

    • accept only manifest IDs, not raw URLs,
    • reject IDs not in the current-run manifest,
    • enforce max count, max size, timeout, and current repo/target constraints,
    • download to $RUNNER_TEMP/agent-attachments/,
    • update the manifest with localPath, size, content type, and errors.
  3. Update the prompt to say:

    • use attachment.js list to inspect available attachments,
    • use attachment.js get --id ... to fetch one,
    • do not fetch original github.com/user-attachments URLs directly.
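
The core of the "IDs only" rule can be sketched as a lookup against the current-run manifest. The manifest fields follow the example above; the `status` values other than "available", the ID format check, and the name `resolveAttachmentId` are assumptions for illustration.

```typescript
// Manifest entry shape based on the example above; "downloaded" and
// "failed" are assumed status values, not taken from any existing code.
interface IndexedAttachment {
  id: string;
  url: string;
  status: "available" | "downloaded" | "failed";
  localPath?: string;
}

interface AttachmentIndex {
  attachments: IndexedAttachment[];
}

type Resolution =
  | { ok: true; attachment: IndexedAttachment }
  | { ok: false; error: string };

// Resolves a requested ID against the allowlist. Raw URLs and unknown IDs
// are rejected before any network request would be made.
function resolveAttachmentId(index: AttachmentIndex, id: string): Resolution {
  if (!/^attachment-\d{3,}$/.test(id)) {
    return { ok: false, error: "expected an ID like attachment-001, not a URL" };
  }
  const found = index.attachments.find((entry) => entry.id === id);
  if (!found) {
    return { ok: false, error: `unknown attachment id: ${id}` };
  }
  return { ok: true, attachment: found };
}
```

The download itself would only ever use `attachment.url` from the manifest, never a string supplied by the agent.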

The hard part: today run-agent-task passes INPUT_GITHUB_TOKEN into the agent step, and buildSharedEnv() mirrors it to GH_TOKEN/GITHUB_TOKEN: runtime-env.ts#L29-L35. So a CLI wrapper is not a true security boundary while the agent also has raw GitHub auth. It is still useful for policy and auditability, but not fully “protected”.

So I’d split implementation into two phases:

  • Phase 1: implement on-demand download by manifest ID, with limits and logging. This replaces preloaded file bytes.
  • Phase 2: make it truly protected by changing the auth boundary: do not pass raw GitHub token into the agent, or introduce a local broker process that holds the token and only serves allowlisted attachment IDs.
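
For the first Phase 2 option, one possible shape is to build the agent's environment from a scrubbed copy. This is only a sketch: `withoutGitHubTokens` is a hypothetical helper, the variable list is taken from the names mentioned above, and removing these variables would also break the agent's normal repo operations unless those are routed elsewhere.

```typescript
// Token variables named in this thread; a real list may need more entries.
const TOKEN_VARS = ["INPUT_GITHUB_TOKEN", "GH_TOKEN", "GITHUB_TOKEN"];

// Returns a copy of the environment without GitHub token variables.
// The input object is not modified.
function withoutGitHubTokens(
  env: Record<string, string | undefined>
): Record<string, string | undefined> {
  const scrubbed: Record<string, string | undefined> = {};
  for (const [key, value] of Object.entries(env)) {
    if (TOKEN_VARS.indexOf(key) < 0) {
      scrubbed[key] = value;
    }
  }
  return scrubbed;
}
```

The downloader or broker process would keep the original environment, while the agent process would be started with the scrubbed one.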

My recommendation: create a new branch from main for this direction, not continue PR #428 as-is. PR #428 is the “pre-download manifest” design; this would be a cleaner alternate design: pre-index only, download on demand by allowlisted ID.


codex | gpt-5.5 | high | Macmini-runner-1

@lolipopshock

lolipopshock commented Jun 25, 2026

Contributor

Add an attachment indexer:

  • scans only the current target thread,
  • records discovered github.com/user-attachments/... URLs,
  • writes a manifest like:
    {
      "attachments": [
        {
          "id": "attachment-001",
          "url": "...",
          "sources": [...],
          "status": "available"
        }
      ]
    }
  • does not download file contents yet.

No, I don't think we should do this. Let's just simplify it:

  • build a command that takes any given GitHub URL, parses it, and downloads it.

@sepo-agent /answer

@sepo-agent-app

Author

Yes, we can implement this as a new direction from main: no preloaded files, no indexer, just an agent-callable download command.

I would still narrow “any given GitHub URL” to any GitHub user-attachment URL:

node .agent/dist/cli/download-github-attachment.js \
  --url "https://github.com/user-attachments/files/..."

The command should:

  • accept a raw URL from the agent,
  • validate https://github.com/user-attachments/(files|assets)/...,
  • reject non-GitHub and non-attachment URLs,
  • download using INPUT_GITHUB_TOKEN / GH_TOKEN,
  • enforce max file size, timeout, and safe filename handling,
  • write to $RUNNER_TEMP/agent-attachments/,
  • print the local path and metadata.

I would not make it a general authenticated github.com downloader. That becomes too close to “authenticated curl for anything GitHub can access.” Attachment URLs are the real problem, so the command should be attachment-specific.
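
A minimal sketch of the validation and filename handling described above, using the standard `URL` parser. `parseAttachmentUrl` and `safeFileName` are hypothetical helper names, and the 120-character cap and allowed character set are assumptions.

```typescript
// Accepts only https://github.com/user-attachments/(files|assets)/... URLs.
// Returns the parsed URL, or null for anything else.
function parseAttachmentUrl(raw: string): URL | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return null; // not a parseable absolute URL
  }
  if (url.protocol !== "https:" || url.hostname !== "github.com") return null;
  if (url.username !== "" || url.password !== "" || url.port !== "") return null;
  // The URL parser has already resolved "." and ".." segments, so a path
  // that escapes /user-attachments/ fails this check.
  if (!/^\/user-attachments\/(files|assets)\/[^/]+/.test(url.pathname)) return null;
  return url;
}

// Derives a local filename from the last path segment. Anything outside
// a conservative character set becomes "_", and leading dots are dropped.
function safeFileName(url: URL, fallback: string = "attachment"): string {
  const last = url.pathname.split("/").filter(Boolean).pop() ?? "";
  let decoded = last;
  try {
    decoded = decodeURIComponent(last);
  } catch {
    // Malformed percent-encoding: keep the raw segment.
  }
  const cleaned = decoded.replace(/[^A-Za-z0-9._-]/g, "_").replace(/^\.+/, "");
  return cleaned.slice(0, 120) || fallback;
}
```

Checking `hostname` on the parsed URL, rather than matching the raw string, is what rejects look-alike hosts such as `github.com.example.org`.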

Current architecture fit:

So this version is simpler than PR #428:

  1. Add .agent/src/cli/download-github-attachment.ts.
  2. Add a small shared helper for URL validation, filename sanitization, fetch timeout, and size cap.
  3. Update _base.md to tell agents: use this command for github.com/user-attachments links; do not use raw curl/browser fetches for private attachments.
  4. Add tests for valid attachment URLs, rejected URLs, missing token, timeout, oversized response, filename handling, and successful download.
  5. Update docs for the command.

I’d close or supersede PR #428 and branch from main for this alternate design. It gives us the behavior you want without scanning all past comments and without pre-downloading anything.


codex | gpt-5.5 | high | Macmini-runner-3

Labels

agent Handled by the agent

Development

Successfully merging this pull request may close these issues.

Handle comment files in private repos

2 participants