Download GitHub attachments before agent runs by sepo-agent-app[bot] · Pull Request #428 · self-evolving/repo

sepo-agent-app · 2026-06-19T22:52:35Z

Summary

add a shared attachment scanner/downloader and CLI that writes a manifest under $RUNNER_TEMP/agent-attachments
wire run-agent-task to run it before the agent and inject the manifest into the base prompt
document attachment handling and add runtime/workflow coverage

Tests

npm run build
node --test dist/__tests__/attachments.test.js dist/__tests__/envelope.test.js dist/__tests__/workflow-action-validation.test.js dist/__tests__/docs-validation.test.js
git diff --check

Closes #426

sepo-agent-app · 2026-06-19T22:56:44Z

Rubrics Review

Total Score	Verdict	Rubrics Scored
100	PASS	9

Dimension	Rubric	Result	Score	Evidence
coding_workflow/generic	Keep docs in sync	pass	7/7	Docs now describe attachment handling in supported workflows and `run-agent-task`: supported-workflows.md#L210-L212, internal-actions.md#L14.
coding_workflow/generic	Reuse existing code	pass	8/8	The implementation reuses the shared `run-agent-task` action, existing `gh` wrapper, discussion transcript helper, GraphQL client, and `setOutput` helper rather than adding per-route workflow glue: attachments.ts#L1-L10, download-attachments.ts#L8-L14.
coding_workflow/generic	Prefer explicit structured inputs	pass	7/7	Attachment data flows through explicit env fields and a JSON manifest path, with prompt injection gated by the runtime allowlist: action.yml#L342-L376, run.ts#L71-L80, run.ts#L414-L420.
coding_workflow/generic	Surface stateful failures	pass	7/7	Collection and download failures are preserved in the manifest instead of being hidden, including missing-token, HTTP, and thrown-fetch paths: attachments.ts#L411-L416, attachments.ts#L517-L579, attachments.test.ts#L187-L251.
coding_workflow/generic	Minimal defaults, native escape hatches	pass	8/8	The feature is a default shared runtime capability under `run-agent-task`, with no new public settings or route-specific knobs; advanced behavior remains customizable in repo-owned workflow/runtime code: action.yml#L342-L354.
coding_workflow/generic	Name config by capability	pass	6/6	New env/output names describe the general attachment capability, not a narrow first use case: `ATTACHMENTS_DIR`, `ATTACHMENTS_MANIFEST_FILE`, and `manifest_file`: download-attachments.ts#L21-L41, download-attachments.ts#L67-L70.
coding/coding_style	Prefer simple implementations	pass	7/7	The implementation uses one typed module, one CLI, one shared action step, and base-prompt guidance; it avoids extra routes, settings, or duplicated per-workflow scanners.
coding/coding	Make surgical changes	pass	8/8	The diff is limited to the attachment module/CLI, shared runtime wiring, prompt contract, focused docs, and targeted tests; no unrelated workflow or route refactors were introduced.
coding_workflow/generic	Read docs and linked context first	pass	6/6	The PR closes issue #426 and implements the exact linked design: scan issue/PR/discussion/request text, download with the resolved token, write a manifest, and instruct agents to use local paths. The current diff matches that scope.

Notes

The strongest score drivers are the shared run-agent-task integration, explicit manifest contract, focused docs update, and regression coverage for extraction, dedupe, safe filenames, authenticated downloads, per-file errors, and workflow/prompt wiring.
I verified the PR head locally with npm ci, npm run build, node --test dist/__tests__/attachments.test.js dist/__tests__/envelope.test.js dist/__tests__/workflow-action-validation.test.js dist/__tests__/docs-validation.test.js (91 passing subtests), and git diff --check.
Not scored: selected authorization, orchestration, rubric-lifecycle, merge, preview, and status-comment rubrics that do not apply to this attachment-handling implementation.
No rubric-specific follow-up needed.

Findings

INFO: No blocking or warning findings against the applicable selected rubrics.

Final Rubric Verdict

PASS

codex | gpt-5.5/xhigh | zucchini-m4-3

sepo-agent-app · 2026-06-19T23:04:24Z

          --limit "${RUBRICS_LIMIT}" \
          --output-file "${RUBRICS_CONTEXT_FILE}"

+    - name: Download GitHub attachments


This step is prompt enrichment, but it lacks continue-on-error: true while adjacent best-effort enrichment steps use it. If the CLI rejects before writing outputs (mkdirSync, writeFileSync, or setOutput), the composite action fails and the agent never runs on any route. Please make the step non-fatal and let run.ts fall back when the manifest is missing.

sepo-agent-app · 2026-06-19T23:04:24Z

  if (promptVars.RUBRICS_CONTEXT_FILE && existsSync(promptVars.RUBRICS_CONTEXT_FILE)) {
    promptVars.RUBRICS_CONTEXT = readFileSync(promptVars.RUBRICS_CONTEXT_FILE, "utf8");
  }
+  if (promptVars.ATTACHMENTS_MANIFEST_FILE && existsSync(promptVars.ATTACHMENTS_MANIFEST_FILE)) {


This manifest is only injected into the full/base prompt, but resumed sessions pass buildContinuationPrompt(...) to runAcpx; that continuation prompt does not include ATTACHMENTS_MANIFEST or the localPath instructions. A follow-up /answer or /fix-pr with a private attachment can download the file but still leave the agent with only the original github.com/user-attachments URL. Please include the manifest in continuation prompts, or force full prompt replay when a manifest is present, and add coverage.
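
One way to close this gap is to append the manifest to the continuation prompt whenever one exists. The sketch below is illustrative only: the real `buildContinuationPrompt` signature is not shown in this thread, so `ContinuationInput` and `buildContinuationPromptWithAttachments` are hypothetical names, and the guidance wording is a placeholder rather than the text in `_base.md`.

```typescript
// Hypothetical input shape; the actual continuation-prompt inputs may differ.
interface ContinuationInput {
  triggerSummary: string;
  requestText: string;
  // Raw JSON text of the attachment manifest, if one was written for this run.
  attachmentsManifest?: string;
}

// Builds a lightweight continuation prompt and, when a manifest is present,
// appends the local-path guidance so resumed sessions see the same context.
function buildContinuationPromptWithAttachments(input: ContinuationInput): string {
  const sections: string[] = [input.triggerSummary, input.requestText];
  const manifest = input.attachmentsManifest?.trim();
  if (manifest) {
    sections.push(
      [
        "Attachments were downloaded before this run.",
        "Use the localPath values below instead of the original github.com/user-attachments URLs.",
        manifest,
      ].join("\n"),
    );
  }
  // Drop empty sections so the prompt has no stray blank blocks.
  return sections.filter((section) => section.length > 0).join("\n\n");
}
```

The alternative mentioned above, forcing a full prompt replay when a manifest is present, avoids duplicating the guidance text at the cost of a longer resumed prompt.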

sepo-agent-app · 2026-06-19T23:04:24Z

+        continue;
+      }
+
+      const bytes = Buffer.from(await response.arrayBuffer());


await response.arrayBuffer() buffers the whole attachment with no size limit, and this loop has no count cap or fetch timeout. Since URLs are collected from issue/PR/discussion text, a large file, many links, or a slow response can exhaust runner resources or stall this shared pre-agent step. Please add size/count/time limits before downloading arbitrary attachments.

sepo-agent-app · 2026-06-19T23:05:05Z

AI Review Synthesis

Dual-agent review by Claude and Codex.
Requested by @lolipopshock.

Summary of PR/Issue

PR #428 implements issue #426 by downloading GitHub user-attachments before the agent runs, using the resolved GitHub token, and passing a local manifest into the prompt. The goal is to make private-repo attachments readable without exposing tokens to the model or relying on unauthenticated fetches. The implementation adds a shared scanner/downloader, wires it into run-agent-task, updates the base prompt, and adds focused docs/tests. The current design is well placed on the shared runtime path, but the current diff still has one correctness gap for resumed sessions and two robustness issues on the shared pre-agent path.

Review

Overall, the PR is directionally solid and matches the linked issue plan, but it should not ship until resumed runs receive the attachment manifest. Reviewers disagreed on severity: the rubrics review passed with no warnings, while the code reviews identified concrete runtime issues; the resumed-session gap is supported by the current PR head.

Issue	Severity	Description
Resumed sessions lose attachment manifests	BLOCKING	The manifest is injected only into the full/base prompt, but resumed persistent sessions use the lightweight continuation prompt, which does not include `ATTACHMENTS_MANIFEST` or local-path instructions.
Attachment enrichment can fail all agent runs	WARNING	The new shared attachment step lacks `continue-on-error: true`, so top-level CLI failures can stop the composite action before the agent runs.
Attachment downloads lack resource bounds	WARNING	The downloader buffers each response with `arrayBuffer()` and has no file-size cap, attachment-count cap, or fetch timeout for URLs collected from issue/PR/discussion text.

Progress

PR already implements the shared scanner/downloader, manifest injection, prompt guidance, docs, and targeted runtime/workflow tests.
Rubrics review reported PASS and no rubric-specific follow-up.
Existing inline comments and review threads were empty before synthesis.
Posted three inline comments for the concrete changed-line issues: .agent/src/run.ts#L414, .github/actions/run-agent-task/action.yml#L342, and .agent/src/attachments.ts#L558.

Issue Details

Resumed sessions lose attachment manifests

Cause: run.ts reads ATTACHMENTS_MANIFEST into promptVars, then renders the full prompt, but resumed sessions can pass buildContinuationPrompt(...) to runAcpx. buildContinuationPrompt only includes trigger metadata, optional answer-review context, and REQUEST_TEXT, so follow-up /answer or /fix-pr runs can download an attachment without telling the agent the localPath.

Candidate solutions: Include the attachment manifest section in continuation prompts, or force full prompt replay when a manifest exists. Add prompt-continuation.test.ts coverage for an attachment-bearing resumed run.

Comments: This is the main reviewer disagreement: rubrics passed, but the current runtime code confirms the resumed prompt path drops the new manifest context.

Attachment enrichment can fail all agent runs

Cause: The Download GitHub attachments step is on the shared run-agent-task path and lacks continue-on-error: true. Per-file download errors are captured in the manifest, but top-level failures such as output directory creation, manifest writing, or setOutput can still reject and fail the action.

Candidate solutions: Make the step non-fatal with continue-on-error: true, matching nearby best-effort enrichment steps, and rely on the existing run.ts missing-manifest fallback.
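
For the runtime side of that fallback, a minimal sketch is shown below. It mirrors the `existsSync` guard already visible in the run.ts diff; the helper name `readManifestIfPresent` and the extra JSON validity check are assumptions added for illustration, not code from the PR.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Hypothetical helper: returns the manifest text to inject into the prompt,
// or undefined when the best-effort download step produced nothing usable.
function readManifestIfPresent(manifestFile: string | undefined): string | undefined {
  if (!manifestFile || !existsSync(manifestFile)) return undefined;
  try {
    const text = readFileSync(manifestFile, "utf8");
    JSON.parse(text); // Assumption: skip truncated or partially written manifests.
    return text;
  } catch {
    // Unreadable or invalid manifest: behave as if no attachments were found.
    return undefined;
  }
}
```

With a helper like this, a failed or skipped download step leaves the agent run unchanged apart from the missing attachment section.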

Attachment downloads lack resource bounds

Cause: downloadGitHubAttachments fetches every collected URL sequentially and buffers the full body with await response.arrayBuffer() before writing to disk. There is no cap on response size, number of references, or request duration.

Candidate solutions: Add a maximum attachment count, a per-file size cap using Content-Length plus a guarded read, and a fetch timeout such as AbortSignal.timeout(...). Record over-limit and timeout failures as manifest errors instead of failing the run.
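
The size cap and timeout could look roughly like the sketch below, which assumes Node 18+ (global `fetch` and `AbortSignal.timeout`). `DownloadLimits`, `BoundedBuffer`, and `fetchBounded` are hypothetical names, not part of the PR; the caller is expected to catch the thrown error and record it in the manifest.

```typescript
// Hypothetical limits; real values would be chosen by the maintainers.
interface DownloadLimits {
  maxBytes: number;
  timeoutMs: number;
}

// Accumulates chunks and refuses to grow past maxBytes.
class BoundedBuffer {
  private readonly maxBytes: number;
  private readonly chunks: Uint8Array[] = [];
  private total = 0;

  constructor(maxBytes: number) {
    this.maxBytes = maxBytes;
  }

  push(chunk: Uint8Array): void {
    if (this.total + chunk.byteLength > this.maxBytes) {
      throw new Error(`attachment exceeds ${this.maxBytes} bytes`);
    }
    this.chunks.push(chunk);
    this.total += chunk.byteLength;
  }

  toBytes(): Uint8Array {
    const out = new Uint8Array(this.total);
    let offset = 0;
    for (const chunk of this.chunks) {
      out.set(chunk, offset);
      offset += chunk.byteLength;
    }
    return out;
  }
}

// Fetches one URL with a timeout, an early Content-Length check, and a
// guarded streaming read in case the header is absent or wrong.
async function fetchBounded(url: string, token: string, limits: DownloadLimits): Promise<Uint8Array> {
  const response = await fetch(url, {
    headers: { Authorization: `Bearer ${token}` },
    signal: AbortSignal.timeout(limits.timeoutMs),
  });
  if (!response.ok) throw new Error(`HTTP ${response.status}`);

  const declared = response.headers.get("content-length");
  if (declared !== null && Number(declared) > limits.maxBytes) {
    throw new Error(`attachment exceeds ${limits.maxBytes} bytes (Content-Length ${declared})`);
  }

  const buffer = new BoundedBuffer(limits.maxBytes);
  if (response.body) {
    const reader = response.body.getReader();
    try {
      for (;;) {
        const { done, value } = await reader.read();
        if (done) break;
        buffer.push(value);
      }
    } catch (error) {
      // Stop pulling data once a limit or timeout is hit, then surface the error.
      await reader.cancel().catch(() => undefined);
      throw error;
    }
  }
  return buffer.toBytes();
}
```

A count cap can be applied before the download loop by truncating the collected URL list and recording the skipped entries as manifest errors.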

Recommended Next Step

FIX_PR: The unresolved issues are concrete branch changes suitable for an automated fix-pr pass.

Final Verdict

NEEDS_REWORK

Action Items

Ensure resumed-session continuation prompts include the attachment manifest/local-path guidance, or replay the full prompt when a manifest exists, with regression coverage.
Make the shared attachment download step non-fatal so best-effort enrichment cannot block every agent run.
Add attachment download size, count, and timeout bounds while preserving manifest-level error reporting.

codex | gpt-5.5/xhigh | Macmini-runner-6

sepo-agent-app · 2026-06-19T23:05:55Z

Sepo is dispatching follow-up automation.

Source	Next	Target	Round	Status
review	fix-pr	PR #428	4 / 40	Dispatched

Reason: agent planner selected fix-pr: Latest review synthesis for open PR #428 is NEEDS_REWORK with Recommended Next Step FIX_PR; the unresolved items are concrete branch changes and round budget remains available.

Task for fix-pr:
Address only the latest review synthesis action items:

Ensure resumed-session continuation prompts include the attachment manifest/local-path guidance, or replay the full prompt when a manifest exists, with regression coverage.
Make the shared attachment download step non-fatal so best-effort enrichment cannot block every agent run.
Add attachment download size, count, and timeout bounds while preserving manifest-level error reporting.

Constraints: Ignore optional INFO notes, metadata-only polish, already-fixed findings, and human-judgment nits unless required by those action items.

sepo-agent-app · 2026-06-19T23:06:11Z

Sepo could not complete the PR fix run.

Inspect the workflow logs and retry if appropriate.

sepo-agent-app · 2026-06-19T23:07:27Z

Sepo orchestration needs clarification before it can continue.

Sepo cannot continue the automatic PR-fix chain safely. The fix-pr run failed before the agent started because the PR branch's shared action calls .agent/dist/cli/download-attachments.js, but that built file was not present in the runtime checkout. The review findings on PR #428 remain unresolved; a human should inspect the workflow logs for run 27852387063 and either repair the runtime/build artifact issue or rerun fix-pr after the branch can execute run-agent-task successfully.

Source action: fix-pr
Source conclusion: failed
Target: pull_request #428
Round: 4/40
Reason: agent planner blocked: The latest fix-pr run failed before the agent could run, due to a deterministic runtime error: PR #428's shared action invokes .agent/dist/cli/download-attachments.js, but that built CLI is absent in the checked-out runtime. Policy forbids handing a failed fix-pr directly to review or another automatic fix-pr pass.
Source run ID: 27852387063

No follow-up workflow was dispatched. Reply with the requested context, then continue with /orchestrate, /implement, or /answer when ready.

lolipopshock · 2026-06-20T05:11:31Z

OK, the first question I want to ask is: does it download only the files in the latest comment, or the files in all past comments?

I think this raises a broader issue: what if agents keep searching all issues for files? We cannot exhaust everything, right?

Is the approach to provide the environment key so the agent can download the file? Or do we build a customized and protected file download command? What should we do?

sepo-agent-app · 2026-06-20T05:13:44Z

It currently scans all past comments on the current target, not only the latest comment.

For PR #428 specifically, collectAttachmentTextSources scans:

forwarded REQUEST_TEXT
issue body + issue comments
PR body + issue comments + PR reviews + PR review comments
discussion body + discussion comments/replies

See the current PR code around attachments.ts#L317-L395. It dedupes attachment URLs by exact URL before downloading: attachments.ts#L161-L188.
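
As an illustration of the scan-and-dedupe behavior described here, the sketch below collects attachment URLs from a list of text sources and keeps each exact URL once. It is not the code in attachments.ts: the regular expression, the trailing-punctuation cleanup, and the name `extractAttachmentUrls` are assumptions.

```typescript
// Assumed URL shape: github.com/user-attachments/files/... or .../assets/...
const ATTACHMENT_URL = /https:\/\/github\.com\/user-attachments\/(?:files|assets)\/[^\s)\]"'<>]+/g;

// Scans every text source (request text, bodies, comments) and returns the
// distinct attachment URLs in first-seen order.
function extractAttachmentUrls(texts: string[]): string[] {
  const seen = new Set<string>();
  for (const text of texts) {
    for (const match of text.match(ATTACHMENT_URL) ?? []) {
      // Drop sentence punctuation that directly follows a pasted URL.
      seen.add(match.replace(/[.,;:!?]+$/, ""));
    }
  }
  return Array.from(seen);
}
```

Because deduplication is by exact URL, the same file pasted in the issue body and in a later comment is downloaded once.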

It does not search all issues in the repo. It only queries the current issue/PR/discussion target number passed into run-agent-task. So the broader “agents searching every issue for files” concern is real in principle, but this PR does not implement that behavior.

On auth: yes, this is meant to be a customized protected pre-agent download path. The action passes the resolved GitHub token only to the downloader step: run-agent-task/action.yml#L342-L354. The downloader uses it as an Authorization: Bearer ... header: attachments.ts#L529-L536. The prompt only receives the manifest/local paths, not the token: _base.md#L12-L19.

What I’d do before merging:

Keep downloader scope bounded to the current target thread only.
Add hard limits: max attachment count, max bytes per file, and fetch timeout.
Make the downloader best-effort so failures produce manifest errors instead of blocking the whole agent run.
Optionally narrow policy further: for issue_comment triggers, download only REQUEST_TEXT plus target body by default, and scan older comments only if we decide that cross-comment attachments are important enough.

My recommendation: keep current-target scanning, but add the resource limits. Do not add repo-wide issue search for attachments.

codex | gpt-5.5/xhigh | zucchini-m4-1

lolipopshock · 2026-06-25T16:44:52Z

@sepo-agent /answer

On auth: yes, this is meant to be a customized protected pre-agent download path. The action passes the resolved GitHub token only to the downloader step: [run-agent-task/action.yml#L342-L354](https://github.com/self-evolving/repo/blob/487b2b404cbb9c881252ec7f3155b91d6bba4a88/.github/actions/run-agent-task/action.yml#L342-L354). The downloader uses it as an Authorization: Bearer ... header: [attachments.ts#L529-L536](https://github.com/self-evolving/repo/blob/487b2b404cbb9c881252ec7f3155b91d6bba4a88/.agent/src/attachments.ts#L529-L536). The prompt only receives the manifest/local paths, not the token: [_base.md#L12-L19](https://github.com/self-evolving/repo/blob/487b2b404cbb9c881252ec7f3155b91d6bba4a88/.github/prompts/_base.md#L12-L19).

Given this, can we go deeper on this idea of "building a customized and protected file download command and sharing that in the prompt with the agent"?

sepo-agent-app · 2026-06-25T16:47:16Z

Yes, we can go deeper, but I’d separate two meanings of “protected command”:

Current PR model: protected pre-download
- Before the agent starts, run-agent-task runs a downloader with the GitHub token.
- The agent only receives a manifest of local files.
- This is the cleanest version because the model never needs to decide how to authenticate attachment downloads.
- Current PR shape: download-attachments.ts, prompt manifest injection in _base.md.
Agent-callable protected download command
- We could expose a command in the prompt, but it should not be a "download any URL" command.
- It should be something like:
```
node .agent/dist/cli/attachment.js get --id attachment-003
```
- The command should only download URLs already discovered from the current target thread and written into a manifest/allowlist.
- It should reject arbitrary URLs, cross-repo URLs, excessive size, too many files, and timeouts.

The important caveat: a CLI alone is not a hard security boundary if the agent process also has the raw GitHub token in its environment. In the current runtime, run-agent-task passes GitHub auth into the agent step for normal repo operations. So a “protected command” is useful for policy, auditability, limits, and better behavior, but it does not fully prevent a sufficiently capable shell agent from using credentials elsewhere.

So my recommendation is:

For this PR: keep the pre-download + manifest design.
Make it bounded: current target only, max count, max size, timeout, best-effort failure.
Do not expose a general URL downloader.
If we later want a true protected interactive downloader, design it as a broker:
- token is not placed in the agent environment,
- agent can only request files by manifest ID,
- broker enforces target/repo allowlist and limits,
- all downloads are logged into the manifest.

So the short answer is: yes, “customized protected file download command” is a good direction, but for now I’d treat this PR’s pre-agent downloader as the safer first version. A command shared in the prompt should only operate on pre-approved manifest entries, not arbitrary attachment URLs.

codex | gpt-5.5 | high | Macmini-runner-3

lolipopshock · 2026-06-25T17:11:22Z

Agent-callable protected download command

We could expose a command in the prompt, but it should not be a "download any URL" command.
It should be something like:
node .agent/dist/cli/attachment.js get --id attachment-003
The command should only download URLs already discovered from the current target thread and written into a manifest/allowlist.
It should reject arbitrary URLs, cross-repo URLs, excessive size, too many files, and timeouts.

@sepo-agent /answer can we implement this version directly, without the preloading? (We should explore this as a completely separate direction: branching out from main and working on that.)

sepo-agent-app · 2026-06-25T17:13:13Z

Yes, but “directly without preload” should mean no pre-downloading file bytes, not “no pre-run setup at all”.

We still need a pre-agent index/allowlist step, because the protected command needs to know which attachment IDs are allowed. The better branch direction from main would be:

Add an attachment indexer:

scans only the current target thread,
records discovered github.com/user-attachments/... URLs,

writes a manifest like:

{
  "attachments": [
    {
      "id": "attachment-001",
      "url": "...",
      "sources": [...],
      "status": "available"
    }
  ]
}

does not download file contents yet.

Add an agent-callable command:
```
node .agent/dist/cli/attachment.js get --id attachment-001
```
This command should:
- accept only manifest IDs, not raw URLs,
- reject IDs not in the current-run manifest,
- enforce max count, max size, timeout, and current repo/target constraints,
- download to $RUNNER_TEMP/agent-attachments/,
- update the manifest with localPath, size, content type, and errors.
Update the prompt to say:
- use attachment.js list to inspect available attachments,
- use attachment.js get --id ... to fetch one,
- do not fetch original github.com/user-attachments URLs directly.
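
A minimal sketch of the ID-only lookup behind such a `get --id` command is shown below. The manifest fields follow the JSON example above; the `downloaded` and `error` status values, the `localPath` and `error` fields, the ID pattern, and the function name `resolveAttachment` are assumptions for illustration.

```typescript
// Entry shape based on the manifest example; statuses other than
// "available" are assumed, not taken from the thread.
interface AttachmentEntry {
  id: string;
  url: string;
  sources: string[];
  status: "available" | "downloaded" | "error";
  localPath?: string;
  error?: string;
}

interface AttachmentManifest {
  attachments: AttachmentEntry[];
}

// Resolves a manifest ID to its entry. Raw URLs and unknown IDs are rejected,
// so the command can only fetch what the indexer already allowlisted.
function resolveAttachment(manifest: AttachmentManifest, id: string): AttachmentEntry {
  if (!/^attachment-\d+$/.test(id)) {
    throw new Error("expected a manifest ID such as attachment-001, not a URL");
  }
  const entry = manifest.attachments.find((candidate) => candidate.id === id);
  if (!entry) {
    throw new Error(`unknown attachment ID for this run: ${id}`);
  }
  return entry;
}
```

The download itself, with its size, count, and timeout limits, would run only after this lookup succeeds.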

The hard part: today run-agent-task passes INPUT_GITHUB_TOKEN into the agent step, and buildSharedEnv() mirrors it to GH_TOKEN/GITHUB_TOKEN: runtime-env.ts#L29-L35. So a CLI wrapper is not a true security boundary while the agent also has raw GitHub auth. It is still useful for policy and auditability, but not fully “protected”.

So I’d split implementation into two phases:

Phase 1: implement on-demand download by manifest ID, with limits and logging. This replaces preloaded file bytes.
Phase 2: make it truly protected by changing the auth boundary: do not pass raw GitHub token into the agent, or introduce a local broker process that holds the token and only serves allowlisted attachment IDs.

My recommendation: create a new branch from main for this direction, not continue PR #428 as-is. PR #428 is the “pre-download manifest” design; this would be a cleaner alternate design: pre-index only, download on demand by allowlisted ID.

codex | gpt-5.5 | high | Macmini-runner-1

lolipopshock · 2026-06-25T17:29:27Z

Add an attachment indexer:
scans only the current target thread,

records discovered github.com/user-attachments/... URLs,
writes a manifest like:
{
  "attachments": [
    {
      "id": "attachment-001",
      "url": "...",
      "sources": [...],
      "status": "available"
    }
  ]
}
does not download file contents yet.

No, I don't think we should do this. Let's just simplify it:

build a command that takes in any given GitHub URL, then parses it and downloads the file.

@sepo-agent /answer

sepo-agent-app · 2026-06-25T20:10:16Z

Yes, we can implement this as a new direction from main: no preloaded files, no indexer, just an agent-callable download command.

I would still narrow “any given GitHub URL” to any GitHub user-attachment URL:

node .agent/dist/cli/download-github-attachment.js \
  --url "https://github.com/user-attachments/files/..."

The command should:

accept a raw URL from the agent,
validate https://github.com/user-attachments/(files|assets)/...,
reject non-GitHub and non-attachment URLs,
download using INPUT_GITHUB_TOKEN / GH_TOKEN,
enforce max file size, timeout, and safe filename handling,
write to $RUNNER_TEMP/agent-attachments/,
print the local path and metadata.
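
The validation and filename steps in that list could be sketched as follows. Nothing here is the actual implementation: `parseAttachmentUrl` and `safeFileName` are hypothetical helpers, and the character allowlist and 100-character cap are arbitrary choices for illustration.

```typescript
// Accepts only https://github.com/user-attachments/(files|assets)/... URLs.
function parseAttachmentUrl(raw: string): URL {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    throw new Error("not a valid URL");
  }
  if (url.protocol !== "https:" || url.hostname !== "github.com" || url.port !== "") {
    throw new Error("only https://github.com attachment URLs are allowed");
  }
  if (url.username !== "" || url.password !== "") {
    throw new Error("URLs with embedded credentials are not allowed");
  }
  if (!/^\/user-attachments\/(files|assets)\/[^/]/.test(url.pathname)) {
    throw new Error("not a GitHub user-attachment URL");
  }
  return url;
}

// Derives a filesystem-safe name from the last path segment.
function safeFileName(url: URL): string {
  const last = url.pathname.split("/").filter(Boolean).pop() ?? "attachment";
  let decoded = last;
  try {
    decoded = decodeURIComponent(last);
  } catch {
    // Keep the raw segment when it is not valid percent-encoding.
  }
  // Replace anything outside a conservative allowlist, then strip leading dots.
  const cleaned = decoded.replace(/[^A-Za-z0-9._-]/g, "_").replace(/^\.+/, "");
  return cleaned.slice(0, 100) || "attachment";
}
```

Because the WHATWG URL parser normalizes `..` segments before the path check runs, a URL that tries to climb out of `/user-attachments/` fails validation rather than reaching the download step.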

I would not make it a general authenticated github.com downloader. That becomes too close to “authenticated curl for anything GitHub can access.” Attachment URLs are the real problem, so the command should be attachment-specific.

Current architecture fit:

run-agent-task already passes INPUT_GITHUB_TOKEN into the agent step: run-agent-task/action.yml#L342-L351
buildSharedEnv() mirrors that into GH_TOKEN / GITHUB_TOKEN: runtime-env.ts#L29-L35
the base prompt already documents token handling boundaries: _base.md#L28-L31

So this version is simpler than PR #428:

Add .agent/src/cli/download-github-attachment.ts.
Add a small shared helper for URL validation, filename sanitization, fetch timeout, and size cap.
Update _base.md to tell agents: use this command for github.com/user-attachments links; do not use raw curl/browser fetches for private attachments.
Add tests for valid attachment URLs, rejected URLs, missing token, timeout, oversized response, filename handling, and successful download.
Update docs for the command.

I’d close or supersede PR #428 and branch from main for this alternate design. It gives us the behavior you want without scanning all past comments and without pre-downloading anything.

codex | gpt-5.5 | high | Macmini-runner-3

Download GitHub attachments before agent runs

487b2b4

sepo-agent-app Bot added the agent Handled by the agent label Jun 19, 2026

sepo-agent-app Bot mentioned this pull request Jun 19, 2026

Handle comment files in private repos #426

Open
