Two layered bugs make wave resume unusable for any composition pipeline that uses aggregate or iterate
Empirical baseline
ops-pr-respond-20260427-190848-9617 on PR #1407 ran fetch-pr → parallel-review (6 audits) → merge-findings (aggregate) → triage. The triage step failed with a schema compile error caused by a Perl-only regex construct, (?!, which has been fixed separately. At the time of failure, triage had already produced a valid triaged-findings.json (19 actionable / 3 deferred / 1 rejected) and merge-findings had written merged-findings.json.
Both wave resume <run-id> and a subsequent wave resume <run-id> --force failed:
Error: pipeline execution failed: ❌ Phase 'triage' failed: cannot resume from 'triage':
prior step 'parallel-review' has no workspace artifacts
Error: pipeline execution failed: step "triage" failed: failed to inject artifacts:
required artifact 'merged-findings' from step 'merge-findings' not found
Bug 1 — aggregate / iterate steps don't register output artifacts
internal/pipeline/composition.go executeAggregate (lines 450-492) writes step.Aggregate.Into to disk and stashes the bytes via tmplCtx.SetStepOutput, but never calls store.RegisterArtifact. Compare internal/pipeline/executor.go:4517-4523, the equivalent path for prompt/command steps.
DB inspection of failed run:
sqlite> SELECT step_id, name FROM artifact WHERE run_id='ops-pr-respond-20260427-190848-9617';
fetch-pr|pr-context
triage|triaged-findings
Only 2 artifacts are registered. There should be 3, including merge-findings:merged-findings. The same gap likely exists for iterate (no collected-output artifact registered) and for branch outputs.
Downstream consequences:
- inject_artifacts in any step depending on an aggregate output can't find it after restart.
- WebUI OUT pills are blank on aggregate/iterate steps (separate symptom of same bug).
- Resume preflight refuses because the dependency artifact is unregistered.
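The consequences above all come down to a lookup against the registered artifact set. The following is a simplified Go model of that lookup, not the actual preflight code: the artifactRef type and the missingArtifacts helper are hypothetical names introduced for illustration, and only the step and artifact names come from the failed run.

```go
package main

import (
	"fmt"
	"sort"
)

// artifactRef is a hypothetical key identifying an artifact by producing step and name.
type artifactRef struct{ Step, Name string }

// missingArtifacts returns the "step:name" labels of required artifacts that
// are absent from the registered set, sorted for stable output.
func missingArtifacts(required []artifactRef, registered map[artifactRef]bool) []string {
	var missing []string
	for _, r := range required {
		if !registered[r] {
			missing = append(missing, r.Step+":"+r.Name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Mirrors the two rows found in the artifact table for the failed run.
	registered := map[artifactRef]bool{
		{"fetch-pr", "pr-context"}:     true,
		{"triage", "triaged-findings"}: true,
	}
	// triage depends on the aggregate output, which was never registered.
	required := []artifactRef{{"merge-findings", "merged-findings"}}
	fmt.Println(missingArtifacts(required, registered)) // [merge-findings:merged-findings]
}
```

In this model the file on disk is irrelevant: the dependency counts as missing purely because no row was registered for it.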
Bug 2 — resume creates a new workspace path
The failed run wrote everything under .agents/workspaces/ops-pr-respond-20260427-190848-9617/. wave resume keeps the run-id in DB but creates a fresh workspace dir .agents/workspaces/ops-pr-respond-20260427-201303-8495/ (timestamp at resume time). The first step run inside the new workspace (triage) then can't see prior step outputs that live under the original path.
Quote from resume error message:
Inspect workspace artifacts:
ls .agents/workspaces/ops-pr-respond-20260427-201303-8495/
The path printed in the error doesn't match the persisted run-id. The old workspace (190848-9617) is still on disk with triaged-findings.json, pr-context.json, etc.
Proposed fixes
Bug 1:
- In composition.go executeAggregate, after os.WriteFile, call c.store.RegisterArtifact(runID, step.ID, name, outputPath, "json", size). Requires plumbing runID into CompositionExecutor (currently only store is held; runID flows in via Execute somewhere, so this needs a constructor addition or a per-Execute argument).
- Same fix on executeIterate for the implicit collected-output array (collectIterateOutputs).
- Backfill: also register executeBranch outputs, if any.
Bug 2:
- Resume should reuse the original workspace path from pipeline_run.workspace_path or from the most recent checkpoint's workspace_path.
- The checkpoint table already has workspace_path per step, so resume can read it and feed the parent run executor a workspace override.
- Alternative: symlink <old-ws>/<step> into the new workspace dir if a per-resume timestamp dir is required for state isolation.
Acceptance
- executeAggregate registers an artifact in DB for every successful aggregate step.
- executeIterate registers a collected-output artifact (or named per step.ID + "/collected-output").
- WebUI shows the OUT pill populated for aggregate/iterate steps.
- wave resume <run-id> on a composition pipeline that failed after aggregate/iterate succeeds at finding upstream artifacts, without --force and without re-running prior steps.
- Regression test: run ops-pr-respond on a representative PR, kill the run after merge-findings, then wave resume; the pipeline picks up at triage, reading the registered merged-findings artifact.
Related
Source
Failed run ops-pr-respond-20260427-190848-9617 on PR #1407 (2026-04-27).