fix(pipeline): wave resume broken for composition pipelines — aggregate artifacts unregistered + new workspace path #1434

@nextlevelshit

Two layered bugs make wave resume unusable for any composition pipeline that uses aggregate or iterate.

Empirical baseline

ops-pr-respond-20260427-190848-9617 on PR #1407 ran fetch-pr → parallel-review (6 audits) → merge-findings (aggregate) → triage. The triage step failed with a schema compile error caused by a Perl-only regex construct, (?!, which is fixed separately. At failure, triage had already produced a valid triaged-findings.json (19 actionable / 3 deferred / 1 rejected) and merge-findings had written merged-findings.json.

wave resume <run-id> and then wave resume <run-id> --force both failed:

Error: pipeline execution failed: ❌ Phase 'triage' failed: cannot resume from 'triage':
  prior step 'parallel-review' has no workspace artifacts
Error: pipeline execution failed: step "triage" failed: failed to inject artifacts:
  required artifact 'merged-findings' from step 'merge-findings' not found

Bug 1 — aggregate / iterate steps don't register output artifacts

internal/pipeline/composition.go executeAggregate (lines 450-492) writes step.Aggregate.Into to disk and stashes the bytes in tmplCtx.SetStepOutput, but never calls store.RegisterArtifact. Compare internal/pipeline/executor.go:4517-4523, the corresponding code path for prompt/command steps.

DB inspection of failed run:

sqlite> SELECT step_id, name FROM artifact WHERE run_id='ops-pr-respond-20260427-190848-9617';
fetch-pr|pr-context
triage|triaged-findings

Only 2 artifacts are registered. There should be 3: merge-findings:merged-findings is missing. The same gap likely exists for iterate (no collected-output artifact registered) and for branch outputs.

Downstream consequences:

  • inject_artifacts in any step depending on an aggregate output can't find it after restart.
  • WebUI OUT pills are blank on aggregate/iterate steps (separate symptom of same bug).
  • Resume preflight refuses because the dependency artifact is unregistered.
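The lookup failure behind these symptoms can be illustrated with a minimal sketch. It assumes artifacts are keyed by (step_id, name) as in the SQL query above; artifactKey and missingArtifacts are hypothetical names for illustration, not identifiers from the wave codebase.

```go
package main

import "fmt"

// artifactKey identifies an artifact by producing step and name, mirroring
// the (step_id, name) columns queried above. Hypothetical type.
type artifactKey struct {
	StepID string
	Name   string
}

// missingArtifacts returns the required keys that are absent from the
// registered set, preserving the order of required.
func missingArtifacts(registered map[artifactKey]bool, required []artifactKey) []artifactKey {
	var missing []artifactKey
	for _, k := range required {
		if !registered[k] {
			missing = append(missing, k)
		}
	}
	return missing
}

func main() {
	// The two rows found in the artifact table for the failed run.
	registered := map[artifactKey]bool{
		{StepID: "fetch-pr", Name: "pr-context"}:     true,
		{StepID: "triage", Name: "triaged-findings"}: true,
	}
	// What triage needs injected before it can run.
	required := []artifactKey{{StepID: "merge-findings", Name: "merged-findings"}}

	for _, k := range missingArtifacts(registered, required) {
		fmt.Printf("required artifact %q from step %q not found\n", k.Name, k.StepID)
	}
}
```

The file merged-findings.json exists on disk in this scenario, but any check driven by the registered set alone cannot see it.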

Bug 2 — resume creates a new workspace path

The failed run wrote everything under .agents/workspaces/ops-pr-respond-20260427-190848-9617/. wave resume keeps the run-id in DB but creates a fresh workspace dir .agents/workspaces/ops-pr-respond-20260427-201303-8495/ (timestamp at resume time). The first step run inside the new workspace (triage) then can't see prior step outputs that live under the original path.

Quote from resume error message:

Inspect workspace artifacts:
  ls .agents/workspaces/ops-pr-respond-20260427-201303-8495/

The path printed in the error doesn't match the persisted run-id. The old workspace, ops-pr-respond-20260427-190848-9617, is still on disk with triaged-findings.json, pr-context.json, etc.

Proposed fixes

Bug 1:

  • In composition.go executeAggregate, after os.WriteFile, call c.store.RegisterArtifact(runID, step.ID, name, outputPath, "json", size). This requires plumbing runID into CompositionExecutor: currently only store is held and runID flows in via Execute somewhere, so it needs either a new constructor parameter or a per-Execute argument.
  • Same fix on executeIterate for the implicit collected-output array (collectIterateOutputs).
  • Backfill: also register on executeBranch outputs if any.

Bug 2:

  • Resume should reuse the original workspace path from pipeline_run.workspace_path or from the most recent checkpoint's workspace_path.
  • The checkpoint table already has workspace_path per step — resume can read it and feed the parent run executor a workspace override.
  • Alternative: symlink <old-ws>/<step> into the new workspace dir if a per-resume timestamp dir is required for state isolation.

Acceptance

  • executeAggregate registers an artifact in DB for every successful aggregate step.
  • executeIterate registers a collected-output artifact (or named per step.ID + "/collected-output").
  • WebUI shows OUT pill for aggregate/iterate steps populated.
  • wave resume <run-id> on a composition pipeline that failed after aggregate/iterate succeeds at finding upstream artifacts — without --force and without re-running prior steps.
  • Regression test: ops-pr-respond on a representative PR, kill the run after merge-findings, wave resume — pipeline picks up at triage reading the registered merged-findings artifact.

Source

Failed run ops-pr-respond-20260427-190848-9617 on PR #1407 (2026-04-27).
