Skip to content

Ensure Galley stages executor-produced worktree changes before supervisor review so newly-created files are visible in review evidence even when the executor did not run git add.#69

Merged
shinpr merged 3 commits into
mainfrom
agent/task-20260525082943-e43eb7-stage-executor-changes-before-supervisor-review
May 25, 2026
Merged

Conversation

@shinpr
Copy link
Copy Markdown
Owner

@shinpr shinpr commented May 25, 2026

Goal

Ensure Galley stages executor-produced worktree changes before supervisor review so newly-created files are visible in review evidence even when the executor did not run git add.

Acceptance Criteria

  • AC1 After an executor attempt completes and before supervisor review evidence is captured, Galley stages executor-produced worktree changes so newly-created untracked files are included in the attempt diff.
    • Verification: Add daemon or workspace regression coverage where a fake executor creates a new file without running git add. Assert the attempt diff.patch contains a new-file diff for that path and includes the file contents.
    • Status: satisfied
  • AC2 Supervisor evidence for an attempt uses the staged, reviewable diff produced by Galley, so a task is not reviewed as an empty submitted diff when the only change is a new file created by the executor.
    • Verification: Add focused daemon coverage asserting the supervisor request/evidence includes the new file diff when the executor leaves the file untracked.
    • Status: satisfied
  • AC3 Review-time staging does not require the executor to run git add and remains compatible with accepted-task finalization; accepted new files are committed through the existing Galley finalization path.
    • Verification: Add or update a daemon acceptance test with a fake accepted supervisor verdict where the executor creates an untracked new file. Assert Galley creates the final commit and the committed tree contains the new file.
    • Status: satisfied
  • AC4 Review-time staging preserves context-only task input behavior; files declared in task.files with commit:false are not accidentally committed as part of accepted-task finalization.
    • Verification: Add regression coverage showing commit:false input destinations remain excluded from the final commit after review-time staging.
    • Status: satisfied
  • AC5 Review-time staging preserves forbidden-path enforcement; changes under task.scope.forbidden_paths are still detected before accepted finalization completes.
    • Verification: Add regression coverage where a staged executor change under task.scope.forbidden_paths causes accepted finalization to fail with the existing forbidden-path error.
    • Status: satisfied
  • AC6 Galley records review-time git add evidence when it stages executor output and surfaces a clear attempt error if staging fails instead of sending an empty diff to supervisor.
    • Verification: Assert the run artifact directory contains command/result evidence for the review-time git add step. Add a failure-path test with a fake git/add failure and assert the attempt reports a staging-related error before supervisor review.
    • Status: satisfied

Final Verification

  • gofmt -l $(git ls-files '*.go'): passed
  • go build -o /tmp/galley-bin ./cmd/galley: passed
  • go run ./cmd/galley task validate <tmp afk-task with workspace path substituted>: passed
  • go run ./cmd/galley profile validate --kind environment examples/environment-local.yaml: passed
  • python3 -m json.tool on schemas/claude-result.schema.json, schemas/supervisor-verdict.schema.json, plugins/galley/skills/galley/references/{task,quality,environment}.schema.json, plugins/galley/.claude-plugin/plugin.json, plugins/galley/.codex-plugin/plugin.json, .claude-plugin/marketplace.json, .agents/plugins/marketplace.json: passed
  • gofmt -l internal/daemon/review_staging_test.go internal/daemon/loop.go internal/vcs/vcs.go: passed
  • gofmt -l .: passed
  • go test ./internal/daemon ./internal/vcs ./internal/workspace: passed
  • test -z "$(find . -name '*.go' -not -path './.git/*' -print | xargs gofmt -l)": passed
  • go test ./...: passed
  • go build -o /tmp/galley ./cmd/galley: passed
  • go run ./cmd/galley schema check: passed
  • go run ./cmd/galley task validate examples/afk-task.yaml: passed
  • go run ./cmd/galley profile validate --kind quality examples/quality-default.yaml: passed
  • python3 -m json.tool schemas/claude-result.schema.json >/dev/null: passed
  • ./scripts/smoke-local.sh: passed

Key Decisions

  • D1 Who should make executor-created files reviewable for supervisor evaluation? -> Galley stages executor-produced worktree changes after executor completion and before supervisor evidence capture.
    • Rationale: The executor should not be responsible for git add; Galley owns review evidence generation and final commit orchestration.
    • Reversibility: high
  • claude-decision-2 Fix the staging-failure path in loop.go (swallow the error in Run) or fix the test expectation? -> Fix the test expectation to expect Run to return a non-nil error.
    • Rationale: The daemon's existing terminal failure modes (e.g. TestRunOnceFailsWhenPRBaseRefMissing) all propagate the primary error out of Run via taskstate.FailMove; swallowing the review-staging error would diverge from that contract and reduce caller visibility of the failure. The required invariants — attempt error phase=review_staging/kind=review_staging_failed, no supervisor invocation, task moved to tasks/failed — are still asserted.
    • Reversibility: high
  • claude-decision-3 Should daemon/loop.go or vcs/vcs.go be modified during this rerun? -> Leave them untouched.
    • Rationale: The directive explicitly says not to change the already-correct daemon/vcs behavior unless rerunning the checks exposes a real failure, and gofmt plus all three go test packages passed without surfacing any failure.
    • Reversibility: high
  • claude-decision-4 How should review-time staging exclude commit:false input destinations without breaking the existing finalization cleanup path? -> Pass each commit:false destination as a :(exclude,literal) pathspec to git add -A -- . :(exclude,literal) so the destinations are physically kept in the worktree (still readable as context by the executor) but never enter the index, and therefore never appear in any diff surface that builds Evidence.Diff. The existing CleanupNonCommitted hook continues to remove them before commit.
    • Rationale: Pathspec exclusion is atomic with the staging command, leaves no transient git reset state, and produces the exact submitted-diff surface the supervisor reviews. The :literal modifier prevents glob metacharacters in destinations from broadening the exclusion. Preserving the files on disk preserves their role as context input for the executor and keeps the existing finalization invariants intact.
    • Reversibility: high
  • claude-decision-5 Should the seam signature (stageExecutorOutput) include excludePaths, or should the closure read loaded.Files internally? -> Add excludePaths to the seam signature.
    • Rationale: Tests need to inspect or override the exclude list to assert the daemon's policy without spawning git. Keeping excludePaths visible at the seam makes the contract explicit and the failure-path test (which now takes a fourth parameter) documents the seam shape.
    • Reversibility: high

Risks

  • R1 trust-boundary: Review-time staging mutates the Git index before supervisor acceptance, so task input cleanup, forbidden-path checks, and finalization must preserve existing safety boundaries.
    • Mitigation: Add focused tests for commit:false inputs, forbidden paths, staging failure classification, and accepted finalization.

@shinpr shinpr self-assigned this May 25, 2026
@shinpr
Copy link
Copy Markdown
Owner Author

shinpr commented May 25, 2026

/galley

Please revise the review-staging implementation around the submitted artifact set.

Goal:
Galley should stage only the paths that are part of the executor-produced submitted diff. Context-only inputs and other Galley-owned runtime material may remain in the worktree as diagnostic or execution context, but they should not become submitted progress, supervisor review evidence, or accepted work unless the task contract makes them commit candidates.

Implementation direction:

  • Replace the current git add -A plus exclusion approach with an explicit reviewable path set.
  • Build that set from executor-produced worktree changes after the executor attempt.
  • Exclude task.files entries with commit:false from that set.
  • Keep forbidden-path changes visible to the existing safety gates rather than hiding them from review/finalization.
  • Stage only the resulting reviewable paths before capturing the supervisor snapshot.
  • Treat raw worktree dirty state as diagnostic evidence. Compute supervisor-facing dirty/progress signals from the submitted artifact set or submitted diff, not from context-only untracked files.

Acceptance criteria:

  • A fake executor that creates a new untracked file without running git add produces a supervisor diff containing that new file.
  • A fake executor that makes no repository change while a commit:false input file exists does not produce a submitted diff and does not report submitted progress from that input.
  • commit:false inputs remain readable in the worktree for executor context, stay out of supervisor submitted diff evidence, and stay out of the final commit.
  • Forbidden-path executor changes are still detected by the existing finalization/safety checks.
  • Review-staging command evidence remains persisted for success and failure.
  • Add focused tests for the path-set builder, including duplicate paths, empty paths, invalid/locality handling, commit:false exclusions, and the no-change context-only case.

Keep the fix scoped to the review-staging/evidence boundary. The supervisor prompt does not need a new dirty-specific rule for this patch; this is a Galley-owned submitted-artifact selection issue.

@shinpr
Copy link
Copy Markdown
Owner Author

shinpr commented May 25, 2026

Galley requeued task task-20260525082943-e43eb7-stage-executor-changes-before-supervisor-review from this comment.

@shinpr shinpr force-pushed the agent/task-20260525082943-e43eb7-stage-executor-changes-before-supervisor-review branch from f63fe9f to 5e74f9f Compare May 25, 2026 13:28
@shinpr shinpr merged commit f458555 into main May 25, 2026
4 checks passed
@shinpr shinpr deleted the agent/task-20260525082943-e43eb7-stage-executor-changes-before-supervisor-review branch May 25, 2026 13:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant