
feat: add parallel runtime controls and recovery #5

Draft
dickymoore wants to merge 6 commits into NathanJ60:main from dickymoore:feat/parallel-runtime-recovery


@dickymoore
Contributor

This is the second PR in the parallel execution stack. It is opened as a draft to avoid accidental merge before the MVP branch lands.

Please review and merge in this order:

  1. fix: codex loop robustness #2: Codex bug fixes.
  2. feat: add parallel story execution MVP #4: parallel execution MVP.
  3. This PR: runtime controls and recovery.
  4. Safety/modularization follow-up.

Scope:

  • Adds controller PID/control-file reporting.
  • Adds runtime control commands: pause, resume, drain, and stop.
  • Adds graceful signal handling for long-running controller sessions.
  • Adds worker/provider idle timeout handling so stalled work does not block the full run forever.
  • Adds retry/salvage context from previous kept worker attempts.
  • Preserves sprint tracker formatting during status updates.
  • Adds script snapshotting so edits to the repo copy do not affect a live controller/worker run.
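As a rough illustration of how the control commands and signal handling listed above could fit together, here is a minimal sketch. It assumes a plain-text control file holding one of `run`, `pause`, `drain`, or `stop`. The file names, the state names as stored values, and the functions `read_control`, `may_dispatch`, and `on_signal` are all hypothetical and are not taken from this branch.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: control-file polling plus graceful signal handling.
# File names, stored states, and function names are illustrative assumptions,
# not this PR's actual implementation.
set -u

CONTROL_DIR="$(mktemp -d)"
CONTROL_FILE="$CONTROL_DIR/control"
PID_FILE="$CONTROL_DIR/controller.pid"

echo "$$" > "$PID_FILE"        # report the controller PID
echo "run" > "$CONTROL_FILE"   # initial state

# Print the requested state; a missing file or unknown value is treated as "run".
read_control() {
  state="run"
  if [ -r "$CONTROL_FILE" ]; then
    state="$(tr -d '[:space:]' < "$CONTROL_FILE")"
  fi
  case "$state" in
    run|pause|drain|stop) printf '%s\n' "$state" ;;
    *) printf 'run\n' ;;
  esac
}

# The controller may start another story only while the state is "run".
may_dispatch() {
  [ "$(read_control)" = "run" ]
}

# A signal requests a drain instead of killing workers outright.
on_signal() {
  echo "drain" > "$CONTROL_FILE"
}
trap on_signal INT TERM
```

In this sketch an operator would pause a run with something like `echo pause > "$CONTROL_FILE"`, and the dispatch loop would call `may_dispatch` before starting each new worker. Treating unknown values as `run` is one possible choice; a stricter design might treat them as `pause`.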

Tested:

  • bash -n syntax check on the core, wrapper, and installer scripts.
  • Wrapper --help smoke test.
  • Fake Codex parallel smoke test with RALPH_CONCURRENCY=2 in a temporary git project. Two ready-for-dev stories ran through worker worktrees, the fake dev and code-review workflows completed, the controller integrated both worker outputs, and both stories ended as done in sprint status.
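
The `bash -n` check in the list above can be repeated over any set of scripts with a small helper. This is a sketch only: the function name `syntax_check` is hypothetical, and the actual script paths in this repository are not assumed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: parse each given script with `bash -n` (no execution)
# and return non-zero if any of them has a syntax error.
syntax_check() {
  failed=0
  for script in "$@"; do
    if ! bash -n "$script" 2>/dev/null; then
      echo "syntax error: $script" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Calling `syntax_check path/to/a.sh path/to/b.sh` parses each file without running it, so it is safe to use on installer scripts.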

No live Claude/Codex agent workflow was invoked; the provider was faked to avoid spending real model calls during branch validation.
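
One common way to fake a provider, shown here as a sketch and not necessarily how this branch does it, is to place a stub executable ahead of the real CLI on `PATH`. The stub name `codex`, its arguments, its output, and the log file are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: shadow the real provider CLI with a stub on PATH so a
# smoke test never reaches a real model. The stub's name, behavior, and log
# format are assumptions for illustration only.
set -u

FAKE_BIN="$(mktemp -d)"
CALL_LOG="$FAKE_BIN/calls.log"

# Write the stub. $CALL_LOG is expanded now; \$* is expanded when the stub runs.
cat > "$FAKE_BIN/codex" <<EOF
#!/usr/bin/env bash
# Record the invocation and pretend the workflow succeeded.
echo "codex \$*" >> "$CALL_LOG"
echo "fake provider: ok"
EOF
chmod +x "$FAKE_BIN/codex"

# Put the stub first on PATH for this shell and its children only.
PATH="$FAKE_BIN:$PATH"
export PATH

# Any script that calls `codex` from here on gets the stub.
output="$(codex exec "implement story 1")"
```

Because the stub is found through `PATH`, child processes started from this shell see it as well, which a shell function would not guarantee. The call log gives the test something to assert on without any model call being made.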
