fix: codex loop robustness by dickymoore · Pull Request #2 · NathanJ60/bmad-ralph-loop

dickymoore · 2026-01-29T09:22:34Z

Summary

Use raw yq output when reading statuses/keys so story lookups don't get quoted.
Avoid set -e abort by replacing ((processed++))/((failed++)) with safe arithmetic.

Why

yq was returning quoted keys (e.g., "1-1"), which made status checks fall back to unknown and skipped dev-story.
((processed++)) returns exit status 1 on the first iteration, which terminates the loop under set -e. This caused only one story to run per invocation.

Testing

CODEX_HOME=~/bmad-sandbox/.codex RALPH_PROJECT_ROOT=~/bmad-sandbox RALPH_LOG_DIR=~/bmad-sandbox/logs ./codex-ralph-loop.sh --dry-run

dickymoore · 2026-01-29T10:44:42Z

Hey @NathanJ60. There were a couple of bugs in the codex support so this should fix it. I haven't tested the changes with Claude though as I don't have access. If you're able to have a quick test that'll be ace. Thanks, Dicky

NathanJ60

Hey Dicky! Thanks again for the contributions, really appreciate the effort here.

The two bug fixes (yq -r flag and the ((processed++)) arithmetic issue under set -e) are solid and I'd love to get those merged ASAP. However, this PR has grown quite a bit beyond those fixes — the parallel execution engine is a massive addition (~1900 lines) and I think it deserves its own dedicated PR so it can get proper review.

Could you split this into separate PRs?

Bug fixes only (commits 1-2) — I'll merge those right away
Parallel execution feature — as its own PR, with a few things I'd like to see addressed:
- Modularization: ralph-loop-core.sh grows to 2250+ lines with 40+ functions. It'd be great to split the parallel logic into separate sourced files (lib/parallel.sh, lib/worktree.sh, etc.)
- Security: the source "$previous_result_file" pattern for worker results is risky — if the file gets tampered with between write and read, it's arbitrary code execution. Explicit parsing (grep/cut) would be safer. Same concern with the script snapshot defaulting to /tmp with predictable filenames — mktemp -d with restricted permissions would be better.
- Return code convention: code_review_requires_dev returns 0 for "needs more dev" and 1 for "clean", which inverts the standard bash convention. Easy to introduce bugs when reading the code.
- wait_for_worker_completion signature: 9 nameref parameters is hard to follow — maybe use a result file or associative array instead?

The parallel feature itself is a really cool idea and I'd be happy to work with you on getting it merged once these are addressed. Let me know if you have any questions!

Cheers,
Nathan

dickymoore · 2026-04-16T07:45:00Z

Hi @NathanJ60, and huge apologies. I’ve made a bit of a rookie error here and only just realised this branch was the one backing the open PR, so I accidentally pushed the newer commits onto it.

Completely agree they should be split out. I’ll separate out the original bug fixes so you can merge those, then I’ll put the parallel execution changes into a dedicated PR and work through the review points there. The parallel execution commits were originally only intended to live on my fork while I tested the idea, but if you think the direction is useful I’m happy to open a PR from a cleaned-up branch once I’ve ironed out the remaining issues.

Really appreciate the thoughtful feedback, and I’m glad the first two fixes look good.

Dicky

dickymoore · 2026-04-27T15:45:02Z

Hey Nathan. I hope you're well. I've narrowed this PR back down to the two original Codex bug fixes you called out:

use raw yq output for story/status keys
avoid post-increment arithmetic under set -e

The larger parallel/runtime work has been moved out of this PR path and preserved separately for a dedicated follow-up branch/PR. Local verification passes with bash -n across the shell scripts, and the branch now contains 2 commits, 1 changed file.

dickymoore · 2026-04-27T16:10:38Z

Hi Nathan, quick update on the split.

I’ve kept this PR narrowed to the two Codex bug fixes you said looked good, and opened the parallel work as a separate stacked set so it is easier to review:

fix: codex loop robustness #2 - Codex bug fixes only. This should go first.
feat: add parallel story execution MVP #4 - Parallel execution MVP: controller/worker mode, story worktrees, dependency-aware scheduling, and serial integration.
feat: add parallel runtime controls and recovery #5 - Runtime controls/recovery: pause/resume/drain/stop, idle timeouts, safe run snapshots, and retry/salvage context. I opened this as draft so it does not get merged before feat: add parallel story execution MVP #4.
refactor: modularize parallel runtime and harden safety #6 - Safety/modularization cleanup: no source result.env, private snapshot dirs, standard review-gate return codes, simpler worker wait state, and the core split into lib/runtime.sh, lib/review-loop.sh, and lib/parallel.sh. Also draft until the earlier stack entries land.

Testing I ran on the split branches:

bash -n on the core, wrappers, installer, and lib modules where present.
wrapper --help smoke checks.
snapshot and simulated installed-layout checks for the modularized branch.
worker result parser rejection test for unexpected/malicious result keys.
fake Codex parallel smoke test on each branch with RALPH_CONCURRENCY=2 in a temporary git repo: two ready-for-dev stories ran through worker worktrees, fake dev/code-review workflows completed, the controller integrated both worker outputs, and both stories ended as done in sprint status.

I did not run live Claude/Codex agent workflows for these checks; the provider was faked to avoid burning real model calls during branch validation.

Thanks again for the detailed review notes. I’ve tried to map the follow-up PRs directly to them so you can take them one at a time.

NathanJ60 requested changes Apr 16, 2026

View reviewed changes

dickymoore added 2 commits April 27, 2026 16:43

fix(status): use raw yq output

21abd0c

fix(loop): continue processing stories

8bcbe81

dickymoore force-pushed the fix/codex-support branch from f21731a to 8bcbe81 Compare April 27, 2026 15:44

dickymoore requested a review from NathanJ60 April 27, 2026 15:48

This was referenced Apr 27, 2026

feat: add parallel story execution MVP #4

Open

feat: add parallel runtime controls and recovery #5

Draft

refactor: modularize parallel runtime and harden safety #6

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: codex loop robustness#2

fix: codex loop robustness#2
dickymoore wants to merge 2 commits into
NathanJ60:mainfrom
dickymoore:fix/codex-support

dickymoore commented Jan 29, 2026

Uh oh!

dickymoore commented Jan 29, 2026

Uh oh!

NathanJ60 left a comment

Uh oh!

dickymoore commented Apr 16, 2026

Uh oh!

dickymoore commented Apr 27, 2026 •

edited

Loading

Uh oh!

dickymoore commented Apr 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dickymoore commented Jan 29, 2026

Summary

Why

Testing

Uh oh!

dickymoore commented Jan 29, 2026

Uh oh!

NathanJ60 left a comment

Choose a reason for hiding this comment

Uh oh!

dickymoore commented Apr 16, 2026

Uh oh!

dickymoore commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dickymoore commented Apr 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dickymoore commented Apr 27, 2026 •

edited

Loading