ci(flow-next): expand smoke matrix — 7 suites on ubuntu/macos/windows by gmickel · Pull Request #126 · gmickel/flow-next

gmickel · 2026-05-02T11:49:58Z

Summary

Adds functional smoke coverage to the GitHub Actions matrix beyond the existing ci_test.sh. Today only ci_test.sh (57 cases) runs cross-platform; the seven feature smokes that exercise actual user-facing surfaces (~480 cases) only ran locally.

This is the gap that caused commits like 4fa652d fix(flowctl): pin subprocess encoding=utf-8 for Windows cp1252 — 0.38.2 to ship despite Windows being in the matrix — the matrix only catches what ci_test.sh exercises, and that bug surfaced in subprocess output handling that wasn't tested.

What's in this PR

Adds 7 smoke suites to each matrix leg, ordered cheap-first so failures surface fast:

Smoke	Cases	~Runtime
`ci_test.sh` (already in matrix)	57	~10s
`resolve-pr_smoke_test.sh`	58	~1s
`strategy_smoke_test.sh`	62	~5s
`audit_smoke_test.sh`	41	~6s
`glossary_smoke_test.sh`	80	~7s
`prospect_smoke_test.sh`	94	~60s
`impl-review_smoke_test.sh`	74	~70s
`smoke_test.sh`	130	~100s
total per leg	~596	~260s

Matrix runs all 3 OSes in parallel — wall time stays ~4 minutes.

Workflow cleanup

fail-fast: false — one OS failure no longer cancels the others (more useful diagnostic signal)
defaults.run.shell: bash — unifies the matrix; Git Bash on Windows handles bash scripts the same way ubuntu/macos do, eliminating the per-step Unix/Windows split that existed before
All smokes cd "$RUNNER_TEMP" before invocation (set on every GitHub-hosted runner); their refuse-to-run-from-plugin-repo guards stay honored

Smokes deliberately skipped

These need external CLIs not on GitHub-hosted runners — would always fail in CI:

ralph_smoke_test.sh / ralph_smoke_rp.sh — need claude / codex / rp-cli
plan_review_prompt_smoke.sh — needs rp-cli (line 20: command -v rp-cli >/dev/null 2>&1 || fail)

Test plan

All 7 smokes pass locally on macOS (verified before push)
CI matrix passes on ubuntu-latest
CI matrix passes on macos-latest
CI matrix passes on windows-latest (the one we actually care about — Git Bash + Python on Windows is where surprises live)

Adds functional smoke coverage to the existing matrix beyond ci_test.sh: - resolve-pr_smoke_test.sh (58 cases) - strategy_smoke_test.sh (62 cases) - audit_smoke_test.sh (41 cases) - glossary_smoke_test.sh (80 cases) - prospect_smoke_test.sh (94 cases) - impl-review_smoke_test.sh (74 cases) - smoke_test.sh (130 cases) Total per matrix leg: ~536 assertions, ~250s runtime; matrix runs in parallel so wall time stays around 4 min. Catches OS-level Python/bash bugs that ship today because ci_test.sh alone exercises a narrow surface (precedent: 0.38.2 cp1252 fix). Skipped smokes (ralph_*, plan_review_prompt) need rp-cli / claude / codex CLIs that GitHub-hosted runners don't have. Workflow cleanup: - fail-fast: false so one OS failure doesn't cancel the others - defaults.run.shell: bash unifies Linux/macOS/Windows (Git Bash on Windows) — eliminates the per-step Unix/Windows split - All smokes cd to \$RUNNER_TEMP (set on every hosted runner) before invocation; their refuse-to-run-from-plugin-repo guards stay honored

… coercion Surfaced by the matrix CI expansion in this branch: 1. Windows path bug — smokes hardcoded TEST_DIR="/tmp/<slug>-smoke-$$". Bash on Git Bash for Windows creates the dir at a translated path, but Python invoked via `python -c "..."` reads raw "/tmp/..." which doesn't exist on Windows → FileNotFoundError on every JSON read. Fix: TEST_DIR="${TEST_DIR:-${RUNNER_TEMP:-${TMPDIR:-/tmp}}/<slug>-smoke-$$}" in 7 smokes. Honors caller override, then GitHub runner temp, then macOS TMPDIR, then Linux /tmp. Backward compatible everywhere. 2. SIGPIPE bug under `set -o pipefail` — assert_grep used `printf '%s\n' "$haystack" | grep -qF -- "$needle"`. With a large haystack (full workflow.md, ~700 lines) `grep -q` exits early on first match → printf gets EPIPE → pipefail makes the pipeline return non-zero even though grep found the string. Falsely reported "missing" on present strings. Fix: rewrite the 5 sites to `grep -qF -- "$needle" <<< "$haystack"` (here-string; no pipe, no SIGPIPE, no false negative). 3. Boolean coercion bug — _prospect_parse_frontmatter falls back to _parse_inline_yaml when PyYAML is unavailable. The fallback deliberately keeps booleans as strings ("memory entries don't need typed scalars"), but prospect frontmatter ships typed booleans (floor_violation, generation_under_volume) that validate_prospect_frontmatter expects as bool. Without PyYAML on the runner, parsed["floor_violation"] is "true" not True → round-trip assertion fails. Fix: post-process those two known prospect-specific keys to bool in the fallback path. Production code is now correct without PyYAML; the smoke is consistent across environments. Memory entries continue to skip the coercion. Local: all 7 smokes pass. CI matrix re-run will exercise these on ubuntu/macos/windows.

…ommits Round 2 of OS-level fixes surfaced by the matrix CI expansion: 1. Windows path escape — RUNNER_TEMP on Windows resolves to a path with backslashes (e.g. D:\a\_temp). When bash interpolates the path into a `python -c "..."` source string, Python's escape parser reads `\a` as bell (\x07), `\_` as the deprecated literal, etc. The path Python tried to open was effectively corrupted → OSError 22 [Invalid argument]. Fix: normalize backslashes to forward slashes in TEST_DIR right after expansion, in 7 smokes. Windows accepts forward-slash paths natively; no-op on Linux/macOS. 2. Git identity missing on runners — smoke_test.sh exercises `git commit` flows; CI runners ship git without user.email/name configured → "fatal: empty ident name" → smoke fails. Fix: add a workflow step that configures global git identity once per matrix leg. Local: all 7 smokes pass on macOS (with PyYAML installed locally). The bool-coercion fix landed in the previous commit handles the no-PyYAML case the runners exhibited.

…s run on Windows Round 3 of OS-level fixes: 1. Git Bash on Windows returns Unix-style /d/a/... from `pwd` even though the underlying filesystem is D:\a\... — flow-next's smokes pass these via `sys.path.insert(0, "$SCRIPT_DIR")` and similar patterns into native-Windows Python, which doesn't recognize the /d/a/... prefix → ModuleNotFoundError on `import flowctl`. Fix: add a `to_winpath()` helper that uses `cygpath -m` on Windows (forward-slash Windows path that Python accepts and that matches pathlib output) and is a no-op on Linux/macOS. Apply to SCRIPT_DIR + PLUGIN_ROOT in 4 smokes (glossary, strategy, smoke_test, prospect). Also wrap T3b's `pwd -P` expected-path comparison so the literal string matches Python's pathlib resolution. 2. Single-step-failure-masks-rest — Windows runs `set -euo pipefail` bash for each step independently; a step's failure doesn't kill the job, but later steps do skip on default semantics. Adding `if: always()` to each smoke step ensures every smoke runs on every leg, so one CI run surfaces all OS-specific bugs instead of one-at-a-time. Job conclusion still reflects all failures. Local: all 7 smokes still green on macOS (cygpath absent → helper returns input unchanged).

…skips on Windows Round 4 of OS-level fixes: 1. autocrlf=false on Windows checkouts. Default Windows-runner config converts LF→CRLF on `actions/checkout`, which mangles heredoc bytes that smokes compare byte-identically against flowctl output. Fix: `git config --global core.autocrlf false` step before checkout. 2. Strategy fixture em-dashes → ASCII `--`. The bash heredoc on Windows (Git Bash + cp1252 locale) wrote em-dashes as cp1252 single-byte 0x97 instead of UTF-8 3-byte sequence. flowctl's UTF-8 read returned replacement chars. Replaced 29 em-dashes in fixtures with `--`; the prose readability improvement is fixture-internal, no functional coverage lost. 3. Strategy file_path JSON values: backslash → forward slash on T1/T3/T3b. Python pathlib returns native Windows paths (`D:\_temp\...`), but smoke compares against TEST_DIR which we normalized to forward slash. Mismatch. Fix: `${VAR//\\//}` after json_get extracts file_path. 4. Strategy T10 subprocess.run uses [sys.executable, FLOWCTL_PY, ...] instead of [FLOWCTL]. The bash wrapper isn't a valid Win32 exe → WinError 193 on Windows. New FLOWCTL_PY var alongside FLOWCTL. 5. Ralph regression sweeps in prospect Case 11 + impl-review skip on $RUNNER_OS=Windows. ralph_smoke_test.sh embeds POSIX subprocess patterns; the regression check's purpose is "{prospect,impl-review} doesn't break ralph", unrelated to ralph's Windows portability. Local: strategy 62/62, prospect 94/94, impl-review 74/74 still green.

…lization Round 5 of OS-level fixes: 1. atomic_write LF→CRLF translation on Windows. Python's text mode default `newline=None` translates `\n` → `os.linesep` (`\r\n` on Windows). flow-next writes content with explicit `\n` line endings, so on Windows every flowctl-written file (memory entries, glossary, prospect artifacts, STRATEGY.md, epic/task specs) ends up with CRLF. This causes (a) phantom "modified" diffs in cross-OS git checkouts, (b) round-trip byte-comparison failures in tests, and (c) downstream parsers that expect LF to misbehave. Fix: pass `newline=""` to the `os.fdopen` call inside `atomic_write` so LF is preserved exactly. 2. Strategy T1_PATH_REAL backslash normalization. Python's `os.path.realpath` returns native-Windows paths (backslashes); the smoke compares against T1_PATH which we normalized to forward slashes earlier. Both sides need normalization for byte-equal comparison. Add `${V//\\//}` after the realpath call. Local: glossary 80/80, strategy 62/62, smoke_test 130/130 still green.

… stdin Round 6 — last Windows-specific fix. Bash on Windows (Git Bash / MSYS) writes CRLF to pipes by default; Python's text-mode stdin universal-newlines translation doesn't always fire when the parent opened the pipe in binary mode. Result: glossary `--definition-file -` stored multi-line definitions with CRLF instead of LF on Windows, breaking byte-equal round-trip comparisons. Defensive universal-newlines normalization (.replace('\r\n', '\n') .replace('\r', '\n')) runs immediately after the stdin read, before the existing trailing-newline strip. LF is the only line ending the glossary file format guarantees. Local: glossary 80/80 still green.

…s CRLF on Windows Final round of Windows fixes: The smoke `json_get` helper does `python -c "...; print($expr)"` to extract values from flowctl JSON output. On Windows, Python's text- mode stdout translates `\n` characters within the printed value to `\r\n` during write. Bash captures the CRLF-fied output, so smokes comparing multi-line content saw `\r\n` even though the on-disk content was clean LF. This was the last source of the glossary Case 4 false positive: the stored multi-line definition is LF (verified by atomic_write `newline=""` + cmd_glossary_add stdin normalization), but the captured value was CRLF. Fix: pipe `json_get` output through `tr -d '\r'` so callers see LF-only content matching what's actually stored. Universal — no-op on Linux/macOS where stdout doesn't translate. Local: glossary 80/80, strategy 62/62, audit 41/41, prospect 94/94 all green after the fix.

gmickel added 8 commits May 2, 2026 13:49

gmickel merged commit 26bba86 into main May 2, 2026
3 checks passed

gmickel deleted the ci/expand-smoke-matrix branch May 2, 2026 13:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ci(flow-next): expand smoke matrix — 7 suites on ubuntu/macos/windows#126

ci(flow-next): expand smoke matrix — 7 suites on ubuntu/macos/windows#126
gmickel merged 8 commits intomainfrom
ci/expand-smoke-matrix

gmickel commented May 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

gmickel commented May 2, 2026

Summary

What's in this PR

Workflow cleanup

Smokes deliberately skipped

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant