Skip to content

ci(flow-next): expand smoke matrix — 7 suites on ubuntu/macos/windows#126

Merged
gmickel merged 8 commits intomainfrom
ci/expand-smoke-matrix
May 2, 2026
Merged

ci(flow-next): expand smoke matrix — 7 suites on ubuntu/macos/windows#126
gmickel merged 8 commits intomainfrom
ci/expand-smoke-matrix

Conversation

@gmickel
Copy link
Copy Markdown
Owner

@gmickel gmickel commented May 2, 2026

Summary

Adds functional smoke coverage to the GitHub Actions matrix beyond the existing ci_test.sh. Today only ci_test.sh (57 cases) runs cross-platform; the seven feature smokes that exercise actual user-facing surfaces (~480 cases) only ran locally.

This is the gap that caused commits like 4fa652d fix(flowctl): pin subprocess encoding=utf-8 for Windows cp1252 — 0.38.2 to ship despite Windows being in the matrix — the matrix only catches what ci_test.sh exercises, and that bug surfaced in subprocess output handling that wasn't tested.

What's in this PR

Adds 7 smoke suites to each matrix leg, ordered cheap-first so failures surface fast:

Smoke Cases ~Runtime
ci_test.sh (already in matrix) 57 ~10s
resolve-pr_smoke_test.sh 58 ~1s
strategy_smoke_test.sh 62 ~5s
audit_smoke_test.sh 41 ~6s
glossary_smoke_test.sh 80 ~7s
prospect_smoke_test.sh 94 ~60s
impl-review_smoke_test.sh 74 ~70s
smoke_test.sh 130 ~100s
total per leg ~596 ~260s

Matrix runs all 3 OSes in parallel — wall time stays ~4 minutes.

Workflow cleanup

  • fail-fast: false — one OS failure no longer cancels the others (more useful diagnostic signal)
  • defaults.run.shell: bash — unifies the matrix; Git Bash on Windows handles bash scripts the same way ubuntu/macos do, eliminating the per-step Unix/Windows split that existed before
  • All smokes cd "$RUNNER_TEMP" before invocation (set on every GitHub-hosted runner); their refuse-to-run-from-plugin-repo guards stay honored

Smokes deliberately skipped

These need external CLIs not on GitHub-hosted runners — would always fail in CI:

  • ralph_smoke_test.sh / ralph_smoke_rp.sh — need claude / codex / rp-cli
  • plan_review_prompt_smoke.sh — needs rp-cli (line 20: command -v rp-cli >/dev/null 2>&1 || fail)

Test plan

  • All 7 smokes pass locally on macOS (verified before push)
  • CI matrix passes on ubuntu-latest
  • CI matrix passes on macos-latest
  • CI matrix passes on windows-latest (the one we actually care about — Git Bash + Python on Windows is where surprises live)

gmickel added 8 commits May 2, 2026 13:49
Adds functional smoke coverage to the existing matrix beyond ci_test.sh:
- resolve-pr_smoke_test.sh (58 cases)
- strategy_smoke_test.sh (62 cases)
- audit_smoke_test.sh (41 cases)
- glossary_smoke_test.sh (80 cases)
- prospect_smoke_test.sh (94 cases)
- impl-review_smoke_test.sh (74 cases)
- smoke_test.sh (130 cases)

Total per matrix leg: ~536 assertions, ~250s runtime; matrix runs in
parallel so wall time stays around 4 min.

Catches OS-level Python/bash bugs that ship today because ci_test.sh
alone exercises a narrow surface (precedent: 0.38.2 cp1252 fix).
Skipped smokes (ralph_*, plan_review_prompt) need rp-cli / claude /
codex CLIs that GitHub-hosted runners don't have.

Workflow cleanup:
- fail-fast: false so one OS failure doesn't cancel the others
- defaults.run.shell: bash unifies Linux/macOS/Windows (Git Bash on
  Windows) — eliminates the per-step Unix/Windows split
- All smokes cd to \$RUNNER_TEMP (set on every hosted runner) before
  invocation; their refuse-to-run-from-plugin-repo guards stay honored
… coercion

Surfaced by the matrix CI expansion in this branch:

1. Windows path bug — smokes hardcoded TEST_DIR="/tmp/<slug>-smoke-$$".
   Bash on Git Bash for Windows creates the dir at a translated path,
   but Python invoked via `python -c "..."` reads raw "/tmp/..." which
   doesn't exist on Windows → FileNotFoundError on every JSON read.
   Fix: TEST_DIR="${TEST_DIR:-${RUNNER_TEMP:-${TMPDIR:-/tmp}}/<slug>-smoke-$$}"
   in 7 smokes. Honors caller override, then GitHub runner temp, then
   macOS TMPDIR, then Linux /tmp. Backward compatible everywhere.

2. SIGPIPE bug under `set -o pipefail` — assert_grep used
   `printf '%s\n' "$haystack" | grep -qF -- "$needle"`. With a large
   haystack (full workflow.md, ~700 lines) `grep -q` exits early on
   first match → printf gets EPIPE → pipefail makes the pipeline
   return non-zero even though grep found the string. Falsely
   reported "missing" on present strings. Fix: rewrite the 5 sites
   to `grep -qF -- "$needle" <<< "$haystack"` (here-string; no pipe,
   no SIGPIPE, no false negative).

3. Boolean coercion bug — _prospect_parse_frontmatter falls back to
   _parse_inline_yaml when PyYAML is unavailable. The fallback
   deliberately keeps booleans as strings ("memory entries don't need
   typed scalars"), but prospect frontmatter ships typed booleans
   (floor_violation, generation_under_volume) that
   validate_prospect_frontmatter expects as bool. Without PyYAML on
   the runner, parsed["floor_violation"] is "true" not True →
   round-trip assertion fails. Fix: post-process those two known
   prospect-specific keys to bool in the fallback path. Production
   code is now correct without PyYAML; the smoke is consistent across
   environments. Memory entries continue to skip the coercion.

Local: all 7 smokes pass. CI matrix re-run will exercise these on
ubuntu/macos/windows.
…ommits

Round 2 of OS-level fixes surfaced by the matrix CI expansion:

1. Windows path escape — RUNNER_TEMP on Windows resolves to a path
   with backslashes (e.g. D:\a\_temp). When bash interpolates the
   path into a `python -c "..."` source string, Python's escape parser
   reads `\a` as bell (\x07), `\_` as the deprecated literal, etc.
   The path Python tried to open was effectively corrupted →
   OSError 22 [Invalid argument]. Fix: normalize backslashes to
   forward slashes in TEST_DIR right after expansion, in 7 smokes.
   Windows accepts forward-slash paths natively; no-op on Linux/macOS.

2. Git identity missing on runners — smoke_test.sh exercises
   `git commit` flows; CI runners ship git without user.email/name
   configured → "fatal: empty ident name" → smoke fails.
   Fix: add a workflow step that configures global git identity once
   per matrix leg.

Local: all 7 smokes pass on macOS (with PyYAML installed locally).
The bool-coercion fix landed in the previous commit handles the
no-PyYAML case the runners exhibited.
…s run on Windows

Round 3 of OS-level fixes:

1. Git Bash on Windows returns Unix-style /d/a/... from `pwd` even
   though the underlying filesystem is D:\a\... — flow-next's smokes
   pass these via `sys.path.insert(0, "$SCRIPT_DIR")` and similar
   patterns into native-Windows Python, which doesn't recognize the
   /d/a/... prefix → ModuleNotFoundError on `import flowctl`. Fix:
   add a `to_winpath()` helper that uses `cygpath -m` on Windows
   (forward-slash Windows path that Python accepts and that matches
   pathlib output) and is a no-op on Linux/macOS. Apply to SCRIPT_DIR
   + PLUGIN_ROOT in 4 smokes (glossary, strategy, smoke_test,
   prospect). Also wrap T3b's `pwd -P` expected-path comparison so
   the literal string matches Python's pathlib resolution.

2. Single-step-failure-masks-rest — Windows runs `set -euo pipefail`
   bash for each step independently; a step's failure doesn't kill
   the job, but later steps do skip on default semantics. Adding
   `if: always()` to each smoke step ensures every smoke runs on
   every leg, so one CI run surfaces all OS-specific bugs instead
   of one-at-a-time. Job conclusion still reflects all failures.

Local: all 7 smokes still green on macOS (cygpath absent → helper
returns input unchanged).
…skips on Windows

Round 4 of OS-level fixes:

1. autocrlf=false on Windows checkouts. Default Windows-runner config
   converts LF→CRLF on `actions/checkout`, which mangles heredoc bytes
   that smokes compare byte-identically against flowctl output.
   Fix: `git config --global core.autocrlf false` step before checkout.

2. Strategy fixture em-dashes → ASCII `--`. The bash heredoc on Windows
   (Git Bash + cp1252 locale) wrote em-dashes as cp1252 single-byte 0x97
   instead of UTF-8 3-byte sequence. flowctl's UTF-8 read returned
   replacement chars. Replaced 29 em-dashes in fixtures with `--`;
   the prose readability improvement is fixture-internal, no functional
   coverage lost.

3. Strategy file_path JSON values: backslash → forward slash on T1/T3/T3b.
   Python pathlib returns native Windows paths (`D:\_temp\...`), but
   smoke compares against TEST_DIR which we normalized to forward slash.
   Mismatch. Fix: `${VAR//\\//}` after json_get extracts file_path.

4. Strategy T10 subprocess.run uses [sys.executable, FLOWCTL_PY, ...]
   instead of [FLOWCTL]. The bash wrapper isn't a valid Win32 exe →
   WinError 193 on Windows. New FLOWCTL_PY var alongside FLOWCTL.

5. Ralph regression sweeps in prospect Case 11 + impl-review skip on
   $RUNNER_OS=Windows. ralph_smoke_test.sh embeds POSIX subprocess
   patterns; the regression check's purpose is "{prospect,impl-review}
   doesn't break ralph", unrelated to ralph's Windows portability.

Local: strategy 62/62, prospect 94/94, impl-review 74/74 still green.
…lization

Round 5 of OS-level fixes:

1. atomic_write LF→CRLF translation on Windows. Python's text mode
   default `newline=None` translates `\n` → `os.linesep` (`\r\n` on
   Windows). flow-next writes content with explicit `\n` line endings,
   so on Windows every flowctl-written file (memory entries, glossary,
   prospect artifacts, STRATEGY.md, epic/task specs) ends up with CRLF.
   This causes (a) phantom "modified" diffs in cross-OS git checkouts,
   (b) round-trip byte-comparison failures in tests, and (c) downstream
   parsers that expect LF to misbehave. Fix: pass `newline=""` to the
   `os.fdopen` call inside `atomic_write` so LF is preserved exactly.

2. Strategy T1_PATH_REAL backslash normalization. Python's
   `os.path.realpath` returns native-Windows paths (backslashes); the
   smoke compares against T1_PATH which we normalized to forward
   slashes earlier. Both sides need normalization for byte-equal
   comparison. Add `${V//\\//}` after the realpath call.

Local: glossary 80/80, strategy 62/62, smoke_test 130/130 still green.
… stdin

Round 6 — last Windows-specific fix.

Bash on Windows (Git Bash / MSYS) writes CRLF to pipes by default;
Python's text-mode stdin universal-newlines translation doesn't always
fire when the parent opened the pipe in binary mode. Result: glossary
`--definition-file -` stored multi-line definitions with CRLF instead
of LF on Windows, breaking byte-equal round-trip comparisons.

Defensive universal-newlines normalization (.replace('\r\n', '\n')
.replace('\r', '\n')) runs immediately after the stdin read, before
the existing trailing-newline strip. LF is the only line ending the
glossary file format guarantees.

Local: glossary 80/80 still green.
…s CRLF on Windows

Final round of Windows fixes:

The smoke `json_get` helper does `python -c "...; print($expr)"` to
extract values from flowctl JSON output. On Windows, Python's text-
mode stdout translates `\n` characters within the printed value to
`\r\n` during write. Bash captures the CRLF-fied output, so smokes
comparing multi-line content saw `\r\n` even though the on-disk
content was clean LF.

This was the last source of the glossary Case 4 false positive: the
stored multi-line definition is LF (verified by atomic_write
`newline=""` + cmd_glossary_add stdin normalization), but the captured
value was CRLF.

Fix: pipe `json_get` output through `tr -d '\r'` so callers see LF-only
content matching what's actually stored. Universal — no-op on
Linux/macOS where stdout doesn't translate.

Local: glossary 80/80, strategy 62/62, audit 41/41, prospect 94/94
all green after the fix.
@gmickel gmickel merged commit 26bba86 into main May 2, 2026
3 checks passed
@gmickel gmickel deleted the ci/expand-smoke-matrix branch May 2, 2026 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant