ci(flow-next): expand smoke matrix — 7 suites on ubuntu/macos/windows#126
Merged
ci(flow-next): expand smoke matrix — 7 suites on ubuntu/macos/windows#126
Conversation
Adds functional smoke coverage to the existing matrix beyond ci_test.sh: - resolve-pr_smoke_test.sh (58 cases) - strategy_smoke_test.sh (62 cases) - audit_smoke_test.sh (41 cases) - glossary_smoke_test.sh (80 cases) - prospect_smoke_test.sh (94 cases) - impl-review_smoke_test.sh (74 cases) - smoke_test.sh (130 cases) Total per matrix leg: ~536 assertions, ~250s runtime; matrix runs in parallel so wall time stays around 4 min. Catches OS-level Python/bash bugs that ship today because ci_test.sh alone exercises a narrow surface (precedent: 0.38.2 cp1252 fix). Skipped smokes (ralph_*, plan_review_prompt) need rp-cli / claude / codex CLIs that GitHub-hosted runners don't have. Workflow cleanup: - fail-fast: false so one OS failure doesn't cancel the others - defaults.run.shell: bash unifies Linux/macOS/Windows (Git Bash on Windows) — eliminates the per-step Unix/Windows split - All smokes cd to \$RUNNER_TEMP (set on every hosted runner) before invocation; their refuse-to-run-from-plugin-repo guards stay honored
… coercion
Surfaced by the matrix CI expansion in this branch:
1. Windows path bug — smokes hardcoded TEST_DIR="/tmp/<slug>-smoke-$$".
Bash on Git Bash for Windows creates the dir at a translated path,
but Python invoked via `python -c "..."` reads raw "/tmp/..." which
doesn't exist on Windows → FileNotFoundError on every JSON read.
Fix: TEST_DIR="${TEST_DIR:-${RUNNER_TEMP:-${TMPDIR:-/tmp}}/<slug>-smoke-$$}"
in 7 smokes. Honors caller override, then GitHub runner temp, then
macOS TMPDIR, then Linux /tmp. Backward compatible everywhere.
2. SIGPIPE bug under `set -o pipefail` — assert_grep used
`printf '%s\n' "$haystack" | grep -qF -- "$needle"`. With a large
haystack (full workflow.md, ~700 lines) `grep -q` exits early on
first match → printf gets EPIPE → pipefail makes the pipeline
return non-zero even though grep found the string. Falsely
reported "missing" on present strings. Fix: rewrite the 5 sites
to `grep -qF -- "$needle" <<< "$haystack"` (here-string; no pipe,
no SIGPIPE, no false negative).
3. Boolean coercion bug — _prospect_parse_frontmatter falls back to
_parse_inline_yaml when PyYAML is unavailable. The fallback
deliberately keeps booleans as strings ("memory entries don't need
typed scalars"), but prospect frontmatter ships typed booleans
(floor_violation, generation_under_volume) that
validate_prospect_frontmatter expects as bool. Without PyYAML on
the runner, parsed["floor_violation"] is "true" not True →
round-trip assertion fails. Fix: post-process those two known
prospect-specific keys to bool in the fallback path. Production
code is now correct without PyYAML; the smoke is consistent across
environments. Memory entries continue to skip the coercion.
Local: all 7 smokes pass. CI matrix re-run will exercise these on
ubuntu/macos/windows.
…ommits Round 2 of OS-level fixes surfaced by the matrix CI expansion: 1. Windows path escape — RUNNER_TEMP on Windows resolves to a path with backslashes (e.g. D:\a\_temp). When bash interpolates the path into a `python -c "..."` source string, Python's escape parser reads `\a` as bell (\x07), `\_` as the deprecated literal, etc. The path Python tried to open was effectively corrupted → OSError 22 [Invalid argument]. Fix: normalize backslashes to forward slashes in TEST_DIR right after expansion, in 7 smokes. Windows accepts forward-slash paths natively; no-op on Linux/macOS. 2. Git identity missing on runners — smoke_test.sh exercises `git commit` flows; CI runners ship git without user.email/name configured → "fatal: empty ident name" → smoke fails. Fix: add a workflow step that configures global git identity once per matrix leg. Local: all 7 smokes pass on macOS (with PyYAML installed locally). The bool-coercion fix landed in the previous commit handles the no-PyYAML case the runners exhibited.
…s run on Windows Round 3 of OS-level fixes: 1. Git Bash on Windows returns Unix-style /d/a/... from `pwd` even though the underlying filesystem is D:\a\... — flow-next's smokes pass these via `sys.path.insert(0, "$SCRIPT_DIR")` and similar patterns into native-Windows Python, which doesn't recognize the /d/a/... prefix → ModuleNotFoundError on `import flowctl`. Fix: add a `to_winpath()` helper that uses `cygpath -m` on Windows (forward-slash Windows path that Python accepts and that matches pathlib output) and is a no-op on Linux/macOS. Apply to SCRIPT_DIR + PLUGIN_ROOT in 4 smokes (glossary, strategy, smoke_test, prospect). Also wrap T3b's `pwd -P` expected-path comparison so the literal string matches Python's pathlib resolution. 2. Single-step-failure-masks-rest — Windows runs `set -euo pipefail` bash for each step independently; a step's failure doesn't kill the job, but later steps do skip on default semantics. Adding `if: always()` to each smoke step ensures every smoke runs on every leg, so one CI run surfaces all OS-specific bugs instead of one-at-a-time. Job conclusion still reflects all failures. Local: all 7 smokes still green on macOS (cygpath absent → helper returns input unchanged).
…skips on Windows
Round 4 of OS-level fixes:
1. autocrlf=false on Windows checkouts. Default Windows-runner config
converts LF→CRLF on `actions/checkout`, which mangles heredoc bytes
that smokes compare byte-identically against flowctl output.
Fix: `git config --global core.autocrlf false` step before checkout.
2. Strategy fixture em-dashes → ASCII `--`. The bash heredoc on Windows
(Git Bash + cp1252 locale) wrote em-dashes as cp1252 single-byte 0x97
instead of UTF-8 3-byte sequence. flowctl's UTF-8 read returned
replacement chars. Replaced 29 em-dashes in fixtures with `--`;
the prose readability improvement is fixture-internal, no functional
coverage lost.
3. Strategy file_path JSON values: backslash → forward slash on T1/T3/T3b.
Python pathlib returns native Windows paths (`D:\_temp\...`), but
smoke compares against TEST_DIR which we normalized to forward slash.
Mismatch. Fix: `${VAR//\\//}` after json_get extracts file_path.
4. Strategy T10 subprocess.run uses [sys.executable, FLOWCTL_PY, ...]
instead of [FLOWCTL]. The bash wrapper isn't a valid Win32 exe →
WinError 193 on Windows. New FLOWCTL_PY var alongside FLOWCTL.
5. Ralph regression sweeps in prospect Case 11 + impl-review skip on
$RUNNER_OS=Windows. ralph_smoke_test.sh embeds POSIX subprocess
patterns; the regression check's purpose is "{prospect,impl-review}
doesn't break ralph", unrelated to ralph's Windows portability.
Local: strategy 62/62, prospect 94/94, impl-review 74/74 still green.
…lization
Round 5 of OS-level fixes:
1. atomic_write LF→CRLF translation on Windows. Python's text mode
default `newline=None` translates `\n` → `os.linesep` (`\r\n` on
Windows). flow-next writes content with explicit `\n` line endings,
so on Windows every flowctl-written file (memory entries, glossary,
prospect artifacts, STRATEGY.md, epic/task specs) ends up with CRLF.
This causes (a) phantom "modified" diffs in cross-OS git checkouts,
(b) round-trip byte-comparison failures in tests, and (c) downstream
parsers that expect LF to misbehave. Fix: pass `newline=""` to the
`os.fdopen` call inside `atomic_write` so LF is preserved exactly.
2. Strategy T1_PATH_REAL backslash normalization. Python's
`os.path.realpath` returns native-Windows paths (backslashes); the
smoke compares against T1_PATH which we normalized to forward
slashes earlier. Both sides need normalization for byte-equal
comparison. Add `${V//\\//}` after the realpath call.
Local: glossary 80/80, strategy 62/62, smoke_test 130/130 still green.
… stdin
Round 6 — last Windows-specific fix.
Bash on Windows (Git Bash / MSYS) writes CRLF to pipes by default;
Python's text-mode stdin universal-newlines translation doesn't always
fire when the parent opened the pipe in binary mode. Result: glossary
`--definition-file -` stored multi-line definitions with CRLF instead
of LF on Windows, breaking byte-equal round-trip comparisons.
Defensive universal-newlines normalization (.replace('\r\n', '\n')
.replace('\r', '\n')) runs immediately after the stdin read, before
the existing trailing-newline strip. LF is the only line ending the
glossary file format guarantees.
Local: glossary 80/80 still green.
…s CRLF on Windows Final round of Windows fixes: The smoke `json_get` helper does `python -c "...; print($expr)"` to extract values from flowctl JSON output. On Windows, Python's text- mode stdout translates `\n` characters within the printed value to `\r\n` during write. Bash captures the CRLF-fied output, so smokes comparing multi-line content saw `\r\n` even though the on-disk content was clean LF. This was the last source of the glossary Case 4 false positive: the stored multi-line definition is LF (verified by atomic_write `newline=""` + cmd_glossary_add stdin normalization), but the captured value was CRLF. Fix: pipe `json_get` output through `tr -d '\r'` so callers see LF-only content matching what's actually stored. Universal — no-op on Linux/macOS where stdout doesn't translate. Local: glossary 80/80, strategy 62/62, audit 41/41, prospect 94/94 all green after the fix.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds functional smoke coverage to the GitHub Actions matrix beyond the existing
ci_test.sh. Today onlyci_test.sh(57 cases) runs cross-platform; the seven feature smokes that exercise actual user-facing surfaces (~480 cases) only ran locally.This is the gap that caused commits like
4fa652d fix(flowctl): pin subprocess encoding=utf-8 for Windows cp1252 — 0.38.2to ship despite Windows being in the matrix — the matrix only catches whatci_test.shexercises, and that bug surfaced in subprocess output handling that wasn't tested.What's in this PR
Adds 7 smoke suites to each matrix leg, ordered cheap-first so failures surface fast:
ci_test.sh(already in matrix)resolve-pr_smoke_test.shstrategy_smoke_test.shaudit_smoke_test.shglossary_smoke_test.shprospect_smoke_test.shimpl-review_smoke_test.shsmoke_test.shMatrix runs all 3 OSes in parallel — wall time stays ~4 minutes.
Workflow cleanup
fail-fast: false— one OS failure no longer cancels the others (more useful diagnostic signal)defaults.run.shell: bash— unifies the matrix; Git Bash on Windows handles bash scripts the same way ubuntu/macos do, eliminating the per-step Unix/Windows split that existed beforecd "$RUNNER_TEMP"before invocation (set on every GitHub-hosted runner); their refuse-to-run-from-plugin-repo guards stay honoredSmokes deliberately skipped
These need external CLIs not on GitHub-hosted runners — would always fail in CI:
ralph_smoke_test.sh/ralph_smoke_rp.sh— needclaude/codex/rp-cliplan_review_prompt_smoke.sh— needsrp-cli(line 20:command -v rp-cli >/dev/null 2>&1 || fail)Test plan