Go CLI that runs all code quality checks for the Cmdr monorepo (~41 checks across 4 apps) in parallel with dependency
ordering. Invoked via ./scripts/check.sh at the repo root.
# Run all checks (excludes slow checks by default)
go run ./scripts/check
# Run checks for a specific app
go run ./scripts/check --app desktop
# Run a specific check (accepts ID or nickname)
go run ./scripts/check --check clippy
# Run multiple specific checks
go run ./scripts/check --check rustfmt --check clippy
# Include slow checks
go run ./scripts/check --include-slow
# Run only slow checks
go run ./scripts/check --only-slow
# Run only the curated fast pre-commit lane (~10s)
go run ./scripts/check --fast
# CI mode (no auto-fixing, stop on first failure)
go run ./scripts/check --ci --fail-fast
# Run compat checks on freestyle VM, incompat checks locally, in parallel
go run ./scripts/check --prefer-freestyle
# Run only freestyle-compatible checks on the VM (skip Rust, Docker)
go run ./scripts/check --only-freestyle| Option | Description |
|---|---|
--app NAME |
Run checks for a specific app |
--rust, --rust-only |
Run only Rust checks (desktop) |
--svelte, --svelte-only |
Run only Svelte checks (desktop) |
--check ID |
Run specific checks by ID or nickname (repeatable) |
--ci |
Disable auto-fixing (for CI) |
--verbose |
Show detailed output |
--include-slow |
Include slow checks (excluded by default) |
--only-slow |
Run only slow checks |
--fast |
Run only the curated fast pre-commit check set |
--only-freestyle |
Run freestyle-compatible checks on a VM (skip the rest) |
--prefer-freestyle |
Run compat checks on VM + the rest locally in parallel |
--fail-fast |
Stop on first failure |
--no-log |
Disable CSV stats logging |
-h, --help |
Show help message |
./scripts/check.sh [flags]
-> go run ./scripts/check [flags]
-> ValidateCheckNames() # startup: catch ID/nickname collisions
-> parseFlags()
-> findRootDir() # walk up to repo root
-> handleFreestyleFlags():
--prefer-freestyle: # parallel: VM (compat) + local (incompat)
goroutine: freestyleRun() # push sync branch, run on VM
local: Runner.Run() # FreestyleIncompat checks only
wait + reconcile results
--only-freestyle: # VM only, skip incompat
freestyleRun()
-> selectChecks() # filter AllChecks by flags
-> FilterSlowChecks() # drop IsSlow=true unless --include-slow or --check used
-> FilterCIOnlyChecks() # drop CIOnly=true unless --ci or --check named it
-> FilterFastChecks() # if --fast: keep IsFast=true (or named via --check)
-> ensurePnpmDependencies() # pnpm install once at root (skipped for Rust-only runs)
-> Runner.Run():
goroutine pool (NumCPU semaphore)
for each pending check: canStart() checks DependsOn deps
-> dep pending/running: wait
-> dep failed/blocked: mark StatusBlocked, print BLOCKED
-> all deps done: launch goroutine -> runCheck() -> completedCh
status line goroutine (200ms tick, TTY only): "Waiting for: foo, bar..."
-> print summary, exit 0/1
| File | Purpose |
|---|---|
main.go |
Entry point: flag parsing, root dir discovery, check selection, pnpm gating, runner delegation |
runner.go |
Parallel executor: goroutine pool, dependency graph, fail-fast, live TTY status line |
checks/common.go |
Core types (CheckDefinition, CheckResult, CheckContext, CheckFunc), shared utils (RunCommand, EnsureGoTool, runPrettierCheck, runESLintCheck) |
checks/registry.go |
AllChecks: canonical ordered list of all check definitions. Lookup and validation functions. |
checks/registry_test.go |
Collision detection, CLIName() tests |
stats.go |
CSV stats logging (logCheckStats): appends one row per check to ~/cmdr-check-log.csv |
colors.go |
ANSI color constants |
utils.go |
findRootDir() (walks up until apps/desktop/src-tauri/Cargo.toml is found) |
checks/desktop-rust-*.go |
One file per Rust check |
checks/desktop-svelte-*.go |
One file per Svelte/TS check |
checks/website-*.go, checks/api-server-*.go, checks/scripts-go-*.go |
One file per check |
checks/file-length.go |
Informational file-length scanner (warn-only, never fails). Supports an allowlist. |
checks/file-length-allowlist.json |
Allowlist for file-length check: { "files": { "path": lineCount } }. Files at or below their allowlisted count are suppressed. |
checks/changelog-commit-links.go |
Validates every https://github.com/vdavid/cmdr/commit/<sha> URL in CHANGELOG.md resolves, via a single git cat-file --batch-check process. |
IDs vs nicknames: --check accepts either. CLIName() returns nickname if set, else ID. ValidateCheckNames()
runs at startup and fatals on any collision.
Dependency graph: Flat DependsOn slice per check. Blocked checks get StatusBlocked on dep failure and are
counted as failed. Dependencies not in the selected run set are treated as satisfied.
Auto-fix vs CI mode: --ci disables auto-fixing. Formatters/linters fix files locally, report only in CI.
runPrettierCheck and runESLintCheck in common.go handle both modes.
Slow checks: IsSlow: true marks checks excluded by default (currently: rust-tests-linux, desktop-e2e-linux,
desktop-e2e-playwright). Named --check invocations implicitly include slow checks
(includeSlow = len(checkNames) > 0).
Fast lane (--fast): IsFast: true marks the curated pre-commit check set: ~28 checks that finish in roughly 10s
on a warm cache, intended to run before every commit. It's an editorial pick, not a timing-derived list (see Key
decisions below). Named --check invocations bypass the filter so --fast --check svelte-check still runs
svelte-check. Mutually exclusive with --include-slow / --only-slow — combining them errors out, since the lanes are
intentionally separate.
CI-only checks: CIOnly: true marks checks that run only in --ci mode (currently: cargo-udeps). They're
silently dropped from local runs (no SKIPPED line) and are not pulled in by --include-slow or --only-slow. Escape
hatch: an explicit --check cargo-udeps always runs, so you can verify locally before pushing.
Self-contained E2E checks: desktop-e2e-playwright manages the full lifecycle (build binary once, create per-shard
fixtures, start N Tauri instances, run N Playwright processes in parallel, cleanup). Each shard runs in its own isolated
CMDR_DATA_DIR with its own Unix socket and MCP port (9429 + shard offset), plus a per-shard CMDR_INSTANCE_ID of the
form e2e-<short>-<pid> (for example, e2e-mtp-12345, e2e-nonmtp1-12345). The instance ID drives the macOS Keychain
SERVICE_NAME suffix (Cmdr-e2e-<short>-<pid>) so two parallel shards can never collide on credentials, and reshapes
the Dock label into Cmdr (E2E <short>) so cleanup scripts can target with pgrep -f 'Cmdr (E2E '. One shard is
dedicated to MTP specs (serialized; the virtual MTP backing dir at /tmp/cmdr-mtp-e2e-fixtures is shared by every Tauri
instance). Stale processes on each port are killed before starting. Per-shard logs go to
/tmp/cmdr-e2e-playwright-<shard>-<timestamp>.log. See docs/tooling/instance-isolation.md § "How E2E gets isolated
per shard".
RUST_LOG is forwarded to the app (via inherited os.Environ()), so trace-level output is one shell-prefix away:
RUST_LOG=cmdr_lib::file_system::volume::mtp=trace ./scripts/check.sh --check desktop-e2e-playwrightThe chosen RUST_LOG value is echoed at the top of the timestamped log so it's obvious from a glance which level was
captured. When unset, the log starts with === RUST_LOG unset (default warn level) ===.
Go tool auto-install: EnsureGoTool(name, installPath) checks PATH first, then runs go install and returns the
full binary path. Used for staticcheck, nilaway, etc. The installPath MUST pin a specific version (@vX.Y.Z or a
pseudo-version), never @latest: an @latest install is the Go-side equivalent of the wave-1-2 npm-registry-trojan
vector. Same rule applies to cargo install calls inside checks: pin both --version and --locked.
TTY detection: golang.org/x/term.IsTerminal gates the live status line; CI logs stay clean.
CSV stats logging: Each check run appends a row to ~/cmdr-check-log.csv with timestamp, app, check name, duration,
result (pass/fail/skip/blocked), and optional counts (total, issues, changes). CheckResult has Total, Issues,
Changes fields (-1 = N/A, rendered as N/A in CSV). Disabled by --no-log or --ci. Implementation in stats.go.
File-length allowlist: checks/file-length-allowlist.json maps relative paths to accepted line counts. Files at or
below their allowlisted count are silently suppressed. Files that grow beyond their allowlisted count are reported with
both the current and allowed line counts. New files not in the allowlist are reported normally. When the allowlist
suppresses all long files, the check shows "No new long files (N allowlisted)". If the allowlist file is missing, all
long files are reported (backwards-compatible). To allowlist a file, add "relative/path": lineCount to the files
object.
CheckDefinition{
ID: "desktop-svelte-eslint", // unique, always accepted by --check
Nickname: "", // short alias, also accepted by --check (optional)
DisplayName: "eslint", // shown in output
App: AppDesktop,
Tech: "🎨 Svelte",
IsSlow: false,
IsFast: false, // true = included in --fast (curated pre-commit lane)
CIOnly: false, // true = run only in --ci mode (or explicit --check)
FreestyleIncompat: true, // can NOT run on freestyle.sh VMs (Rust, Docker)
DependsOn: []string{"desktop-svelte-prettier"},
Run: RunDesktopESLint,
}- Create
checks/{app}-{name}.gowith afunc RunSomething(ctx *CheckContext) (CheckResult, error). Usewebsite-build.goorwebsite-docker.goas templates; they're the simplest. - Register it in
AllChecksinregistry.go(ID, App, Tech, DependsOn, Run). - Return
Success("message")on pass,fmt.Errorf(...)on fail,Skipped("reason")to skip. - Add a test file if the check has non-trivial logic (
checks/{app}-{name}_test.go). - Run
./scripts/check.sh --goto verify (staticcheck is strict about idiomatic Go). - Update the table below and
AGENTS.md's--checklist.
- Return
Success(message)on success with a short, informative message - Return
Warning(message)for non-fatal issues - Return
Skipped(reason)when the check can't run (for example, missing config) - Return
CheckResult{}, erroron failure
Include useful stats: "12 tests passed", "Checked 42 files", "No lint errors". Avoid generic "OK".
Include the command output using indentOutput():
return CheckResult{}, fmt.Errorf("check failed\n%s", indentOutput(output))Set DependsOn to ensure checks run in the right order: formatters before linters, linters before tests, type checkers
before tests.
| App | Tech | Checks |
|---|---|---|
| Desktop | Rust | rustfmt, clippy, cargo-audit, cargo-deny, cargo-machete, cargo-udeps (CI-only), jscpd, log-error-macro, error-string-match, bindings-fresh, ipc-enum-camelcase, tests, integration-tests (Docker SMB), tests-linux (slow) |
| Desktop | Svelte | prettier, eslint, eslint-typecheck (slow), stylelint, css-unused, a11y-contrast, btn-restyle, bare-poll, svelte-check, import-cycles, knip, type-drift, tests, e2e-linux-typecheck, e2e-linux (slow), e2e-playwright (slow) |
| Website | Astro | prettier, eslint, typecheck, build, html-validate, e2e |
| Website | Docker | docker-build |
| API server | TS | oxfmt, eslint, typecheck, tests |
| Scripts | Go | gofmt, go-vet, staticcheck, ineffassign, misspell, gocyclo, nilaway, deadcode, go-tests, govulncheck |
| Other | Metrics | file-length (warn-only), CLAUDE.md-reminder (warn-only), changelog-commit-links, workflows-rustup (forbids rustup target/component add in workflows) |
| Other | Security | workflows-hardening (SHA-pinning, no pull_request_target, job-scoped id-token: write) |
Each check outputs a single line:
Desktop: Rust / clippy... OK (1.23s) - No warnings
Status can be: OK (green), warn (yellow), SKIPPED (yellow), FAILED (red), BLOCKED (yellow).
A check shows "BLOCKED" when its dependency failed. Fix the dependency first.
Use CommandExists() to check if a tool is installed, and auto-install if possible via EnsureGoTool.
Decision: Go instead of Bash for the check script. Why: Cross-platform support (especially Windows), type-safe, better error handling, and ability to build complex logic (parallel checks, dependency graph, colored output). Go is already in the toolchain via mise.
Decision: cargo-nextest instead of cargo test. Why: Faster test execution (parallel by default), better
output formatting, clearer failure messages. Auto-installed by the check script if missing.
Decision: cargo-deny advisories check disabled; use cargo-audit instead. Why: Tauri's transitive
dependencies (gtk3-rs, unic-*, fxhash, proc-macro-error, etc.) trigger unmaintained-crate advisories we can't control.
cargo-audit still catches critical security vulnerabilities. License, bans, and sources checks in cargo-deny remain
active. See comment in src-tauri/deny.toml.
Decision: every operational cargo command in checks passes --locked. Why: without it, cargo silently
re-resolves Cargo.lock whenever upstream metadata shifts (a yank, a new transitive dep version). For a 1028-crate
lockfile, that resolution window is wide and lets a freshly-published malicious version land mid-build. --locked fails
loudly if the lockfile would change. Applies to cargo clippy, cargo nextest run (in both desktop-rust-tests and
desktop-rust-integration-tests), and cargo +nightly udeps. Audit/deny/machete read Cargo.lock without updating it,
so --locked is moot for them, but the install of those tools still uses --locked to lock the tool's own dep tree.
Decision: every tool install pins --version and --locked (cargo) or @vX.Y.Z (Go). Why: an unpinned tool
install (cargo install cargo-audit or EnsureGoTool(..., "@latest")) means each fresh checkout pulls whatever's
latest. A wave-1-2-class compromise of any of these tool repositories would auto-propagate. Pinning is the Go-side
equivalent of the pnpm minimum-release-age defense (a fresh version can't land without a deliberate bump).
Decision: workflows-hardening check enforces three GitHub Actions invariants and acts as a regression guard.
Why: cmdr's workflows are already correctly hardened (every third-party action is SHA-pinned with a comment, no
pull_request_target triggers, no workflow-scoped id-token: write). Without an automated guard, a future PR or a
Renovate misconfiguration could silently regress any of those without anyone noticing in review. The check fails on
tag/branch-pinned third-party actions, on pull_request_target triggers (wave-4's entry vector), and on workflow-scoped
id-token: write (must be job-scoped per the wave-4 OIDC-token-extraction lesson). Local actions (./...) are exempt.
Decision: govulncheck runs against every Go module. Why: cargo-audit covers Rust deps; nothing covered Go
until now. govulncheck is static-analysis-based, so it only flags vulns actually reachable from the code (low false
positive rate). Most of cmdr's Go modules are dep-free tooling scripts but still call into the Go stdlib, which gets its
own CVEs; the check found 7 real reachable stdlib vulns the first time it ran (fixed by bumping mise's Go pin). Mirrors
the cargo-audit role on the Rust side.
Decision: cfg-gate check to catch ungated macOS-only crate imports. Why: Rust code using macOS-only crates
(from [target.'cfg(target_os = "macos")'.dependencies]) compiles fine on macOS but fails on Linux if the use isn't
wrapped in #[cfg(target_os = "macos")]. CI catches this after push, but the check catches it locally and instantly. It
parses Cargo.toml for macOS-only crate names, detects module-level gating (for example,
#[cfg(target_os = "macos")] mod foo; in lib.rs makes everything inside foo inherently safe), and scans remaining
files for ungated use statements.
Decision: bare-poll check to catch silently-passing E2E tests. Why: Cmdr's pollUntil helper (and its
wrappers pollFs, pollUntilValue, pollActiveMode, pollOverlayGone, pollFocusedPane) returns false on timeout
instead of throwing. A bare await pollUntil(...) statement therefore reduces to "wait up to N seconds, then quietly
proceed" — if the polled condition never holds, the test passes green so long as no later expect happens to catch it.
Discovered when investigating why one viewer test took 5+ seconds: the toast it polled for never appeared (the viewer
window doesn't mount ToastContainer), but the test still passed because the next expect (clipboard contents) was
satisfied by the underlying copy. A repo-wide grep turned up 187 bare-poll sites across 20 specs. Several spec files
contained tests with zero expect() calls whose entire assertion was a bare await pollUntil(...) — those tests
literally couldn't fail. The check is a fast-lane Go scanner (apps/desktop/test/, ~9 ms warm) modeled on
error-string-match. Same line-anchored grep pattern: ^\s*await\s+(pollUntil|pollFs|…)\s*\( only matches the
bare-expression-statement shape, so expect(await pollUntil(…)).toBe(true) / if (!(await pollUntil(…))) /
return await pollUntil(…) / const ok = await pollUntil(…) all pass through. Opt out for genuine best-effort cleanups
(dismissing an overlay that might or might not be there) with // allowed-bare-poll: <reason> on the line above or as a
trailing comment. The preferred migration target is Playwright's expect.poll(() => …).toBeTruthy(), which fuses the
wait with the assertion so the bug class is structurally impossible.
Decision: Auto-fix locally, check-only in CI. Why: Developers get instant fixes locally (less friction), CI
ensures code is properly formatted before merge. Controlled by the --ci flag.
Decision: Split desktop-svelte-eslint into fast (non-type-aware) and slow (full) checks. Why: Type-aware rules
(no-floating-promises, no-unsafe-*, etc.) take ~45% of lint time due to TypeScript project service startup. The fast
check sets ESLINT_NO_TYPECHECK=1, which eslint.config.js reads to use tseslint.configs.strict (no type info) and
suppress reportUnusedDisableDirectives (since disable comments for type-aware rules would look unused). The slow check
(IsSlow: true) runs the full config with all rules and reportUnusedDisableDirectives on, so stale disable comments
are still caught.
Decision: Clippy runs the enforcing pass (-D warnings) first, and only invokes cargo clippy --fix if that fails
(and we're not in CI). Why: Running --fix speculatively before every check doubled wall time on the happy path (no
warnings = no fix to apply). The enforcing pass is the one that actually decides pass/fail anyway, and --fix ignores
-D warnings so it can't be combined into a single invocation. The trade is one extra re-check on the rare warning
path; cuts ~50% off clippy in the common clean case.
Decision: bindings-fresh is hash-cached. Why: The check used to run pnpm bindings:regen on every invocation
(~2 min for a test-mode compile of the full crate) just to confirm the output didn't change. We now hash every .rs
file under src-tauri/src plus Cargo.lock and Cargo.toml, plus the current bindings.ts, and store both hashes in
<CARGO_TARGET_DIR or workspace target>/.bindings-fresh-marker after each successful run. If both match next time, we
return OK in <100 ms. The bindings.ts hash in the marker protects against manual edits; if someone hand-tweaks the file
the marker no longer matches and we run the full regen. The marker lives inside cargo's actual target dir (honoring
CARGO_TARGET_DIR), so cargo clean and wholesale target/ deletion auto-invalidate it. Same shared-fate pattern as
node_modules/.pnpm-install-marker. Hashing all .rs files (rather than only those with #[tauri::command] /
specta::Type) costs ~tens of ms here and removes any "added the attr to a new file but the watch list didn't pick it
up" footgun.
Decision: bindings-fresh auto-regens outside --ci. Why: When the hash check detects drift, the check runs
pnpm bindings:regen and, in non-CI mode, keeps the regenerated bindings.ts (returns SuccessWithChanges). Same
philosophy as oxfmt, gofmt, and clippy --fix: locally fix what's mechanical so the dev reviews and commits the
diff alongside the Rust change that caused it, instead of being told to run a separate command and re-run the checker.
In --ci mode the regenerated output is byte-compared against the original (which is then restored, so CI never
modifies the tree) and the check fails on drift, with a "Run pnpm bindings:regen from apps/desktop/" hint. The
marker is updated either way so the next run short-circuits.
Decision: cargo-machete runs locally; cargo-udeps is CI-only. Why: Both detect unused dependencies but trade
off speed against precision. udeps compiles the whole crate with nightly (~2 min cold) and is authoritative. machete
greps source files for use foo; patterns (~0.5 s on this codebase, no compile) and catches the common case (removed
the last use but forgot to drop the dep) plus a class udeps misses ("transitively-used" deps where your Cargo.toml
lists serde but only a transitive dep actually uses it). machete's blind spot is deps used only inside macro expansions
or build.rs codegen; opt those out via [package.metadata.cargo-machete] ignored = ["foo"] in the relevant Cargo.toml.
Local dev gets instant feedback from machete; CI runs udeps for the long-tail check.
Decision: CIOnly field on CheckDefinition (mirrors IsSlow and FreestyleIncompat). Why: Keeps "this check
runs only in CI" colocated with the check definition rather than as a hardcoded list elsewhere. FilterCIOnlyChecks in
registry.go drops them outside --ci, with a --check <name> escape hatch so devs can verify locally before pushing.
Orthogonal to IsSlow: --include-slow and --only-slow do NOT pull in CI-only checks (you'd otherwise lose the
ability to run "all slow checks locally without the CI-only ones"). Negative-sense default (false = runs locally)
matches the other gating fields.
Decision: IsFast field on CheckDefinition and a curated --fast pre-commit lane. Why: A pre-commit run
should finish in ~10s so it actually gets used. The list is editorially curated, not derived from CSV timings: warm
average is what matters, but cold-cache outliers (cargo-audit spiking to ~3 min on advisory DB refresh) would silently
make the lane unreliable on the first run of the day. Mirrors the IsSlow / CIOnly field pattern (negative-sense
boolean default, same colocated style). Mutually exclusive with --include-slow / --only-slow to keep the semantics
unambiguous: "give me the fast lane" and "give me the slow lane" can't both be true. Named --check invocations bypass
the filter (same escape hatch as IsSlow and CIOnly).
Decision: Skip pnpm install when lockfile is unchanged. Why: pnpm install takes ~20s and pegs all CPUs even
when deps haven't changed. A marker file (node_modules/.pnpm-install-marker) stores pnpm-lock.yaml's mtime after
each successful install. On the next run, if the mtime matches, install is skipped. The marker lives inside
node_modules/ so it's automatically invalidated if node_modules is deleted. Always runs in CI (--ci).
Decision: E2E failure output uses section-aware filtering, not a pattern denylist. Why: The checker's contract
with agents is that output is concise enough to read in full: no head/tail/grep needed. Raw Playwright + Tauri +
Docker output is 1000+ lines on a failure (test pass markers, app stdout log, post-ELIFECYCLE build dump). The captured
output has four stable sections (setup, per-test progress, numbered failure blocks, post-ELIFECYCLE dump), split by
fixed delimiters (Starting Tauri app..., \d+\) \[tauri\], [ELIFECYCLE]). extractE2ETestOutput in
desktop-svelte-e2e-playwright.go keeps the failure blocks verbatim, drops the post-ELIFECYCLE dump, and in the
progress section keeps ✘ markers with their preceding annotation lines (like [SMB diag] MCP port: …) while dropping
✓/- markers with theirs. The untouched output stays in the timestamped log file the error message links to. Both
desktop-e2e-linux and desktop-e2e-playwright call the same helper.
If the run died before reaching the test phase, none of Starting Tauri app..., a \d+) [tauri] failure-block header,
or a \d+ (passed|failed|flaky|skipped) tally line will be present. isPreTestFailure checks all three; only if all
three are absent does the filter prepend note: tests did not reach the run phase and drop the verbose
docker compose ps table (anchored on its NAME IMAGE COMMAND header so prose containing Up <N> survives). Checking
the tally and failure block (not just the Tauri marker) avoids false positives on macOS playwright shards, where Tauri
is started by the Go check and its stdout goes to a per-shard log file, so the marker never appears in Playwright stdout
regardless of success.
Decision: cargo test / cargo nextest failure output is filtered by dropping pass/skip verdict lines only.
Why: A 1786-test run produces ~1800 noise lines around 2 real failures. The harness format is stable enough that a
single per-line regex (^test … ... (ok|ignored…)$ for cargo test, ^\s+(PASS|SKIP) [...] …$ for cargo nextest)
can drop the noise without risking false positives on panic-message bodies (start-of-line anchor protects quoted test
phrases). trimRustTestProgress in desktop-rust-tests-linux.go runs after trimBuildNoise. Everything else
(running N tests header, FAIL/FAILED/LEAK/TIMEOUT verdicts, the failures: block, the test result: / Summary
tally, error: lines, bench results) passes through unchanged.
Decision: SMB Docker container lifecycle is owned by a runner-level orchestrator, not per-check. Why: Multiple
checks (desktop-rust-integration-tests, desktop-e2e-linux) need the shared smb-consumer Docker Compose project.
Before, each owned the lifecycle: start in entry, defer ./stop.sh in cleanup. When both ran in parallel under
--include-slow, whichever finished first would tear down containers the other was still using, producing
Cannot reach smb-consumer-X cascades. SmbOrchestrator (scripts/check/smb_orchestrator.go) lifts lifecycle one
level up: at runner init, after selectChecks() resolves the planned set, the orchestrator brings up the union of
NeedsSmb modes (SmbModeCore for integration tests, SmbModeE2E for e2e). At runner exit (normal, --fail-fast, or
SIGINT) it tears down once. Checks marked NeedsSmb no longer manage their own lifecycle: they assume the containers
are up and call waitForSmbContainers as a cheap mid-run zombie-guard. The smaller scripts (start.sh,
e2e-linux.sh::start_smb_containers) keep working standalone for pnpm test:e2e:linux invocations outside the check
runner; under check.sh their start.sh invocation just sees the orchestrator's containers already running and probes are
idempotent. The SIGINT handler in main.go captures the orchestrator via shared variable so a Ctrl+C also triggers
./stop.sh with a banner before exiting 130.
Decision: silence apt/dpkg at the source, not via a post-hoc denylist. Why: provisionScript redirects both apt
commands to a log file under DEBIAN_FRONTEND=noninteractive + -qq, so on a successful provision the check's stdout
gets zero apt lines. The log lives on a per-run host directory (/tmp/cmdr-rust-tests-linux-<unix-ts>/provision.log)
bind-mounted into the container at /cmdr-logs, so it survives the container's --rm and is discoverable from the
check's Success/failure message. On apt failure, the script dumps the full log to stderr (captured by Go) so the user
sees what went wrong without having to fish for the file. A previous attempt to scrub the verbose output with a
packageManagerNoiseRE denylist was a treadmill: every Debian version adds new dpkg verbs (Setting up, Unpacking,
Processing triggers, Get:N, Hit:N, Selecting previously unselected package, Created symlink,
update-alternatives:, procps:, etc.), continuation lines from multi-line apt prompts have no stable shape, and
apt-get -qq alone doesn't propagate to dpkg's per-package chatter. Redirection at source is bulletproof and
zero-maint. trimBuildNoise now only cuts everything before the last Compiling … line; when no such line exists
(provisioning died before cargo ran), the output is returned verbatim. Length-based truncation is forbidden everywhere;
if 200 tests fail, all 200 panic bodies pass through.
Decision: nextest binary is arch-aware. Why: https://get.nexte.st/latest/linux serves the x86_64-musl build by
default; on an arm64 container (e.g. Apple Silicon under OrbStack) cargo's rustup-shim happily syncs the aarch64
toolchain, then execs the x86 nextest binary and OrbStack crashes with
Dynamic loader not found: /lib64/ld-linux-x86-64.so.2. The fix is dpkg --print-architecture → linux for amd64 and
linux-arm for arm64, matching the Go tarball selection.
Two modes for offloading checks to a freestyle.sh VM:
--only-freestyle: runs only freestyle-compatible checks on the VM, skips the rest entirely.--prefer-freestyle: runs freestyle-compatible checks on the VM and the rest locally, in parallel. This is the "run everything as fast as possible" mode: Rust checks run on your Mac while Node/Go checks run on the VM simultaneously.
How it works: Creates a temporary git commit of the full working tree (without modifying the local index/worktree), pushes it to a temp branch, fetches on the VM, runs checks, cleans up the branch.
What's freestyle-compatible: Node/TS checks (Svelte, Astro, API server), Go checks, and metrics; any check without
FreestyleIncompat: true. The VM uses --freestyle-remote internally to filter to only these checks.
What's not: Rust checks (dep compilation exceeds freestyle's ~15 min API timeout) and Docker checks (no Docker
daemon on freestyle VMs). With --prefer-freestyle these run locally in parallel; with --only-freestyle they're
skipped.
VM lifecycle: The VM is created once (toolchain setup), then uses persistent storage so it survives freestyle's
resource management. It auto-suspends after 5 min idle but resumes in <1s. VM ID is stored in .freestyle-vm-id
(gitignored). On wake, a health check verifies the toolchain; if it fails, the VM is replaced. Setup parallelizes pnpm +
Playwright install and uses a shallow clone.
Key files: freestyle.go (all freestyle logic including preferFreestyleRun), main.go (handleFreestyleFlags
dispatches to the right mode).
Decision: FreestyleIncompat field on CheckDefinition instead of hardcoded check lists. Why: Keeps freestyle
compatibility co-located with each check's definition. Easy to flip when freestyle constraints change. Negative-sense
boolean means the Go zero value (false) = compatible, so only the few incompatible checks (Rust, Docker) need to opt
out.
Decision: Skip Rust checks entirely on freestyle (not just slow ones). Why: Freestyle's free tier has a hard ~15
min server-side timeout on exec-await. Compiling the full Tauri dependency tree (clippy, cargo-udeps, etc.) on 4 x86
vCPUs exceeds this. The 8 GB RAM also causes swap pressure when Rust and Node run in parallel. Attempted workarounds
(2-VM split, nohup background builds) all failed due to VM lifecycle issues (auto-suspend kills background processes,
stopped VMs lose disk state).
Decision: mise's standalone pnpm disabled on freestyle VMs. Why: The pnpm binary mise installs ships a baked-in
V8 snapshot that crashes on freestyle's x86 Linux VMs. We install pnpm via npm install -g pnpm@10 instead, configured
via [settings] disable_tools = ["pnpm"] in /root/.config/mise/config.toml.
--only-slow needs ~20 min timeout. Slow checks (E2E tests, eslint-typecheck) take significantly longer than the
default checks. When running --only-slow via an agent or CI, set the timeout to at least 20 minutes (1,200,000 ms).
Never run two ./scripts/check.sh invocations concurrently if either touches SMB. The SmbOrchestrator is scoped
to one runner process: it starts the smb-consumer Docker Compose project at runner init and tears it down at runner
exit. Two parallel invocations get two orchestrators racing the same containers. The first to finish runs ./stop.sh
while the other is still mid-test, producing Cannot reach smb-consumer-X cascades. Symptom: a previously green check
(typically desktop-e2e-linux or desktop-rust-integration-tests) starts failing several SMB tests with 30 s timeouts
in the second-to-finish run.
The right way to run two SMB-touching checks together is one invocation with multiple --check flags so the same
orchestrator owns the containers, or sequentially. For example:
# Good: one orchestrator, shared SMB stack
./scripts/check.sh --check desktop-e2e-linux --check desktop-e2e-playwright
# Also fine: sequential
./scripts/check.sh --check desktop-e2e-linux
./scripts/check.sh --check desktop-e2e-playwright
# Wrong: two orchestrators racing
./scripts/check.sh --check desktop-e2e-linux &
./scripts/check.sh --check desktop-e2e-playwright &Same applies to running a check.sh invocation alongside a raw pnpm test:e2e:linux or
apps/desktop/test/smb-servers/start.sh in another terminal — only one process should own the SMB stack at a time. The
e2e-linux.sh and start.sh scripts are safe to run standalone when no check.sh is also running, but they don't
coordinate with each other across processes.
golang.org/x/term, golang.org/x/sys (transitive). Go 1.25.