How we test Cmdr. Decision rules, anti-patterns, and a per-feature checklist. If you're adding tests, read this first.
The companion file docs/tooling/testing.md is the tools inventory (one paragraph per tool).
We prefer broad-shallow unit coverage, narrow-deep integration coverage, and a small number of end-to-end flows. Each layer catches different bugs:
| Layer | Catches | Cost per test | Where |
|---|---|---|---|
| Unit (Rust) | Algorithmic bugs, state-transition bugs | ms | mod tests in the same file |
| Unit (TS) | Component logic, store behavior, pure-fn correctness | ms | *.test.ts next to the source |
| Integration | Cross-module flows that need real fixtures (DB, fs) | seconds | apps/desktop/src-tauri/tests/ |
| IPC contract | Serde-shape drift, command-rename drift, side effects | seconds | apps/desktop/src/lib/ipc/*.test.ts via mockIPC |
| E2E | Cross-component flows (focus, keyboard, dialog stack) | minutes | apps/desktop/test/e2e-playwright/*.spec.ts |
| Tier-3 a11y | Component-level ARIA, labels, focus order | ms | apps/desktop/src/**/*.a11y.test.ts |
Default to the lowest layer that can express the property you want to check. E2E is the most expensive lane; don't push work into it that a unit test would cover.
| You want to test | Tool / layer |
|---|---|
| Pure function with edge cases | proptest (Rust unit). State a property, fuzz inputs. |
| Pure function with a few specific inputs | Plain example tests in mod tests |
| Behavior coverage of an existing tested function | cargo mutants survivor triage: every survived mutant is a behavior-level gap |
| State machine transition | Rust unit test, drive via the public interface, not by setting the atomic directly |
| #[tauri::command] boundary | vitest IPC contract test using installIpcMock() from apps/desktop/src/lib/ipc/test-helpers.ts |
| Frontend component logic | vitest + svelte-testing-library in *.test.ts |
| Component-level a11y (ARIA, labels, focus order) | tier-3 a11y test in *.a11y.test.ts |
| Keyboard shortcut opens a dialog | E2E spec, use dispatchMenuCommand(tauriPage, 'file.copy'). Never synthetic F-key press unless the test exists to verify the keyboard pathway |
| Wait for UI state to change in E2E | expect.poll(async () => …, { timeout }).toBeTruthy() (preferred — wait fuses with assertion); expect(await pollUntil(...)).toBe(true) for the few non-Playwright contexts. Never bare await pollUntil(...) (silent timeout) or await sleep(N) (flaky) |
| Cross-component flow (return-focus, dialog stack, navigation) | E2E (Playwright) |
| Storage volume operation (MTP, SMB) | Integration test against a virtual fixture (virtual-mtp feature, Docker SMB containers) |
Each of these anti-patterns has been paid for in lost hours. Don't recreate them.
E2E suites keep rediscovering that as much as 80% of wall-clock time can be fixed sleeps. Every sleep() is a margin
that's either too tight (flake) or too loose (slow). Always replace it with a condition:
// ❌ Don't:
await tauriPage.keyboard.press('F5')
await sleep(2000)
expect(await tauriPage.isVisible('[data-dialog-id="transfer-confirmation"]')).toBe(true)
// ✅ Do:
await tauriPage.keyboard.press('F5')
await tauriPage.waitForSelector('[data-dialog-id="transfer-confirmation"]', 5000)

For "wait until X is true" where X isn't a selector, use Playwright's expect.poll:
await expect
  .poll(async () => tauriPage.evaluate<number>(`document.querySelector(…)?.offsetHeight ?? 0`), { timeout: 5000 })
  .toBeGreaterThan(0)

The cmdr/no-arbitrary-sleep-in-e2e ESLint rule flags await sleep(N). Opt out per-line with
// eslint-disable-next-line cmdr/no-arbitrary-sleep-in-e2e -- <reason> only when there's a genuine fixed-duration wait
(e.g., watcher debounce settling), and even then, prefer a poll if any state changes.
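
To make the sleep-versus-condition trade-off concrete, here is a minimal, framework-free sketch of a condition wait. It returns as soon as the condition holds and throws when the timeout expires, so the wait can neither waste a fixed margin nor pass silently. `waitForCondition`, `WaitOptions`, and `delay` are hypothetical names for illustration, not helpers from the Cmdr suite; real specs should keep using waitForSelector or expect.poll.

```typescript
// Hypothetical helper for illustration only; not part of the Cmdr test suite.
interface WaitOptions {
  timeoutMs: number
  intervalMs?: number
}

const delay = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms))

// Resolves with the elapsed milliseconds as soon as `condition` holds.
// Throws if the condition is still false once the timeout has passed.
async function waitForCondition(
  condition: () => boolean | Promise<boolean>,
  { timeoutMs, intervalMs = 10 }: WaitOptions,
): Promise<number> {
  const start = Date.now()
  for (;;) {
    if (await condition()) return Date.now() - start
    if (Date.now() - start >= timeoutMs) {
      throw new Error(`condition not met within ${timeoutMs} ms`)
    }
    await delay(intervalMs)
  }
}
```

A fast machine pays only for the time the condition actually takes, and a slow machine gets the whole timeout before the test fails with a clear message.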
The legacy pollUntil helper (and its wrappers pollFs, pollUntilValue, pollActiveMode, pollOverlayGone,
pollFocusedPane) returns false on timeout instead of throwing. A bare expression statement discards the return — if
the condition never holds, the test passes green so long as no later expect happens to catch it. We discovered 187
sites of this pattern across 20 specs; several tests had zero expect() calls and literally could not fail. One
viewer test wasted 5 seconds polling for a toast that never appeared in its window (no ToastContainer mounted there)
and still passed because the next expect was happy.
// ❌ Don't: timeout returns false, no one checks it, test stays green
await pollUntil(tauriPage, async () => fileExistsInFocusedPane(tauriPage, dirName), 2000)
// ✅ Do (preferred — wait fuses with the assertion, fails loudly on timeout):
await expect.poll(async () => fileExistsInFocusedPane(tauriPage, dirName), { timeout: 2000 }).toBeTruthy()
// ✅ Also fine (keeps the helper for non-Playwright contexts):
expect(await pollUntil(tauriPage, async () => fileExistsInFocusedPane(tauriPage, dirName), 2000)).toBe(true)
// ✅ Also fine (when you want to act on the false branch instead of failing):
if (!(await pollUntil(tauriPage, async () => isReady(tauriPage), 3000))) {
  throw new Error('listing did not refresh within 3 s')
}

Enforced by the bare-poll Go check (fast lane, ~9 ms warm; scans apps/desktop/test/). Opt out for genuine
best-effort cleanups (dismissing an overlay that might or might not be there) with // allowed-bare-poll: <reason> on
the line above or as a trailing comment on the same line. The full design rationale is in
apps/desktop/test/e2e-playwright/CLAUDE.md § "Polling helpers" and scripts/check/CLAUDE.md § bare-poll.
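
The real check is a Go program and its exact rules are documented in the files referenced above. The following is only a rough TypeScript sketch of the idea. It assumes that a "bare" poll is a line whose statement starts with `await` followed by one of the helpers, and that the opt-out comment sits on the same line or the line directly above. `findBarePolls` is a hypothetical name.

```typescript
// Illustrative sketch only: the actual bare-poll check is written in Go and may apply different rules.
const POLL_HELPERS = ['pollUntil', 'pollFs', 'pollUntilValue', 'pollActiveMode', 'pollOverlayGone', 'pollFocusedPane']

// A line is "bare" when the statement starts with `await <helper>(`, so the boolean result is discarded.
const BARE_POLL = new RegExp(`^\\s*await\\s+(?:${POLL_HELPERS.join('|')})\\s*\\(`)

// Opt-out marker with a non-empty reason.
const OPT_OUT = /\/\/\s*allowed-bare-poll:\s*\S/

// Returns the 1-based line numbers of bare polls that carry no opt-out comment.
function findBarePolls(source: string): number[] {
  const lines = source.split('\n')
  const hits: number[] = []
  lines.forEach((line, i) => {
    if (!BARE_POLL.test(line)) return
    const previous = i > 0 ? lines[i - 1] : ''
    if (OPT_OUT.test(line) || OPT_OUT.test(previous)) return
    hits.push(i + 1)
  })
  return hits
}
```

Wrapped forms such as `expect(await pollUntil(...))` and `if (!(await pollUntil(...)))` do not start with `await`, so this sketch leaves them alone.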
Synthetic KeyboardEvents race against handler attachment under parallel-shard load. If your test asserts on the
dialog that opens, not on the keyboard pathway itself, use dispatchMenuCommand:
// ❌ Don't (unless you're testing the keyboard pathway):
await tauriPage.keyboard.press('F5')
await tauriPage.waitForSelector(TRANSFER_DIALOG, 5000)
// ✅ Do (when the test is about the Copy dialog, not F5):
await dispatchMenuCommand(tauriPage, 'file.copy')
await tauriPage.waitForSelector(TRANSFER_DIALOG, 5000)

Keep one or two dedicated tests on the keyboard pathway (app.spec.ts has these, with names like "opens copy dialog
with F5"). The rest should use dispatchMenuCommand.
A state-machine test that does state.intent.store(OperationIntent::RollingBack) is testing nothing: it bypasses the
validation guard the public function performs. Drive through the public interface:
// ❌ Don't:
state.intent.store(OperationIntent::RollingBack as u8, Ordering::SeqCst);
assert!(can_transition_to_stopped(&state));
// ✅ Do:
cancel_write_operation(&app, op_id, CancelMode::Rollback).await?;
let intent = state.intent.load(Ordering::SeqCst);
assert_eq!(intent, OperationIntent::RollingBack as u8);

If the public function takes AppHandle that you can't fixture-up cheaply, extract a pure inner helper and test that
through the public-via-helper path. Don't reach past the guard.
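
The "extract a pure inner helper" advice might look roughly like this. It is a TypeScript analog for illustration only; the real code is Rust. The `Intent` values other than RollingBack, the `nextIntent` helper, the `Operation` class, and the transition rules are all assumptions, not Cmdr's actual state machine.

```typescript
// Hypothetical TypeScript analog of the Rust pattern; names and rules are illustrative assumptions.
type Intent = 'Running' | 'RollingBack' | 'Stopped'
type CancelMode = 'Rollback' | 'Stop'

// Pure guard: decides the next intent or rejects an invalid transition. Needs no app handle.
function nextIntent(current: Intent, mode: CancelMode): Intent {
  if (current !== 'Running') {
    throw new Error(`cannot cancel from ${current}`)
  }
  return mode === 'Rollback' ? 'RollingBack' : 'Stopped'
}

class Operation {
  private intent: Intent = 'Running'

  // Public entry point: the only way callers and tests change the intent.
  cancel(mode: CancelMode): void {
    this.intent = nextIntent(this.intent, mode)
  }

  currentIntent(): Intent {
    return this.intent
  }
}
```

A test can exercise `nextIntent` directly or go through `cancel`. Either way the guard runs, which a direct write to the field would skip.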
Retries hide bugs. If a test flakes, find the race and fix it (Rust IPC race, missing await, watcher debounce, etc.). Drop retries when the cause is gone.
Call Tauri commands through commands.commandName(args) from apps/desktop/src/lib/ipc/, not through raw invoke()
calls. Enforced by the cmdr/no-raw-tauri-invoke ESLint rule and the bindings-fresh CI check.
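
A hedged sketch of why a wrapper helps: the command name and payload shape live in one place, so a contract test can pin them. Everything here is hypothetical, including the `copy_files` command, its arguments, `makeCommands`, and the recording fake. The real bindings live in apps/desktop/src/lib/ipc/, and real contract tests use installIpcMock().

```typescript
// Hypothetical sketch; the command name, argument shape, and factory are assumptions.
type Invoke = (cmd: string, args: Record<string, unknown>) => Promise<unknown>

interface CopyArgs {
  sources: string[]
  destination: string
}

interface RecordedCall {
  cmd: string
  args: Record<string, unknown>
}

// Typed wrapper: call sites never spell the command name or build the payload by hand.
function makeCommands(invoke: Invoke) {
  return {
    copyFiles: (args: CopyArgs): Promise<unknown> =>
      invoke('copy_files', { sources: args.sources, destination: args.destination }),
  }
}

// A recording fake stands in for the IPC layer so the contract can be asserted.
function recordingInvoke(): { invoke: Invoke; calls: RecordedCall[] } {
  const calls: RecordedCall[] = []
  const invoke: Invoke = async (cmd, args) => {
    calls.push({ cmd, args })
    return null
  }
  return { invoke, calls }
}
```

If the command is renamed on one side only, the assertion on `calls[0].cmd` fails in milliseconds rather than surfacing as a runtime error in an E2E run.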
Match errors on typed enum variants, not on message text such as err.message.includes('not found'). Enforced by
cmdr/no-error-string-match (TS) and the error-string-match check (Rust).
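
As a small illustration of the difference, the sketch below branches on a discriminant instead of message text. The `FsError` shape and `describeError` are hypothetical; the real error types are defined in the Cmdr codebase.

```typescript
// Hypothetical error shape for illustration; not Cmdr's actual error type.
type FsError =
  | { kind: 'NotFound'; path: string }
  | { kind: 'PermissionDenied'; path: string }
  | { kind: 'Io'; message: string }

// Branch on the discriminant. Rewording a message can never change which branch runs,
// and the compiler flags any variant the switch forgets to handle.
function describeError(err: FsError): string {
  switch (err.kind) {
    case 'NotFound':
      return `missing: ${err.path}`
    case 'PermissionDenied':
      return `no access: ${err.path}`
    case 'Io':
      return `I/O failure: ${err.message}`
  }
}
```

An `Io` error whose message happens to contain "not found" is still treated as an I/O failure here, whereas a string match would misclassify it.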
Cargo, Vite, and beforeBuildCommand already cache their outputs. Wrapping them in an extra caching layer risks shipping stale binaries. See AGENTS.md.
| New thing | Required tests |
|---|---|
| New #[tauri::command] | (a) unit test for the underlying *_core / ops_* helper; (b) IPC contract test in lib/ipc/*.test.ts IF the command is destructive, cross-window, or has > 2 positional args |
| New state or transition in a state machine | At least one unit test driving the new transition via the public interface |
| New pure parser / transform / collation | Consider a proptest (round-trip, idempotence, or "output is valid for the consumer") |
| New keyboard shortcut | Spec it via dispatchMenuCommand if menu-bound; synthetic keydown only if the test exists to verify the keyboard pathway itself |
| New user-visible flow | One E2E happy-path spec; use waitForSelector or expect.poll(...).toBeTruthy() for any state wait (never bare await pollUntil(...)) |
| New write-side operation (copy / move / delete / etc.) | Unit tests for the core + at least one E2E covering cancel and a conflict policy |
| New volume implementation | Integration tests against the virtual fixture for that volume kind |
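
The round-trip property suggested for parsers and transforms can be sketched without any library. In Rust you would reach for proptest, as the decision table says; this TypeScript version hand-rolls a small seeded generator purely for illustration. The codec (`encodeSegments` / `decodeSegments`), the escaping scheme, and `checkRoundTrip` are all hypothetical.

```typescript
// Hypothetical codec and generator, for illustration only.
function encodeSegments(segments: string[]): string {
  // Escape the backslash first, then the separator; terminate every segment with '/'.
  return segments.map((s) => s.replace(/\\/g, '\\\\').replace(/\//g, '\\/') + '/').join('')
}

function decodeSegments(encoded: string): string[] {
  const out: string[] = []
  let current = ''
  for (let i = 0; i < encoded.length; i++) {
    const ch = encoded[i]
    if (ch === '\\' && i + 1 < encoded.length) {
      current += encoded[++i] // the escaped character is taken literally
    } else if (ch === '/') {
      out.push(current)
      current = ''
    } else {
      current += ch
    }
  }
  return out
}

// Small deterministic PRNG (mulberry32-style) so a failing case can be replayed from its seed.
function seededRandom(seed: number): () => number {
  let a = seed >>> 0
  return () => {
    a = (a + 0x6d2b79f5) >>> 0
    let t = a
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Generates 0 to 3 segments of 0 to 5 characters, biased toward the awkward characters.
function randomSegments(rand: () => number): string[] {
  const alphabet = ['a', 'b', '/', '\\', ' ', 'é']
  const count = Math.floor(rand() * 4)
  return Array.from({ length: count }, () => {
    const len = Math.floor(rand() * 6)
    return Array.from({ length: len }, () => alphabet[Math.floor(rand() * alphabet.length)]).join('')
  })
}

// Property: decoding an encoded list yields the original list.
// Returns the first counterexample found, or null if every run passed.
function checkRoundTrip(runs: number, seed: number): string[] | null {
  const rand = seededRandom(seed)
  for (let i = 0; i < runs; i++) {
    const input = randomSegments(rand)
    const output = decodeSegments(encodeSegments(input))
    if (JSON.stringify(output) !== JSON.stringify(input)) return input
  }
  return null
}
```

Stating the property once covers empty lists, empty segments, and segments made entirely of separators, which hand-picked examples tend to miss.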
These modules have invested test infrastructure. New code here must keep that bar:
- apps/desktop/src-tauri/src/file_system/write_operations/: state.rs has 30+ tests pinning every state-machine transition. Pattern: cancel_write_operation through the public interface, never via direct atomic mutation. See state.rs mod tests.
- apps/desktop/src-tauri/src/indexing/: IndexPhase lifecycle tests in indexing/mod.rs require a real IndexStore (use tempdir-backed) and a dedicated test mutex (INDEXING is global).
- apps/desktop/src-tauri/src/file_viewer/: SearchStatus transitions through search_cancel are subtle (the thread writes Cancelled, the caller must not null session.search first). See session.rs::tests.
- apps/desktop/src-tauri/src/file_system/index/store.rs: platform_case_compare has proptests for comparator algebra and NFC≡NFD equivalence. Don't regress these.
E2E test hooks split along two axes:
- Hard hooks (binary shape) live behind Cargo features:
  - playwright-e2e: feature-gated Tauri commands (inject_listing_error, set_test_throttle, flush_file_watcher) and the tauri-plugin-playwright socket bridge.
  - virtual-mtp: virtual MTP device with deterministic fixtures.
  - smb-e2e: virtual SMB hosts injected into mDNS discovery.

  These are compiled out of production binaries entirely. New commands or backends that don't make sense in prod go here.

- Soft hooks (runtime only) live behind environment variables. They are strictly additive: may add a delay, skip a non-essential step, or emit extra telemetry. Never replace production logic. With the env var unset, the code path is exactly what production runs.

  All soft hooks should be wired through crate::test_mode so the list of test hooks is grep-able from one place. New env-var-driven hooks land there with a helper function. Don't sprinkle std::env::var(...) reads through subsystems.
Existing soft hooks (env vars):
| Variable | Purpose |
|---|---|
| CMDR_E2E_MODE=1 | Canonical "we're under E2E" marker; subsystems can flip behaviors. |
| CMDR_E2E_START_PATH | Fixture directory; surfaced via get_e2e_start_path so FE can pick it up. |
| CMDR_E2E_SHARD_KIND | "mtp" / "non-mtp" / "all": selects spec subset for parallel sharding. |
| CMDR_E2E_JSON_REPORT | Per-shard Playwright JSON report path. |
| CMDR_E2E_OUTPUT_DIR | Per-shard Playwright artifact dir. |
| CMDR_E2E_SKIP_VIRTUAL_MTP_SETUP=1 | Non-MTP shards opt out of wiping the shared MTP backing dir. |
| CMDR_E2E_SKIP_MTP_FIXTURES=1 | Non-MTP shards skip globalSetup's MTP fixture reset. |
| CMDR_E2E_COPY_THROTTLE_MS | Per-file sleep inside the copy loop. Lets tests stage Cancel/Rollback. |
| CMDR_PLAYWRIGHT_SOCKET | Override the plugin's Unix socket path (one socket per shard). |
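
The "strictly additive" rule for soft hooks can be illustrated with a small sketch. This is a TypeScript analog only; the real central module is the Rust crate::test_mode. The function names, the injected `sleep` and `copy` callbacks, and the choice to treat a malformed value as 0 are assumptions made for the example.

```typescript
// Hypothetical TypeScript analog of a central test-mode module; the real one is Rust (crate::test_mode).
type Env = Record<string, string | undefined>

// Every env-var read lives in this one place, so the set of soft hooks is greppable.
function isE2eMode(env: Env): boolean {
  return env.CMDR_E2E_MODE === '1'
}

// Returns 0 when the variable is unset or malformed (an assumption), leaving the production path untouched.
function copyThrottleMs(env: Env): number {
  const raw = env.CMDR_E2E_COPY_THROTTLE_MS
  if (raw === undefined) return 0
  const ms = Number.parseInt(raw, 10)
  return Number.isFinite(ms) && ms > 0 ? ms : 0
}

// Strictly additive: the hook may add a delay before the real work, but never replaces it.
function copyOne(file: string, env: Env, sleep: (ms: number) => void, copy: (file: string) => void): void {
  const throttle = copyThrottleMs(env)
  if (throttle > 0) sleep(throttle)
  copy(file)
}
```

With an empty environment, `copyOne` reduces to a plain `copy(file)` call, which is the property the soft-hook rule asks for.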
Existing soft hooks (IPC-driven, feature-gated to playwright-e2e):
| Command | Purpose |
|---|---|
| set_test_throttle(ms) | Mid-run override of CMDR_E2E_COPY_THROTTLE_MS; clears with null. |
| flush_file_watcher() | Synchronously re-reads every active watch, bypassing debouncer + FSEvents latency. |
| inject_listing_error() | Inject an IoError into a volume's next list_directory for retry coverage. |
- After adding a substantial chunk of new code: run cargo mutants --file <new_file> (Rust) or pnpm exec stryker run (TS) on the file to see if the new tests actually assert anything. Triage survivors.
- After E2E suite changes: run ./scripts/check.sh --check desktop-e2e-playwright twice back-to-back. The first run warms the cache; the second run catches regressions that only fire under quiet load. Both must be green.
- See maintenance.md § Codebase health for the periodic mutation + flake-rate checks.
- Tools inventory: docs/tooling/testing.md
- E2E suite docs: apps/desktop/test/e2e-playwright/CLAUDE.md
- IPC test helpers: apps/desktop/src/lib/ipc/CLAUDE.md
- Notes from the speedup + coverage push: docs/notes/speed-up-e2e-tests.md, docs/notes/extend-e2e-tests.md