test(ublk): porcupine linearizability on concurrent reads/writes by ValentaTomas · Pull Request #28 · e2b-dev/ublk-go

ValentaTomas · 2026-05-02T00:19:16Z

Stacked on: #27 (tests/rapid-state-machine).

Adds TestRapidLinearizability (in ublk/porcupine_integration_test.go) — a porcupine-driven linearizability checker for concurrent reads and writes against a single ublk device.

Why this catches things rapid alone doesn't

TestRapidStateMachine (PR #27) checks a per-operation invariant: after every action, the device's bytes match the model's shadow. That's sufficient when the test driver issues actions sequentially — which it does, because rapid state machines are sequential by construction.

This PR addresses a different question: can the global real-time history of concurrent ops be explained by some valid sequential ordering? That's linearizability, the same property Jepsen checks for distributed databases. A history can pass the per-operation invariant on every read taken in isolation and still be non-linearizable — e.g. if two concurrent writes' effects are observed in inconsistent orders by later reads.

Implementation choice — Option B

Both options outlined in the spec were on the table:

Option A would have instrumented TestRapidStateMachine to record an operation history. The problem: rapid drives the actions strictly sequentially, so the history is trivially linearizable and the porcupine check is pure overhead.
Option B (chosen): a standalone test in its own file driving a concurrent worker pool. The rapid state machine is preserved untouched as the per-operation invariant checker; this test is the global-ordering checker. Cleaner separation of concerns and avoids forcing PR test(ublk): rapid property-based state-machine tests #27's test into a shape it doesn't want.

The model

One register per 4 KiB block (map[int]uint64 — block index → most-recent stamp). Each write embeds a unique 8-byte stamp (from a global atomic counter starting at 1) at the start of its 4 KiB block; reads recover the stamp from the bytes returned. Reads/writes are constrained to a single block at a time so each op is atomic from the model's perspective. Stamp 0 is reserved for "never written", which matches the all-zero bytes the device returns for unwritten blocks.

The workload

Single 256 KiB device (64 blocks of 4 KiB).
Default 4 concurrent goroutines × 50 ops each (200 ops total).
Each op:
- Call = time.Now() recorded immediately before the syscall.
- Issues unix.Pread/unix.Pwrite against /dev/ublkbN with O_DIRECT and alignedBuf.
- Return = time.Now() recorded immediately after.
- Appended to a mutex-protected []porcupine.Operation.
After the workload phase: porcupine.CheckOperationsVerbose(model, history, 30s). Illegal histories are rendered to a HTML visualization via porcupine.VisualizePath and the test fails with the path logged. Unknown (timeout) is logged as a soft pass with guidance to either shrink the history or grow the budget.

Tunables: UBLK_LINZ_OPS (default 200) and UBLK_LINZ_WORKERS (default 4).

Tooling

make test-linz runs only this test against an integration-tagged binary.
No new CI job needed: the existing test-integration job already runs every //go:build integration test in ./ublk/. Verified in .github/workflows/ci.yml.
TODO.md "Linearizability checking" bullet replaced with a (done) summary.
Pinned dependency: github.com/anishathalye/porcupine v1.1.0.

fd-close-before-Close discipline

Per AGENTS.md: the user fd opened on /dev/ublkbN is closed before dev.Close() (in a t.Cleanup), otherwise del_gendisk blocks waiting for the open ref to drop. Documented inline.

Test plan

go vet ./...
golangci-lint run ./... → 0 issues.
go test -count=1 -race ./ublk/uring/ ./ublk/
go test -c -tags=integration -o /tmp/ublk.test ./ublk/ compiles
gofmt -l . empty
go mod tidy -diff clean
CI's test-integration job exercises TestRapidLinearizability end-to-end on a host with ublk_drv + root.
(Optional) Manual sanity check: make test-linz on a kernel host; should pass and log linearizable: 200 ops checked in <Xs>.

Forbidden checks

No library code touched (ublk.go, worker.go, device.go, types.go, ublk/uring/* untouched).
PR ci: bump actions/checkout from 4 to 6 #1 (fuzz_test.go) and PR ci: bump codecov/codecov-action from 4 to 5 #2 (chaos_integration_test.go) files not modified.
No t.Skip for missing root or kernel.
TestRapidStateMachine and the rest of PR test(ublk): rapid property-based state-machine tests #27 untouched (additive only).

…ycle and isolation Adds pgregory.net/rapid v1.3.0 as a test dependency and a new TestRapidStateMachine in ublk/rapid_integration_test.go. The state machine drives random sequences of create/write/read/fsync/ close actions against up to two live ublk devices and an in-process shadow model. Invariants checked after every action: 1. Read returns bytes from the most recent Write (per device). 2. Bytes written to device A never appear at the same offset on device B (cross-device isolation). 3. Close terminates within a 5 s timer (a hang in del_gendisk would otherwise deadlock the test rather than report a failure). 4. Close is idempotent — a second call must not panic or hang. User fds on /dev/ublkbN are closed before Device.Close (AGENTS.md fd-close-before-Close discipline) so del_gendisk does not block. Adds a make target test-rapid for filtered local iteration. The new test runs as part of the existing test-integration CI job; no new CI job is needed. TODO.md's "Property-based / model-based state machine tests (rapid)" item is marked done with a pointer to the new file.

cursor · 2026-05-02T00:19:21Z

PR Summary

Low Risk
Low risk because changes are limited to tests/tooling and a new test-only dependency, with no production code changes. Main risk is longer or flaky integration runs due to timing- and concurrency-sensitive checking.

Overview
Adds an integration-tagged TestPorcupineLinearizability that drives concurrent pread/pwrite workloads against a ublk device, records a real-time history, and checks it with github.com/anishathalye/porcupine (with HTML visualization on failure and a soft-pass on checker timeout).

Updates go.mod/go.sum to include Porcupine, adds make test-linz to run only this test, and removes the corresponding now-completed linearizability TODO entry.

^{Reviewed by Cursor Bugbot for commit 94bf7da. Bugbot is set up for automated code reviews on this repo. Configure here.}

The rapidMaxCreates cap was unconditional, so once a Run hit it AND the last live device was closed, every subsequent action skipped (create for the cap, read/write/fsync/close for 'no live devices'). rapid then fails the Run with 'can't find a valid (non-skipped) action'. Reproduced on CI: --- FAIL: TestRapidStateMachine after ~30s of churn that closed all devices and tried to grow past the cap. Lift the cap when len(live) == 0 so create is always available as a recovery path. The cap still bounds runtime in the common case where at least one device is alive.

…s (rapid)" entry

… workload Adds TestRapidLinearizability — a concurrent worker-pool harness that records a wall-clock history of pread/pwrite ops against a single device and feeds it to anishathalye/porcupine to decide whether the history admits a valid sequential explanation. The model is one register per 4 KiB block (map[int]uint64, block → most-recent stamp). Each write embeds a unique 8-byte stamp at the start of its block; reads recover it. Default workload: 4 workers, 200 ops total, 256 KiB device (64 blocks). Tunable via UBLK_LINZ_OPS and UBLK_LINZ_WORKERS. Implemented in a separate file rather than instrumenting the existing TestRapidStateMachine because rapid drives a strictly sequential state machine — a sequential history is trivially linearizable. The new test runs alongside the rapid one in the existing test-integration CI job (same `integration` build tag), and ships a `make test-linz` target for iterating on the model or workload in isolation.

…ility Bugbot pointed out that `TestRapidLinearizability` was being captured by the existing `test-rapid` Makefile target's `-test.run=TestRapid` regex even though the test does not use the rapid library. The target's stated purpose is "run only the rapid property-based state-machine tests" for quick iteration on shrunk failures, so having porcupine sneak in adds up to 60s of unrelated work to that fast loop. Rename the function (and its doc comment) so it no longer matches the TestRapid prefix, and update the `test-linz` Makefile target's `-test.run` regex accordingly. Reference to TestRapidStateMachine inside the doc comment is intentional — it is the sequential companion test, still named that way, and porcupine remains its "global ordering" follow-up. Reported by Cursor Bugbot on PR #28.

Keep the porcupine target and dependency alongside the newer rapid test support so the branch matches current main and merges cleanly.

Bring in the merged uring fuzz helpers alongside the porcupine and rapid test targets so the branch stays conflict-free on the current main.

Bring in the merged chaos test helpers alongside the porcupine, rapid, and fuzz helpers so the branch stays conflict-free on the current main.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Bugbot Autofix is ON, but it could not run because the branch was deleted or merged before autofix could start.}

^{Reviewed by Cursor Bugbot for commit 94bf7da. Configure here.}

cursor · 2026-05-04T18:02:49Z

+		// fail — porcupine is NP-hard and false-timeouts are real.
+		t.Logf("linearizability check TIMED OUT after %v (history len=%d) "+
+			"— increase UBLK_LINZ_OPS to keep histories small, or "+
+			"raise the checkTimeout in this test", checkTimeout, len(history))


Timeout guidance says "increase" but means "decrease"

Low Severity

The timeout log message says "increase UBLK_LINZ_OPS to keep histories small" but UBLK_LINZ_OPS controls the total number of operations — increasing it makes the history larger, making timeouts more likely. The guidance is inverted and would lead a developer to worsen the problem. It needs to say "decrease".

^{Reviewed by Cursor Bugbot for commit 94bf7da. Configure here.}

ValentaTomas requested a review from arkamar as a code owner May 2, 2026 00:19

cursor Bot reviewed May 2, 2026

View reviewed changes

Comment thread ublk/porcupine_integration_test.go Outdated

ValentaTomas force-pushed the tests/porcupine-linearizability branch from e6f5a16 to 45d783e Compare May 2, 2026 00:39

ValentaTomas added 4 commits May 1, 2026 17:49

docs(TODO): drop now-implemented "Property-based / state machine test…

ef6a719

…s (rapid)" entry

docs(TODO): drop now-implemented "Linearizability checking" entry

5698aca

ValentaTomas force-pushed the tests/porcupine-linearizability branch from 45d783e to 5698aca Compare May 2, 2026 00:51

matthewlouisbrockman approved these changes May 4, 2026

View reviewed changes

Base automatically changed from tests/rapid-state-machine to main May 4, 2026 17:44

ValentaTomas enabled auto-merge (squash) May 4, 2026 17:46

ValentaTomas added 3 commits May 4, 2026 10:48

Merge main into tests/porcupine-linearizability.

0fccfaf

Keep the porcupine target and dependency alongside the newer rapid test support so the branch matches current main and merges cleanly.

Merge main into tests/porcupine-linearizability.

866043e

Bring in the merged uring fuzz helpers alongside the porcupine and rapid test targets so the branch stays conflict-free on the current main.

Merge main into tests/porcupine-linearizability.

94bf7da

Bring in the merged chaos test helpers alongside the porcupine, rapid, and fuzz helpers so the branch stays conflict-free on the current main.

ValentaTomas merged commit 998c5d1 into main May 4, 2026
13 checks passed

ValentaTomas deleted the tests/porcupine-linearizability branch May 4, 2026 17:58

cursor Bot reviewed May 4, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(ublk): porcupine linearizability on concurrent reads/writes#28

test(ublk): porcupine linearizability on concurrent reads/writes#28
ValentaTomas merged 9 commits into
mainfrom
tests/porcupine-linearizability

ValentaTomas commented May 2, 2026 •

edited

Loading

Uh oh!

cursor Bot commented May 2, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ValentaTomas commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why this catches things rapid alone doesn't

Implementation choice — Option B

The model

The workload

Tooling

fd-close-before-Close discipline

Test plan

Forbidden checks

Uh oh!

cursor Bot commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Summary

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 4, 2026

Choose a reason for hiding this comment

Timeout guidance says "increase" but means "decrease"

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ValentaTomas commented May 2, 2026 •

edited

Loading

cursor Bot commented May 2, 2026 •

edited

Loading