test(uring): add FuzzRingSubmit and FuzzRingCancel #25
Two coverage-guided fuzz targets that protect the io_uring SQ/CQ arithmetic and the WaitCQE-vs-Cancel race documented in AGENTS.md. Both targets need no kernel module and no root, so they run as plain unit tests against the seed corpus and as a short live fuzz job (15s/target) in CI.

- FuzzRingSubmit drives variable-sized submit batches drawn from the fuzzer input and asserts every UserData comes back exactly once. Targets nextSQE/flushSQ/WaitCQE/PeekCQE/SeenCQE arithmetic; bugs there would manifest as duplicated, missing, or reordered CQEs.
- FuzzRingCancel keeps the CQ non-empty with two producer goroutines while the consumer loops in WaitCQE; Cancel must be observed within 2s. Direct guard for the cancelled-flag fast-path race.

Adds `make fuzz-uring` (FUZZTIME-overridable) and a CI job that uploads any discovered crashers as an artifact.
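The "exactly once" assertion in FuzzRingSubmit can be pictured with a small standalone check. This is only a sketch: `checkExactlyOnce` is a hypothetical helper, not the code in `fuzz_test.go`, and it assumes the submitted UserData values are unique. It catches duplicated, missing, and unknown completions but does not check ordering.

```go
package main

import "fmt"

// checkExactlyOnce is a hypothetical helper (not from the PR). It returns an
// error if a completed UserData was never submitted, appears twice, or if any
// submitted UserData never completed. It assumes submitted values are unique.
func checkExactlyOnce(submitted, completed []uint64) error {
	want := make(map[uint64]struct{}, len(submitted))
	for _, id := range submitted {
		want[id] = struct{}{}
	}
	seen := make(map[uint64]struct{}, len(completed))
	for _, id := range completed {
		if _, ok := want[id]; !ok {
			return fmt.Errorf("unexpected UserData %d", id)
		}
		if _, dup := seen[id]; dup {
			return fmt.Errorf("duplicate UserData %d", id)
		}
		seen[id] = struct{}{}
	}
	if len(seen) != len(want) {
		return fmt.Errorf("%d completions missing", len(want)-len(seen))
	}
	return nil
}

func main() {
	// Completion order may differ from submission order; that alone is fine here.
	fmt.Println(checkExactlyOnce([]uint64{1, 2, 3}, []uint64{3, 1, 2}))
	// A duplicated CQE is reported as an error.
	fmt.Println(checkExactlyOnce([]uint64{1, 2, 3}, []uint64{1, 1, 2}))
}
```

In a real fuzz target the `completed` slice would be filled from the CQEs the ring hands back, and a non-nil error would become a test failure.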
- Removes the unused `ids` slice in FuzzRingSubmit (staticcheck SA4010).
- Folds 10s/target coverage-guided mutation into `make test-unit` so local and CI runs match. The dedicated fuzz CI job is dropped; the unit-test job now seeds the corpus AND runs the mutator. Crashers are still uploaded as a per-arch artifact on failure.
- `make fuzz-uring` (FUZZTIME-overridable) stays as the longer campaign target.
Under coverage-guided fuzzing the framework spins up GOMAXPROCS-many workers that each construct a fresh ring per iteration. On constrained hosts (notably the GitHub Actions runners) this briefly outpaces the kernel's release of mmap'd ring pages and trips the per-process memlock / io_uring resource accounting, producing a spurious 'io_uring_setup: cannot allocate memory' inside Ring.New. That symptom isn't a bug in our SQ/CQ arithmetic — it's a kernel resource-limit symptom — so treat ENOMEM (and ENOSYS, for kernels without io_uring at all) as t.Skip rather than t.Fatal. The existing seed-corpus runs (TestNOPRoundTrip, TestManyCycles, etc.) still use plain New and would surface real setup failures. Reproduced on CI with parallelism 4 in <3s; passes locally with the same parallelism for 15s / ~900K execs.
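A minimal sketch of the skip decision described above, assuming the setup error wraps a `syscall.Errno` so that `errors.Is` can see through it. `isSetupResourceError` and the three sample errors are hypothetical names for illustration, not identifiers from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// Sample errors shaped like a wrapped io_uring_setup failure (illustrative only).
var (
	errNoMem   = fmt.Errorf("io_uring_setup: %w", syscall.ENOMEM)
	errNoSys   = fmt.Errorf("io_uring_setup: %w", syscall.ENOSYS)
	errInvalid = fmt.Errorf("io_uring_setup: %w", syscall.EINVAL)
)

// isSetupResourceError is a hypothetical helper: it reports whether a ring
// setup error looks like a host limitation (ENOMEM from resource accounting,
// ENOSYS from a kernel without io_uring) rather than a bug under test.
func isSetupResourceError(err error) bool {
	return errors.Is(err, syscall.ENOMEM) || errors.Is(err, syscall.ENOSYS)
}

func main() {
	for _, err := range []error{errNoMem, errNoSys, errInvalid} {
		fmt.Printf("%v -> skip=%v\n", err, isSetupResourceError(err))
	}
}
```

Inside a fuzz target, a true result would lead to `t.Skip` and any other setup error to `t.Fatal`, which keeps genuine setup failures such as EINVAL visible.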
A copy/paste left two identical `$(MAKE) fuzz-uring FUZZTIME=10s` lines in the test-unit target, doubling the per-run fuzz budget for no extra coverage. Spotted by Cursor Bugbot on #25.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit c648429.
`FuzzRingCancel` placed `stopProducers.Store(true)` and
`producerWg.Wait()` as straight-line code AFTER the
`select { ... case <-time.After(2*time.Second): t.Fatal(...) }`.
When the timeout fires, `t.Fatal` calls `runtime.Goexit()`, which
skips those lines and jumps straight to `defer r.Close()`. The
producer goroutines are still running when `Close()` munmaps the
ring, so the next `nextSQE()` iteration dereferences the unmapped
pages and SIGSEGVs the process — masking the original assertion
failure.
Convert the producer shutdown into a `defer` registered AFTER
`defer r.Close()` so it runs BEFORE Close on every exit path
(defers are LIFO). The consumer is already woken by the
idempotent `r.Cancel()` issued before the select, so it needs no
extra cleanup.
Reported by Cursor Bugbot on PR #25.
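The ordering argument can be reproduced without a ring. The sketch below is an illustration only: `cleanupOrder` and the recorded strings are hypothetical, and `runtime.Goexit` stands in for the exit that `t.Fatal` triggers. It registers a stand-in for `defer r.Close()` first and the producer shutdown second, then records the order in which they actually run.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// cleanupOrder runs a test-body-shaped goroutine and returns the order in
// which its deferred cleanups ran. failEarly simulates the t.Fatal path.
func cleanupOrder(failEarly bool) []string {
	var order []string
	done := make(chan struct{})
	go func() {
		defer close(done) // registered first, so it runs last
		// Stand-in for `defer r.Close()`.
		defer func() { order = append(order, "close ring") }()

		var stop atomic.Bool
		var wg sync.WaitGroup
		wg.Add(1)
		go func() { // stand-in producer that spins until told to stop
			defer wg.Done()
			for !stop.Load() {
				runtime.Gosched()
			}
		}()
		// Registered AFTER the close, so LIFO order runs it BEFORE the close.
		defer func() {
			stop.Store(true)
			wg.Wait()
			order = append(order, "stop producers")
		}()

		if failEarly {
			runtime.Goexit() // skips straight-line code, still runs defers
		}
	}()
	<-done
	return order
}

func main() {
	fmt.Println(cleanupOrder(true))
	fmt.Println(cleanupOrder(false))
}
```

On both the early-exit path and the normal path the producer is stopped and joined before the stand-in close runs, which is the property the fix relies on.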
matthewlouisbrockman
left a comment
lgtm, merge conflicts tho
Resolve the Makefile overlap with the newer rapid-test target so the fuzz test branch matches current main and can be merged cleanly.

Summary
Two coverage-guided fuzz targets in `ublk/uring/fuzz_test.go`. Both require no kernel module and no root, so they run as plain unit tests on the seed corpus and as a short live fuzz job (15s/target) in CI.
CI / tooling
Test plan