Skip to content

test(uring): add FuzzRingSubmit and FuzzRingCancel#25

Merged
ValentaTomas merged 7 commits into
mainfrom
tests/fuzz-uring
May 4, 2026
Merged

test(uring): add FuzzRingSubmit and FuzzRingCancel#25
ValentaTomas merged 7 commits into
mainfrom
tests/fuzz-uring

Conversation

@ValentaTomas
Copy link
Copy Markdown
Member

@ValentaTomas ValentaTomas commented May 1, 2026

Summary

Two coverage-guided fuzz targets in `ublk/uring/fuzz_test.go`. Both require no kernel module and no root, so they run as plain unit tests on the seed corpus and as a short live fuzz job (15s/target) in CI.

  • `FuzzRingSubmit` drives variable-sized submit batches drawn from the fuzzer input and asserts every `UserData` comes back exactly once across many cycles. Targets the `nextSQE` / `flushSQ` / `WaitCQE` / `PeekCQE` / `SeenCQE` arithmetic. `TestManyCycles` only exercises a uniform full-fill / full-drain cadence; this fuzzer explores the partial-fill space.
  • `FuzzRingCancel` runs two producer goroutines submitting NOPs in a tight loop while the consumer calls `WaitCQE` repeatedly. After a fuzz-derived delay we call `Cancel` and assert the consumer observes `ErrCancelled` within 2s even though the CQ is being kept non-empty. Direct regression guard for the fast-path cancel race documented in AGENTS.md "Ring.Cancel must be observable from the busy path".

CI / tooling

  • New `fuzz-uring` workflow job: 15s per target, uploads crashers as an artifact on failure.
  • New `make fuzz-uring` target (default `FUZZTIME=30s`, e.g. `FUZZTIME=2m make fuzz-uring`).
  • Strikes the corresponding bullet in `TODO.md`.

Test plan

  • `go test -count=1 -race ./ublk/uring/` — seed corpus passes locally on Go 1.25.6.
  • CI `fuzz-uring` job runs both 15s campaigns clean.
  • CI `test-unit` job still includes the seed-corpus runs.

Two coverage-guided fuzz targets that protect the io_uring SQ/CQ
arithmetic and the WaitCQE-vs-Cancel race documented in AGENTS.md.
Both targets need no kernel module and no root, so they run as
plain unit tests against the seed corpus and as a short live fuzz
job (15s/target) in CI.

- FuzzRingSubmit drives variable-sized submit batches drawn from the
  fuzzer input and asserts every UserData comes back exactly once.
  Targets nextSQE/flushSQ/WaitCQE/PeekCQE/SeenCQE arithmetic — bugs
  there would manifest as duplicated, missing, or reordered CQEs.
- FuzzRingCancel keeps the CQ non-empty with two producer goroutines
  while the consumer loops in WaitCQE; Cancel must be observed
  within 2s. Direct guard for the cancelled-flag fast-path race.

Adds 'make fuzz-uring' (FUZZTIME-overridable) and a CI job that
uploads any discovered crashers as an artifact.
@ValentaTomas ValentaTomas requested a review from arkamar as a code owner May 1, 2026 23:59
@cursor
Copy link
Copy Markdown

cursor Bot commented May 1, 2026

PR Summary

Medium Risk
Moderate risk due to added CI runtime and potential environment-specific flakiness from kernel io_uring/resource limits, though the fuzz tests explicitly skip ENOMEM/ENOSYS cases. No production code paths are modified.

Overview
Adds two new fuzz tests (FuzzRingSubmit, FuzzRingCancel) in ublk/uring/fuzz_test.go to stress SQ/CQ bookkeeping and the WaitCQE/Cancel cancellation behavior, including a guard to skip when io_uring is unavailable or resource-limited.

Updates CI and the Makefile to run short, coverage-guided fuzzing for these targets (with configurable FUZZTIME) and to upload any fuzz crashers on failure; removes the corresponding TODO entry.

Reviewed by Cursor Bugbot for commit a47dd91. Bugbot is set up for automated code reviews on this repo. Configure here.

@ValentaTomas ValentaTomas enabled auto-merge (squash) May 2, 2026 00:00
- Removes the unused `ids` slice in FuzzRingSubmit (staticcheck SA4010).
- Folds 10s/target coverage-guided mutation into `make test-unit` so
  local and CI runs match. The dedicated fuzz CI job is dropped; the
  unit-test job now seeds the corpus AND runs the mutator. Crashers
  are still uploaded as a per-arch artifact on failure.
- `make fuzz-uring` (FUZZTIME-overridable) stays as the longer
  campaign target.
Comment thread Makefile Outdated
Under coverage-guided fuzzing the framework spins up GOMAXPROCS-many
workers that each construct a fresh ring per iteration. On
constrained hosts (notably the GitHub Actions runners) this briefly
outpaces the kernel's release of mmap'd ring pages and trips the
per-process memlock / io_uring resource accounting, producing a
spurious 'io_uring_setup: cannot allocate memory' inside Ring.New.

That symptom isn't a bug in our SQ/CQ arithmetic — it's a kernel
resource-limit symptom — so treat ENOMEM (and ENOSYS, for kernels
without io_uring at all) as t.Skip rather than t.Fatal. The existing
seed-corpus runs (TestNOPRoundTrip, TestManyCycles, etc.) still use
plain New and would surface real setup failures.

Reproduced on CI with parallelism 4 in <3s; passes locally with the
same parallelism for 15s / ~900K execs.
A copy/paste left two identical `$(MAKE) fuzz-uring FUZZTIME=10s`
lines in the test-unit target, doubling the per-run fuzz budget for
no extra coverage. Spotted by Cursor Bugbot on #25.
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit c648429. Configure here.

Comment thread ublk/uring/fuzz_test.go Outdated
`FuzzRingCancel` placed `stopProducers.Store(true)` and
`producerWg.Wait()` as straight-line code AFTER the
`select { ... case <-time.After(2*time.Second): t.Fatal(...) }`.
When the timeout fires, `t.Fatal` calls `runtime.Goexit()`, which
skips those lines and jumps straight to `defer r.Close()`. The
producer goroutines are still running when `Close()` munmaps the
ring, so the next `nextSQE()` iteration dereferences the unmapped
pages and SIGSEGVs the process — masking the original assertion
failure.

Convert the producer shutdown into a `defer` registered AFTER
`defer r.Close()` so it runs BEFORE Close on every exit path
(defers are LIFO). The consumer is already woken by the
idempotent `r.Cancel()` issued before the select, so it needs no
extra cleanup.

Reported by Cursor Bugbot on PR #25.
Copy link
Copy Markdown

@matthewlouisbrockman matthewlouisbrockman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, merge conflicts tho

Resolve the Makefile overlap with the newer rapid-test target so the fuzz test branch matches current main and can be merged cleanly.
@ValentaTomas ValentaTomas merged commit a1a9713 into main May 4, 2026
14 checks passed
@ValentaTomas ValentaTomas deleted the tests/fuzz-uring branch May 4, 2026 17:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants