
purego: vendor v0.9.0 with concurrent-FFI race revert #13

Closed
nsaini-figma wants to merge 1 commit into main from nsaini/vendor-purego-revert

nsaini-figma (Collaborator) commented May 19, 2026

Summary

Adds purego/ as a new subdir module mirroring statsig-go/ and binaries-linux-gnu/. Contents are upstream ebitengine/purego v0.9.0 verbatim except for an 8-line revert in func.go and syscall_sysv.go that removes the racy sync.Pool of *syscall15Args introduced by upstream PR #282 and reaffirmed in PR #328.

Consumers redirect to this vendored copy via one new replace line in their own go.mod. No change to statsig-go/ itself.

Supersedes PR #12 (binding-side sync.Mutex workaround). #12 should be closed once this lands.

Why

A statsig-go consumer was hitting recurring FFI crashes ("memmove on non-canonical pointer" / "double free or corruption (out)" / nil-deref at *gateJson), which we narrowed down to a race on concurrent FFI return values in purego v0.9.x. When two goroutines have purego-registered function calls in flight at the same time, their return pointers can be swapped between the callers. The discrimination matrix is written up separately; the relevant facts are:

  • Reproduces against every Rust SDK version from 0.15.1 through 0.19.3, plus this fork's 0.19.4-figma1, when paired with the current Go binding, so this is not a Rust regression.
  • Does NOT reproduce when the same libstatsig_ffi.so is called from plain C with 32 pthreads under an identical workload (16M+ ops clean), so the fault is not in Rust, libc malloc, or the kernel.
  • A minimal purego-only repro — single C function char *alloc(uint64_t seed) + Go driver with N goroutines — reproduces the swap within seconds at workers=2. Each goroutine uses a disjoint seed range; two goroutines observably get back the other's buffer.
  • Reverting the thePool recycling in func.go:310-311 and syscall_sysv.go:18-19 (back to pre-#282 stack allocation) eliminates the race in 137M+ ops of the minimal repro and 37M+ ops of the full statsig-go gate-eval workload.
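
The shape of the minimal repro (one C allocator, N goroutines, disjoint seed ranges, check that each caller gets its own answer back) can be sketched as the harness below. This is a hypothetical reconstruction, not the code in the repro tree. `call`, `callArgs`, and `trampoline` are illustrative stand-ins for the purego-registered `alloc` and purego's internal `syscall15Args` plumbing. Here `call` is plain Go using a fresh per-call argument struct, which is the shape the patch restores, so the harness should report zero mismatches. Binding a real purego function requires the shared library and is left out.

```go
package main

import (
	"fmt"
	"sync"
)

// callArgs is a hypothetical stand-in for purego's syscall15Args: the input
// argument plus the slot the callee writes its return value into.
type callArgs struct {
	a1 uint64
	r1 uint64
}

// trampoline stands in for the code that invokes the C function and writes
// its result into args.r1. Instead of allocating a buffer, it returns a value
// derived from the seed so each caller can check it got its own answer back.
func trampoline(args *callArgs) {
	args.r1 = args.a1*2 + 1
}

// call mirrors the patched shape: a fresh callArgs per call, so no other
// goroutine can observe or overwrite this call's return slot.
func call(seed uint64) uint64 {
	var args callArgs
	args.a1 = seed
	trampoline(&args)
	return args.r1
}

// runRepro gives each of `workers` goroutines a disjoint range of
// `opsPerWorker` seeds and counts results that do not match the caller's own
// seed (for example, a value computed for another goroutine's seed).
func runRepro(workers int, opsPerWorker uint64) (ops, mismatches uint64) {
	var (
		wg sync.WaitGroup
		mu sync.Mutex // guards ops and mismatches
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(base uint64) {
			defer wg.Done()
			var bad uint64
			for i := uint64(0); i < opsPerWorker; i++ {
				seed := base + i
				if call(seed) != seed*2+1 {
					bad++
				}
			}
			mu.Lock()
			ops += opsPerWorker
			mismatches += bad
			mu.Unlock()
		}(uint64(w) * opsPerWorker)
	}
	wg.Wait()
	return ops, mismatches
}

func main() {
	ops, bad := runRepro(32, 100_000)
	fmt.Printf("ops=%d mismatches=%d\n", ops, bad)
}
```

With a real purego binding in place of `call`, a non-zero mismatch count is the signal described above. With the plain-Go stand-in nothing is shared between calls, so any mismatch would point to a bug in the harness itself.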

This vendored revert is the workaround pending an upstream fix. When upstream lands one, drop the replace line in consumers and delete purego/ from this repo. The revert is intentionally minimal (only thePool.Get/Put removed in two files) to make that a one-line revert.

What changed

  • Adds purego/ containing all of upstream ebitengine/purego v0.9.0.
  • Two files are patched, lines clearly marked with // FIGMA PATCH comments:
    • purego/func.go:310-318 — revert PR #282's thePool.Get/Put to stack-allocated syscall15Args.
    • purego/syscall_sysv.go:17-29 — matching revert for the SyscallN path from PR #328, including restoring //go:nosplit.
  • purego/go.mod declares module github.com/ebitengine/purego (unchanged from upstream) so that internal imports continue to resolve. The replace directive in consumer modules handles redirection.
  • purego/FIGMA-FORK.md documents the patch, consumer wiring, and removal procedure.
  • Upstream's .github/, .gitignore, and examples/ are excluded (would create CI / dependabot noise in this repo).

Invariants worth calling out

  • Module path unchanged. purego/go.mod keeps github.com/ebitengine/purego. Cross-package imports inside the vendored copy continue to resolve. The replace substitution is what redirects consumers.
  • Patch is two files, 8 net lines. Anything bigger should be rejected on review; the surface area of this vendored copy is the entire purego library and we want the delta to upstream small enough that a rebase is mechanical.
  • thePool declaration retained. It's no longer used but is left in place to minimize the diff. Unused package-level vars don't fail Go compilation. Drop on next rebase if upstream removes it too.
  • No statsig-go changes in this PR. Once this merges and is tagged purego/v0.9.0-figma1, consumers add replace github.com/ebitengine/purego => github.com/figma/statsig-server-core/purego v0.9.0-figma1 to their own go.mod.
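
The "thePool declaration retained" invariant depends on a Go rule: unused package-level variables compile, while unused imports and unused local variables are compile errors. Below is a minimal sketch in which a hypothetical `thePoolSketch` stands in for the retained declaration. It is not purego's code.

```go
package main

import (
	"fmt"
	"sync"
)

// thePoolSketch is a hypothetical stand-in for the retained thePool: declared
// at package level and never referenced by main. Go accepts this. In this
// sketch the declaration is also the only use of the "sync" import, so deleting
// it without removing the import would fail to compile.
var thePoolSketch = sync.Pool{New: func() any { return new(int) }}

func main() {
	fmt.Println("builds and runs with an unused package-level var")
}
```

For a later rebase, the practical takeaway is that deleting the variable may also require deleting an import that nothing else uses.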

Test plan

  • go build ./... inside purego/ — clean.
  • Concurrent-FFI repro at 32 goroutines × 30s × 3 runs against the in-fork vendored copy via replace — ~245k ops/sec sustained, zero crashes, zero mismatches. Same workload against unpatched purego v0.9.0 reliably crashes within ~5 seconds.
  • Minimal purego-only repro at workers=32 × 60s — 137M ops, zero return-value mismatches. Same workload against unpatched purego v0.9.0 mismatches within seconds.
  • go test ./... inside purego/ once CI is wired. purego ships a substantial test suite, which should be run against the vendored copy with the patch applied.
  • Reviewer optionally re-runs the discrimination matrix from the repro tree on devbox.
  • Once tagged purego/v0.9.0-figma1, validate consumer integration end-to-end by pointing a consumer's go.mod replace at the new tag and running its test suite + a soak.

🤖 Generated with Claude Code

Adds purego/ as a subdir module mirroring statsig-go/ and
binaries-linux-gnu/. Contents are upstream ebitengine/purego at tag
v0.9.0 verbatim, with one local patch reverting the package-wide
sync.Pool of *syscall15Args (introduced by upstream PR #282 and
reaffirmed in #328) back to a per-call stack allocation.

The pool optimization saved one allocation per FFI call but introduced
a race where two goroutines dispatching purego-registered functions
concurrently can observe each other's return values. In statsig-go
consumers (labmate is the in-tree example) this surfaces as SIGSEGV
in runtime.memmove on non-canonical pointers, glibc 'double free or
corruption (out)', nil-deref at the *gateJson read in
GetFeatureGateWithOptions, and, most insidiously, silently swapped
feature-flag evaluation results that pass type checks downstream.

Reverting the pool resolves the race in 137M+ ops of the minimal
purego-only repro (32 goroutines × 60s, zero mismatches) and 37M+ ops
of the full statsig-go gate-eval workload (no crashes, no
double-frees). Allocation overhead is acceptable; correctness has to
come first.

Module path inside purego/go.mod is intentionally kept as
github.com/ebitengine/purego so that internal imports continue to
resolve. Consumers redirect via go.mod replace:

    replace github.com/ebitengine/purego => github.com/figma/statsig-server-core/purego v0.9.0-figma1

When upstream lands a real fix, drop the replace line and delete this
directory. See purego/FIGMA-FORK.md for ongoing maintenance notes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
