statsig-go: serialize purego FFI dispatches to dodge upstream race #12
Closed
nsaini-figma wants to merge 1 commit into
Add a process-wide sync.Mutex on *StatsigFFI and take it across every purego-registered call this binding dispatches. JSON marshal/unmarshal on the Go side stays outside the critical section.

Workaround for a concurrent-FFI-return-value race in github.com/ebitengine/purego v0.9.x. Two goroutines making purego calls in flight simultaneously can have their return pointers swapped: the *c_char one caller expected lands in the other caller's read, and vice versa. Downstream this surfaces as the labmate crashes (SIGSEGV in runtime.memmove on a non-canonical pointer, nil-deref at the *gateJson read in GetFeatureGateWithOptions, or glibc "double free or corruption (out)" when two callers free the same buffer).

Repros at every upstream Rust SDK version we tested (0.15.1, 0.16.0, 0.17.0, 0.18.0, 0.19.0-0.19.3, this fork's 0.19.4-figma1) when paired with the current Go binding. Does NOT reproduce when the same .so is called from plain C with pthreads under identical workload; purego is the only path that exhibits it.

Minimal purego-only repro: 10 lines of C, 80 lines of Go, two concurrent goroutines calling a function with signature `func(uint64) *byte` see their return pointers swapped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 19, 2026
Summary
Adds a process-wide `sync.Mutex` on `*StatsigFFI` and takes it across every purego-registered call this binding dispatches. Workaround, not a root-cause fix, for a concurrent-FFI-return-value race in `github.com/ebitengine/purego` (v0.9.x) that has been causing intermittent labmate crashes since the cutover from the upstream binding at `v0.15.1` to this fork at `statsig-go/v0.19.4-figma1`.

Why
The labmate FFI SIGSEGV (memmove on a non-canonical pointer, glibc "double free or corruption (out)", and nil-deref at `*gateJson`) was narrowed end-to-end to a purego concurrency bug. Two goroutines with purego dispatches in flight at the same time can have their return pointers swapped between callers: caller A gets B's `*c_char` and vice versa. Once the wrong pointer lands in Go, `runtime.memmove` over `unsafe.Slice(ptr, len)` faults, and the deferred `free_string(ptr)` makes two callers race to free the same buffer.

Discrimination matrix (full writeup forthcoming, not committed here):
- Plain C with pthreads against `libstatsig_ffi.so`: 32 threads, ~16M ops, clean. So the Rust side is exonerated.
- Minimal purego-only repro (`func(uint64) *byte`): 2 goroutines crash within seconds with return pointers literally swapped. So the bug is purego itself.
- A `sync.Mutex` around `UseRustString` makes the production reproducer run ~13M ops over 30s with zero crashes.

What changed
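A minimal sketch of the locking pattern, not the code in this PR. The `ffi` type, its function fields, and `getFeatureGate` are hypothetical stand-ins: plain Go funcs take the place of purego-registered symbols so the sketch runs without a shared library. It shows the two properties the change relies on: the mutex is held across the dispatch plus the free, and JSON decoding happens after the lock is released.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// ffi is a hypothetical stand-in for *StatsigFFI. In the real binding the
// function fields would be purego-registered symbols; here they are Go funcs.
type ffi struct {
	mu         sync.Mutex                // serializes every FFI dispatch
	getGate    func(user string) *string // stand-in for a *c_char-returning call
	freeString func(p *string)           // stand-in for free_string
}

// useRustString holds mu across the dispatch, the copy out of the foreign
// buffer, and the free, then hands back a Go-owned string.
func (f *ffi) useRustString(handler func() *string) (string, bool) {
	f.mu.Lock()
	defer f.mu.Unlock()
	p := handler()
	if p == nil {
		return "", false
	}
	defer f.freeString(p) // runs before Unlock (defers are LIFO)
	return *p, true       // value is copied before the deferred free runs
}

// gate is an assumed, simplified shape for the decoded JSON.
type gate struct {
	Name  string `json:"name"`
	Value bool   `json:"value"`
}

func (f *ffi) getFeatureGate(user string) (gate, error) {
	raw, ok := f.useRustString(func() *string { return f.getGate(user) })
	if !ok {
		return gate{}, fmt.Errorf("nil gate JSON for %q", user)
	}
	var g gate
	err := json.Unmarshal([]byte(raw), &g) // outside the critical section
	return g, err
}

// newFakeFFI wires in-memory fakes in place of the shared library.
func newFakeFFI() *ffi {
	return &ffi{
		getGate: func(user string) *string {
			s := fmt.Sprintf(`{"name":%q,"value":true}`, "gate_for_"+user)
			return &s
		},
		freeString: func(p *string) { *p = "" }, // simulate releasing the buffer
	}
}

func main() {
	g, err := newFakeFFI().getFeatureGate("alice")
	fmt.Println(g.Name, g.Value, err)
}
```

Keeping the decode outside the lock means the critical section covers only the foreign call and the buffer release, so the serialization cost stays bounded by the FFI work itself.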
- `statsig-go/statsig_ffi.go`: add `mu sync.Mutex` to `StatsigFFI` with full motivation in a comment. Rewrite `UseRustString` to hold `mu` across `handler()` + `free_string`.
- `statsig-go/statsig.go`: lock around every direct purego dispatch (`NewStatsig`, `NewStatsigWithOptions`, `Initialize`, `Shutdown`, `FlushEvents`, `LogEvent`, `CheckGateWithOptions`, the 4 `ManuallyLog*Exposure` methods, and `release`). The `Get*` Gate / Config / Experiment / Layer / ParameterStore / ClientInitResponse methods inherit protection via `UseRustString`.
- `statsig-go/statsig_types.go`: lock around `Layer.logExposure` and the primitive-returning `ParameterStore` getters (`GetBool`, `GetNumber`, `GetInt`). `GetString` / `GetMap` / `GetSlice` are already `UseRustString`-protected.
- `statsig-go/statsig_user.go`, `statsig-go/statsig_options.go`: lock around create + finalizer-release.
- `statsig-go/data_store.go`, `statsig-go/observability_client.go`, `statsig-go/persistent_storage.go`: lock around the `_create` constructor calls and finalizer `_release` calls. The Rust→Go callbacks (`Get` / `Set` / `Increment` / etc.) deliberately do NOT take `mu`; see the lock-ordering note on the `mu` field.

Invariants worth calling out
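The reason callbacks must stay off `mu` is that Go's `sync.Mutex` is not reentrant. A minimal sketch under assumed names (`binding`, `dispatch`, and `callbackCouldLock` are hypothetical, not part of this binding): `dispatch` models an FFI call made with the lock held, and the closure models a Rust→Go callback fired synchronously before that call returns. `TryLock` (Go 1.18+) is used only so the sketch reports the result instead of hanging; a plain `Lock` at that point would block forever.

```go
package main

import (
	"fmt"
	"sync"
)

// binding is a hypothetical stand-in for *StatsigFFI.
type binding struct {
	mu sync.Mutex
}

// dispatch models an FFI call made with mu held. onCallback models a
// Rust→Go callback that the foreign side invokes before the call returns.
func (b *binding) dispatch(onCallback func()) {
	b.mu.Lock()
	defer b.mu.Unlock()
	onCallback()
}

// callbackCouldLock reports whether a callback running inside dispatch
// would be able to acquire mu.
func callbackCouldLock(b *binding) bool {
	acquired := false
	b.dispatch(func() {
		if b.mu.TryLock() {
			acquired = true
			b.mu.Unlock()
		}
	})
	return acquired
}

func main() {
	b := &binding{}
	fmt.Println("callback could take mu:", callbackCouldLock(b)) // prints "callback could take mu: false"
}
```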
- Callbacks must not take `mu`. The comment on the `mu` field spells this out: Rust→Go callbacks fired synchronously from inside an FFI call this binding has dispatched would self-deadlock otherwise. None of the existing callback closures touch the mutex, but anyone adding new callback wiring needs to keep this invariant.
- No `.so` change, no version bump. Pure Go-side patch. No `statsig-go/v0.19.4-figma2` tag yet; wait for review and a labmate canary before cutting.
- FFI functions without Go wrappers (`statsig_override_*`, `statsig_remove_*_override`, `statsig_identify`, async `statsig_shutdown` / `statsig_flush_events`, `statsig_check_gate_performance`) are untouched. When wrappers are added, they'll need the same pattern.
- […] `ebitengine/purego` with the minimal `func(uint64) *byte` repro. The mutex here is a tourniquet pending that fix.

Test plan
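A minimal sketch of the kind of check a concurrent reproducer can make, assuming hypothetical names throughout (`echo`, `stress`). `echo` is a pure-Go stand-in for a function shaped like `func(uint64) *byte`, so this harness cannot itself trigger the purego race; it only shows how a swapped return would be detected: each caller encodes its identity in the argument and verifies the returned pointer points at the same value. Uses `atomic.Uint64`, so Go 1.19+ is assumed.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// echo is a hypothetical stand-in for a foreign function that returns a
// pointer to data derived from its argument.
func echo(id uint64) *uint64 {
	v := id
	return &v
}

// stress runs `workers` goroutines, each making `iters` dispatches under one
// shared mutex, and counts calls whose result does not match the argument.
func stress(workers, iters int, call func(uint64) *uint64) uint64 {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		bad atomic.Uint64
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < iters; i++ {
				// Unique per (worker, iteration), so a swap is detectable.
				id := uint64(w)<<32 | uint64(i)
				mu.Lock()
				p := call(id)
				mu.Unlock()
				if p == nil || *p != id {
					bad.Add(1)
				}
			}
		}(w)
	}
	wg.Wait()
	return bad.Load()
}

func main() {
	fmt.Println("mismatches:", stress(32, 10000, echo))
}
```

Passing a real purego-registered function as `call`, with and without the `mu.Lock()` / `mu.Unlock()` pair, would be one way to compare the serialized and unserialized paths.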
- `go build ./statsig-go/...` passes.
- […] `v0.19.4-figma1`.
- `go test ./statsig-go/...` once CI runs.
- […] `/tmp/statsig-repro/` on devbox.
- […] `v0.19.4-figma2` build of this branch before promoting fleet-wide.

🤖 Generated with Claude Code