Fix/statistic bounded maps#1191
Merged
tobychui merged 3 commits intoJun 6, 2026
Merged
Conversation
The 8 *sync.Map fields in DailySummary (the target of RecordRequest) accept inputs whose cardinality is unbounded in principle — every unique RequestClientIp, Referer, UserAgent, and RequestURL value becomes a permanent map entry until the midnight reset. On a busy host this drove the heap-growth pattern observed in profiling: ~138MB sync.Map.Store + ~71MB bbolt write + ~70MB bytes.growSlice as top heap retainers, amplified by SaveSummaryOfDay's autosave path (full JSON marshal every 600s, sized to the live cardinality). This patch wraps each map with a soft cap enforced via lowest-count eviction: on a new-key insert past the cap, snapshot the map, sort by request count ascending, and drop the bottom 10%. Existing-key increments stay O(1); only new-key over-cap inserts trigger the trim. Concurrent trims are serialized via TryLock so RecordRequest never blocks on stat collection. Dropping the lowest-count entries first was chosen over least- recently-used because the access pattern is increment-only — least-recently-used would tend to evict the persistent high-count entries (the ones that actually matter for the dashboards) and preserve transient one-off entries. Keeping the highest-count entries matches what the dashboards surface (top items per dimension). The cap is hardcoded at 20000 in this commit (sufficient headroom for most deployments: 20000 * 8 maps * ~200B/entry ≈ 32MB worst case). A follow-up commit gates the whole feature behind a CLI flag so the new behavior is opt-in. Tests cover existing-key increment (size stays at 1), low-count eviction (high-count keys survive after 1000 inserts at cap=100), and concurrent stress (16 writers x 2000 unique inserts, race detector clean, size stays bounded near cap).
The bounded behavior added in the previous commit is now an opt-in feature. The default (and the behavior with no flag) is the upstream Load → check → Store path, byte-for-byte unchanged from the maintainer's original code. - New CLI flag -stats_max_entries=N (default 0). 0 disables the cap entirely; positive values cap each of the eight per-dimension sync.Maps in DailySummary at N entries. When the cap is exceeded, the entries with the lowest request counts are dropped first, which keeps the most-frequently-seen items intact and discards transient one-off entries. - New CollectorOption.MaxEntriesPerStatMap field threads the flag through start.go to NewStatisticCollector. - Dispatch happens once at startup via newIncrFn, which returns either unboundedIncr (a verbatim reproduction of the upstream pattern) or a closure around boundedIncr. The eight call sites in RecordRequest now invoke c.incr(...) — one indirect call per dimension, no per-request branching. - The previously hardcoded const maxEntriesPerStatMap is removed; the recommended value when enabling (20000) now lives in the flag's help text and a comment in bounded.go. Two new tests: - TestNewIncrFnDispatch — verifies maxEntries=0 produces unbounded behavior (50 entries kept, sidecar counter untouched) and >0 produces bounded behavior (capped at the configured limit). - TestUnboundedIncrMatchesUpstreamPattern — verifies unboundedIncr reproduces the upstream first-store-1, then-increment behavior. For deployments wanting the bounding behavior: -stats_max_entries=20000 For deployments wanting upstream-identical behavior: pass nothing (or -stats_max_entries=0)
Owner
|
This is a lot of changes. I think I might need some time to review this properly. Rebasing to v3.3.4 for now. |
Contributor
There was a problem hiding this comment.
Pull request overview
This PR adds an optional per-dimension entry cap to the statistics module to prevent unbounded growth of the per-day high-cardinality maps (IP, UA, Referer, URL, etc.), addressing long-running memory growth/OOM scenarios on busy proxies.
Changes:
- Introduces
-stats_max_entriesto enable a bounded mode for the per-dimension stats maps. - Refactors request recording to use a single increment dispatch (
c.incr) that selects bounded vs. unbounded behavior at startup. - Adds bounded-map trimming logic (evict lowest-frequency entries) plus internal tests for the trimming behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
src/def.go |
Adds the -stats_max_entries CLI flag for bounding per-dimension stats. |
src/start.go |
Wires the new stats cap option into the statistic collector initialization. |
src/mod/statistic/statistic.go |
Adds bounded-sidecar state to DailySummary, introduces increment dispatch, and routes per-request increments through it. |
src/mod/statistic/bounded.go |
Implements bounded increment + eviction (trim) logic for sync.Map-backed counters. |
src/mod/statistic/bounded_internal_test.go |
Adds unit tests for bounded increment/trim behavior and concurrent insertion behavior. |
src/mod/statistic/structconv.go |
Ensures loaded daily summaries initialize the bounded sidecars consistently with persisted map sizes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Contributor
Author
Owner
|
Thanks for the followup! Will be merging this and deploy to my dev branch this weekend 👍🏻 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Hello,
I fixed an unbounded growth bug in the statistics module with help from Claude.
"per-day statistics maps grow without bound under high-cardinality request inputs, leading to OOM-kill"
Memory usage will grow throughout the day, eventually leading to an out of memory process kill.
I'm getting millions of requests per day from many hundreds of thousands unique ip's.
The attached screenshot from the Resource Monitor plugin shows memory growing steadily before the patch, and a flat line after the patch. I'm running with -fastgeoip otherwise the cpu would be pegged.

Before the patch, sys.db would grow 800mb+ per 24 hours and I was having to delete it at midnight and start over.
After the patch, sys.db grows much slower. Only about 160mb after 4-5 days, because there are fewer items saved to the db. I've tested -stats_max_entries with 20000, 50000, and 100000, and the memory usage has consistently stayed flat. Sys.db growth scaled with the -stats_max_entries cap, as expected.
By Claude:
In mod/statistic, the per-day DailySummary keeps a separate map for each request dimension it tracks — client IP, User-Agent, Referer, request URL, downstream and upstream hostnames, and forwarding type. Each map grows by one entry per distinct value seen, and the maps reset at midnight. For the four dimensions whose values come straight from the request stream (client IP, User-Agent, Referer, URL), the set of possible values is effectively unbounded — every distinct value lands as a new permanent map entry until the next reset. On a busy proxy this can drive memory consumption well into the gigabytes over a day of uptime, amplified by the autosave path which re-serializes the whole summary to disk every ten minutes. A handful of past issues have reported Zoraxy gradually consuming all available memory across multi-day cycles (most notably #1007 and parts of the discussion on #816); the existing -fastgeoip workaround addresses a different startup-time allocation and doesn't help with this runtime growth.
This patch adds a per-map size cap that's opt-in via a new CLI flag, -stats_max_entries=N. When unset (or =0), behavior is byte-for-byte identical to the current code — same Load → check → Store pattern, no overhead, no risk to existing deployments. When set to a positive value, each of the eight maps stays at or under N entries. Once a map fills up, the next new insert triggers a quick trim: it scans the map, sorts entries by how many requests they've received, and drops the lowest-count 10% to make room. Existing entries continue to count up at the same speed they always did; the trim only runs on the relatively rare event of a new entry arriving at a full map, and trims that happen at the same time are serialized so request handling never blocks on stat collection.
The choice to drop the lowest-count entries (rather than oldest, newest, or random) means the data the dashboards actually display is preserved. The top-IPs, top-User-Agents, top-URLs panels all surface the most-frequently-seen items in each dimension — and those are exactly the entries the cap keeps. What gets dropped is the long tail: a single visit from an unusual IP, a malformed Referer, a one-time URL request. None of those would have shown up on the dashboard anyway. The top-line totals (TotalRequest / ValidRequest / ErrorRequest) are completely untouched, so the "X requests today" headline remains exact regardless of the cap. Recommended starting value is -stats_max_entries=20000, which gives plenty of headroom for legitimate diversity while bounding the stats structure at roughly 32 MB worst case across all eight maps.