v1.2.2: close production-readiness review items on /self/* (#9)
Three small fixes flagged in a production-readiness review of v1.2.1. Lands the integration test suite that was created during e2e verification but never committed.
(1) Cap rate-limit + log-coalesce Maps with LRU eviction. The 30s sweep cleans expired entries, but a high-cardinality flood (rotating IPs / ownerIds / reasons) could grow either Map between sweeps. Hard cap at 10k entries each + LRU touch on every access keeps memory bounded regardless of input shape. Map preserves insertion order in JS, so evicting the oldest key (via m.keys().next().value) is constant-time and idiomatic.
(2) Flip SELF_BIND default from 0.0.0.0 to 127.0.0.1. Safer default: a self-hoster who copy-pastes the compose file without reading the README no longer exposes 4003 directly to the public internet. The HMAC + rate limit cap the worst case, but defaulting to localhost-only and requiring an explicit SELF_BIND=0.0.0.0 to opt-in matches the pattern we already use for the admin port.
(3) README Caddyfile snippet covers both deployment patterns. Previously only documented the dedicated-subdomain pattern (self.example.com); now also shows the path-routing-on-same-hostname pattern (handle /self/*) which is what sync.getbased.health actually deploys. Same client behavior either way.
Plus: includes the integration test suite that was created during end-to-end verification of v1.2.0 but hadn't been committed yet. It boots the relay in-process with a synthetic owner + writeKey + 5 fake message rows, then signs + POSTs a real HTTP compact and asserts the DB transaction actually empties the rows.
Plus 3 new LRU tests on the eviction path. 21 tests total green (12 unit + 6 integration + 3 LRU).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
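The capped-Map idea in item (1) can be sketched roughly as below. This is a minimal illustration, not the code in this commit: the helper names touch and evictOldest, their signatures, and the tiny demo cap are assumptions; only the Map behaviour (insertion-order iteration, keys().next().value) is standard JavaScript.

```javascript
// Hypothetical helpers; only the Map calls themselves are standard JS.
const MAX_ENTRIES = 10_000; // cap quoted in the commit message

// Remove oldest-inserted keys until the Map is back under the cap.
function evictOldest(map, max = MAX_ENTRIES) {
  while (map.size > max) {
    // Map iterates in insertion order, so the first key is the oldest.
    map.delete(map.keys().next().value);
  }
}

// LRU touch: delete + re-set moves the key to the newest position.
function touch(map, key, value, max = MAX_ENTRIES) {
  map.delete(key);
  map.set(key, value);
  evictOldest(map, max);
}

// Demo with a cap of 2 so the eviction order is easy to see.
const demo = new Map();
touch(demo, "a", 1, 2);
touch(demo, "b", 2, 2);
touch(demo, "a", 3, 2); // "a" becomes the newest entry
touch(demo, "c", 4, 2); // evicts "b", the least recently touched
console.log([...demo.keys()]); // → [ 'a', 'c' ]
```

Because every access re-inserts the key, the first key in iteration order is always the least recently used one, which is what makes the single keys().next() call sufficient.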
Greptile Summary
This PR lands three production-readiness items for the […] The LRU implementation is correct […]
Confidence Score: 4/5
Safe to merge; only P2 findings present: one stale comment and a loose test assertion, neither of which affects runtime behaviour.
All findings are P2: a stale file-header comment and a test with an implicit order dependency. The core LRU logic, the SELF_BIND default flip, and the integration tests are all correct. P2-only ceiling is 4/5.
The file-header comment block at the top of src/lib/self-server.ts (lines 31–33) should be updated to reflect the new 127.0.0.1 default.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
REQ[Incoming /self/* request] --> IP[clientIp: loopback? trust XFF]
IP --> RC[rateCheck: bucket lookup]
RC -->|new or expired| NEW[delete + set count=1\nevictOldest if size > 10k]
RC -->|within window, count < cap| LRU_OK[delete + increment count + re-set\nLRU touch — no eviction needed]
RC -->|within window, count >= cap| LRU_429[delete + re-set\nLRU touch — return 429]
NEW --> AUTH[verifySignature: HMAC + 5min window]
LRU_OK --> AUTH
LRU_429 --> RESP429[429 Retry-After]
AUTH -->|fail| LOG[logShouldEmit: coalesce dedup\ndelete + re-set + evictOldest if size > 10k]
AUTH -->|pass| HANDLER[compact-owner / owner-storage]
LOG --> RESP401[401 unauthorized]
HANDLER --> DBOP[SQLite DELETE/UPDATE or SELECT]
DBOP --> RESP200[200 JSON response]
SWEEP[setInterval 30s sweep] -.->|delete expired| RC
SWEEP -.->|emit coalesced warn + delete| LOG
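The three rateCheck branches in the flowchart can be sketched as a fixed-window counter. This is an illustration under stated assumptions, not the code in src/lib/self-server.ts: the 60-second window, the return shape, and the explicit now parameter are made up for the example; the cap of 10 requests follows the comment in the integration test, and the 10k bucket cap follows the PR description.

```javascript
// Sketch only: the 60 s window and the return shape are assumptions.
const WINDOW_MS = 60_000;
const MAX_PER_WINDOW = 10;
const MAX_BUCKETS = 10_000;

const buckets = new Map(); // key -> { count, resetAt }

function rateCheck(key, now = Date.now()) {
  const prev = buckets.get(key);
  buckets.delete(key); // every branch re-inserts below: the LRU touch

  if (!prev || prev.resetAt <= now) {
    // New or expired: start a fresh window, then enforce the size cap.
    buckets.set(key, { count: 1, resetAt: now + WINDOW_MS });
    while (buckets.size > MAX_BUCKETS) {
      buckets.delete(buckets.keys().next().value);
    }
    return { ok: true };
  }
  if (prev.count < MAX_PER_WINDOW) {
    // Within the window and under the cap: count this request.
    buckets.set(key, { count: prev.count + 1, resetAt: prev.resetAt });
    return { ok: true };
  }
  // Within the window and at the cap: touch only, caller answers 429.
  buckets.set(key, prev);
  return { ok: false, retryAfterSec: Math.ceil((prev.resetAt - now) / 1000) };
}

// Twelve requests at the same instant: ten pass, two are limited.
const t0 = 1_000_000;
const results = Array.from({ length: 12 }, () => rateCheck("203.0.113.7", t0).ok);
console.log(results.filter(Boolean).length); // → 10
```

Passing now explicitly keeps the sketch deterministic; a real handler would presumably rely on the Date.now() default.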
      body: JSON.stringify({ ownerId: ownerIdStr, timestamp: ts, signature: sig }),
    });
    codes.push(r.status);
  }
  // First 10 hit the bucket → 200 (no msgs to delete, but auth+route OK).
  // Last 2 → 429.
  // NOTE: the previous test in this suite already burned 1 token, so
  // we expect first 9 of THIS test to be 200, then 429s. Between the
  // earlier idempotent test, the storage probes don't count (different
  // bucket), so compact has burned 1+1=2 by here. So 8 of THIS test's
  // 12 should be 200, and the rest 429.
  // To keep this deterministic, just assert: at least one 429 fired,
  // and all non-429s are 200.
  const success = codes.filter(c => c === 200).length;
  const limited = codes.filter(c => c === 429).length;
  assert.ok(limited >= 1, `expected at least one 429, got codes ${codes.join(",")}`);
  assert.ok(success + limited === 12, "every request returned 200 or 429");
});
Rate-limit test has an implicit order dependency
The test's correctness relies on exactly two compact calls having been made in prior tests. The comment acknowledges this, but the assertion limited >= 1 means the test still passes even if the bucket resets mid-suite (e.g. due to test parallelism or a future reorder). A self-contained approach — spinning up a fresh createSelfServer inside this test and burning a known number of tokens — would make the assertion strong enough to actually catch a regression in the eviction/rate-limit path.
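One way to picture the self-contained variant suggested here: each test builds its own limiter state and asserts the exact sequence of status codes. The sketch below uses a hypothetical makeFreshLimiter stand-in, because the signature of createSelfServer is not shown in this thread; in the real suite the same shape would presumably wrap a server created inside the test and use an assert.deepEqual-style check.

```javascript
// Hypothetical stand-in for a server created inside the test; the real
// suite would call createSelfServer here (its signature is not shown).
function makeFreshLimiter({ max, windowMs }) {
  const buckets = new Map(); // private state: nothing leaks between tests
  return function statusFor(key, now) {
    const b = buckets.get(key);
    if (!b || b.resetAt <= now) {
      buckets.set(key, { count: 1, resetAt: now + windowMs });
      return 200;
    }
    if (b.count < max) {
      b.count += 1;
      return 200;
    }
    return 429;
  };
}

// The scenario owns its limiter, so no earlier test can burn tokens.
function runRateLimitScenario() {
  const statusFor = makeFreshLimiter({ max: 10, windowMs: 60_000 });
  const codes = [];
  for (let i = 0; i < 12; i++) codes.push(statusFor("owner-under-test", 0));
  return codes;
}

const codes = runRateLimitScenario();
const expected = [...Array(10).fill(200), 429, 429];
// Exact comparison: fails if the cap moves or the bucket resets early.
console.log(codes.join(",") === expected.join(",")); // → true
```

With a known starting state, the assertion can pin both the number of 200s and the position of the first 429, which the looser limited >= 1 check cannot do.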
Context
After v1.2.0 (/self/* endpoints) and v1.2.1 (rate limit + log coalescing) shipped, a production-readiness review flagged three small items worth landing before announcing the feature publicly. The endpoints are already deployed live on sync.getbased.health and verified end-to-end against the lab-charts client (real owner round-trip: storage probe returned 1300 bytes, compact dropped 2 messages, post-compact probe returned 0). This PR addresses the three items + lands the integration test suite that was created during that e2e verification but not committed.
What's in here
1. LRU eviction caps on rate-limit + coalesce Maps
The 30-second sweep cleans expired entries, but a high-cardinality flood (rotating IPs / ownerIds / reasons) could grow either Map between sweeps. Hard cap at 10k entries each + LRU touch on every access keeps memory bounded regardless of input shape. JS Map preserves insertion order, so evicting the oldest key (m.keys().next().value) is constant-time and idiomatic.
Both rateCheck and logShouldEmit now delete + re-set the key on every access (LRU touch), then evictOldest runs after the insert. 3 new tests verify (a) recently-touched keys survive new inserts, (b) the bucket count tracking still works correctly across LRU touches, and (c) the coalesce dedup state isn't reset by re-insert.
2. SELF_BIND default flipped 0.0.0.0 → 127.0.0.1
Safer default: a self-hoster who copy-pastes the compose file without reading the README no longer exposes 4003 directly to the public internet. The HMAC + rate limit cap the worst case, but defaulting to localhost-only and requiring an explicit SELF_BIND=0.0.0.0 to opt-in matches the pattern we already use for the admin port.
.env.example, README.md, and the config table all updated to match.
3. README Caddyfile snippet covers both deployment patterns
Previously only documented the dedicated-subdomain pattern (self.example.com); now also shows the path-routing-on-same-hostname pattern (handle /self/*) which is what sync.getbased.health actually deploys. Both patterns work: the get-based client derives https://<relay-hostname>/self/... from the WebSocket URL by default. Self-hosters who skip the reverse proxy entirely can use the labcharts-self-url localStorage override on the client.
4. Integration test suite (load-bearing, not previously committed)
test/self-server.integration.test.mjs (230 lines, 6 tests) boots the relay in-process with a synthetic owner + writeKey + 5 fake message rows in a tmp SQLite DB, then signs + POSTs a real HTTP compact request and asserts the DB transaction actually empties the rows + zeroes storedBytes. This is the load-bearing assertion that the static unit tests can't cover. It was created during end-to-end verification of v1.2.0 but the file was never git add'd.
Tests
21 total green (was 18 before this PR):
Run: npm test (build + node:test).
Test plan
npm test: all 21 green
[…] sync.getbased.health, no client changes needed
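For item 3, the default client behaviour (derive the /self base URL from the relay's WebSocket URL unless an override is set) might look roughly like the sketch below. The function name, the exact path shape, and passing the override as a plain argument are assumptions for illustration; the real client reads the override from the labcharts-self-url localStorage key, and its code is not shown in this PR.

```javascript
// Hypothetical helper: the name and exact path shape are assumptions
// based on the PR description, not the client's actual code.
function deriveSelfBaseUrl(relayWsUrl, override) {
  // An explicit override (e.g. the labcharts-self-url value) wins.
  if (override) return override.replace(/\/+$/, "");
  // Otherwise reuse the relay's host and always speak HTTPS.
  const { host } = new URL(relayWsUrl);
  return `https://${host}/self`;
}

console.log(deriveSelfBaseUrl("wss://sync.getbased.health"));
// → https://sync.getbased.health/self
console.log(deriveSelfBaseUrl("wss://relay.example.com", "http://127.0.0.1:4003/"));
// → http://127.0.0.1:4003
```

The second call mirrors the no-reverse-proxy case, where a self-hoster points the client straight at port 4003.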