Skip to content

v1.2.2 — close production-readiness review items on /self/*#9

Merged
elkimek merged 1 commit into
mainfrom
chore/self-server-hardening
May 5, 2026
Merged

v1.2.2 — close production-readiness review items on /self/*#9
elkimek merged 1 commit into
mainfrom
chore/self-server-hardening

Conversation

@elkimek
Copy link
Copy Markdown
Owner

@elkimek elkimek commented May 4, 2026

Context

After v1.2.0 (/self/* endpoints) and v1.2.1 (rate limit + log coalescing) shipped, a production-readiness review flagged three small items worth landing before announcing the feature publicly. The endpoints are already deployed live on sync.getbased.health and verified end-to-end against the lab-charts client (real owner round-trip: storage probe returned 1300 bytes, compact dropped 2 messages, post-compact probe returned 0). This PR addresses the three items + lands the integration test suite that was created during that e2e verification but not committed.

What's in here

1. LRU eviction caps on rate-limit + coalesce Maps

The 30-second sweep cleans expired entries, but a high-cardinality flood (rotating IPs / ownerIds / reasons) could grow either Map between sweeps. Hard cap at 10k entries each + LRU touch on every access keeps memory bounded regardless of input shape. JS Map preserves insertion order, so evicting the oldest key (m.keys().next().value) is constant-time and idiomatic.

function evictOldest<K, V>(m: Map<K, V>, cap: number): void {
  while (m.size > cap) {
    const oldest = m.keys().next().value;
    if (oldest === undefined) break;
    m.delete(oldest);
  }
}

Both rateCheck and logShouldEmit now delete + re-set the key on every access (LRU touch), then evictOldest runs after the insert. 3 new tests verify (a) recently-touched keys survive new inserts, (b) the bucket count tracking still works correctly across LRU touches, and (c) the coalesce dedup state isn't reset by re-insert.

2. SELF_BIND default flipped 0.0.0.0127.0.0.1

Safer default: a self-hoster who copy-pastes the compose file without reading the README no longer exposes 4003 directly to the public internet. The HMAC + rate limit cap the worst case, but defaulting to localhost-only and requiring an explicit SELF_BIND=0.0.0.0 to opt-in matches the pattern we already use for the admin port.

.env.example, README.md, and the config table all updated to match.

3. README Caddyfile snippet covers both deployment patterns

Previously only documented the dedicated-subdomain pattern (self.example.com); now also shows the path-routing-on-same-hostname pattern (handle /self/*) which is what sync.getbased.health actually deploys. Both patterns work — the get-based client derives https://<relay-hostname>/self/... from the WebSocket URL by default. Self-hosters who skip the reverse proxy entirely can use the labcharts-self-url localStorage override on the client.

4. Integration test suite (load-bearing — not previously committed)

test/self-server.integration.test.mjs (230 lines, 6 tests) boots the relay in-process with a synthetic owner + writeKey + 5 fake message rows in a tmp SQLite DB, then signs + POSTs a real HTTP compact request and asserts the DB transaction actually empties the rows + zeroes storedBytes. This is the load-bearing assertion that the static unit tests can't cover. It was created during end-to-end verification of v1.2.0 but the file was never git add'd.

Tests

21 total green (was 18 before this PR):

  • 6 integration (boot relay + real HTTP roundtrip)
  • 12 unit (HMAC verify path, rate-limit logic, log coalesce, clientIp XFF gate)
  • 3 new LRU tests on the eviction path

Run: npm test (build + node:test).

Test plan

  • npm test — all 21 green
  • e2e verified against deployed v1.2.1 in the previous session
  • Greptile review
  • After merge: tag v1.2.2, deploy to sync.getbased.health, no client changes needed

Three small fixes flagged in a production-readiness review of v1.2.1.
Lands the integration test suite that was created during e2e
verification but never committed.

(1) Cap rate-limit + log-coalesce Maps with LRU eviction.
The 30s sweep cleans expired entries, but a high-cardinality flood
(rotating IPs / ownerIds / reasons) could grow either Map between
sweeps. Hard cap at 10k entries each + LRU touch on every access
keeps memory bounded regardless of input shape. Map preserves
insertion order in JS, so evicting the oldest key (via
m.keys().next().value) is constant-time and idiomatic.

(2) Flip SELF_BIND default 0.0.0.0 to 127.0.0.1.
Safer default: a self-hoster who copy-pastes the compose file
without reading the README no longer exposes 4003 directly to the
public internet. The HMAC + rate limit cap the worst case, but
defaulting to localhost-only and requiring an explicit
SELF_BIND=0.0.0.0 to opt-in matches the pattern we already use
for the admin port.

(3) README Caddyfile snippet covers both deployment patterns.
Previously only documented the dedicated-subdomain pattern
(self.example.com); now also shows the path-routing-on-same-
hostname pattern (handle /self/*) which is what sync.getbased.health
actually deploys. Same client behavior either way.

Plus: includes the integration test suite that was created during
end-to-end verification of v1.2.0 but hadn't been committed yet —
boots the relay in-process with a synthetic owner + writeKey + 5
fake message rows, then signs + POSTs a real HTTP compact and
asserts the DB transaction actually empties the rows. Plus 3 new
LRU tests on the eviction path. 21 tests total green
(12 unit + 6 integration + 3 LRU).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 4, 2026

Greptile Summary

This PR lands three production-readiness items for the /self/* endpoints: LRU eviction caps (10 k entries) on the rate-limit and coalesce Maps, a safer SELF_BIND default of 127.0.0.1, and an expanded Caddyfile README snippet. It also commits the integration test suite (6 tests, full HTTP round-trip against a tmp SQLite DB) that was created during v1.2.0 e2e verification but never staged.

The LRU implementation is correct — delete/re-set for insertion-order promotion, evictOldest only on net-size-increasing inserts, sweep still handles natural expiry — and the unit + integration coverage is solid. One stale file-header comment in self-server.ts still describes the old 0.0.0.0 default and should be updated.

Confidence Score: 4/5

Safe to merge; only P2 findings present — one stale comment and a loose test assertion, neither affects runtime behaviour.

All findings are P2: a stale file-header comment and a test with an implicit order dependency. The core LRU logic, the SELF_BIND default flip, and the integration tests are all correct. P2-only ceiling is 4/5.

The file-header comment block at the top of src/lib/self-server.ts (lines 31–33) should be updated to reflect the new 127.0.0.1 default.

Important Files Changed

Filename Overview
src/lib/self-server.ts Adds LRU eviction (evictOldest + delete/re-set pattern) to rate-limit and coalesce Maps; logic is correct but the file-header comment still references the old 0.0.0.0 default.
test/self-server.integration.test.mjs New 210-line integration suite boots a real in-process server; rate-limit test relies on test execution order with a loose assertion, but all happy-path and DB-mutation tests are solid.
src/lib/config.ts Default for SELF_BIND changed from 0.0.0.0 to 127.0.0.1 with a clear comment explaining the rationale.
test/self-server.test.mjs Adds 3 focused unit tests for LRU eviction correctness; all assertions are self-contained and cover the important invariants.
.env.example SELF_BIND default updated to 127.0.0.1 with explanatory comment.
README.md Adds option-B Caddyfile path-routing pattern and updates SELF_BIND default documentation; accurate and clear.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    REQ[Incoming /self/* request] --> IP[clientIp: loopback? trust XFF]
    IP --> RC[rateCheck: bucket lookup]
    RC -->|new or expired| NEW[delete + set count=1\nevictOldest if size > 10k]
    RC -->|within window, count < cap| LRU_OK[delete + increment count + re-set\nLRU touch — no eviction needed]
    RC -->|within window, count >= cap| LRU_429[delete + re-set\nLRU touch — return 429]
    NEW --> AUTH[verifySignature: HMAC + 5min window]
    LRU_OK --> AUTH
    LRU_429 --> RESP429[429 Retry-After]
    AUTH -->|fail| LOG[logShouldEmit: coalesce dedup\ndelete + re-set + evictOldest if size > 10k]
    AUTH -->|pass| HANDLER[compact-owner / owner-storage]
    LOG --> RESP401[401 unauthorized]
    HANDLER --> DBOP[SQLite DELETE/UPDATE or SELECT]
    DBOP --> RESP200[200 JSON response]
    SWEEP[setInterval 30s sweep] -.->|delete expired| RC
    SWEEP -.->|emit coalesced warn + delete| LOG
Loading

Comments Outside Diff (1)

  1. src/lib/self-server.ts, line 31-33 (link)

    P2 Stale file-header comment contradicts the PR's security change

    The module-level comment still says "This server binds to 0.0.0.0 by default because it's intended to be reachable from outside the host" — but this PR's entire point is flipping that default to 127.0.0.1. A self-hoster reading the source to understand the security model will get the wrong picture.

Reviews (1): Last reviewed commit: "chore: v1.2.2 — close production-readine..." | Re-trigger Greptile

Comment on lines +193 to +210
body: JSON.stringify({ ownerId: ownerIdStr, timestamp: ts, signature: sig }),
});
codes.push(r.status);
}
// First 10 hit the bucket → 200 (no msgs to delete, but auth+route OK).
// Last 2 → 429.
// NOTE: the previous test in this suite already burned 1 token, so
// we expect first 9 of THIS test to be 200, then 429s. Between the
// earlier idempotent test, the storage probes don't count (different
// bucket), so compact has burned 1+1=2 by here. So 8 of THIS test's
// 12 should be 200, and the rest 429.
// To keep this deterministic, just assert: at least one 429 fired,
// and all non-429s are 200.
const success = codes.filter(c => c === 200).length;
const limited = codes.filter(c => c === 429).length;
assert.ok(limited >= 1, `expected at least one 429, got codes ${codes.join(",")}`);
assert.ok(success + limited === 12, "every request returned 200 or 429");
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Rate-limit test has an implicit order dependency

The test's correctness relies on exactly two compact calls having been made in prior tests. The comment acknowledges this, but the assertion limited >= 1 means the test still passes even if the bucket resets mid-suite (e.g. due to test parallelism or a future reorder). A self-contained approach — spinning up a fresh createSelfServer inside this test and burning a known number of tokens — would make the assertion strong enough to actually catch a regression in the eviction/rate-limit path.

@elkimek elkimek merged commit 70f2d3d into main May 5, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant