fix(kv-cache): guard never-hit entries from immediate self-eviction by audreyt · Pull Request #159 · antirez/ds4

audreyt · 2026-05-15T12:05:09Z

Summary

Fresh cold-saves at the start of a new conversation are smaller and less
token-dense per byte than mature multi-thousand-token snapshots. The
existing (hits + 1) * tokens / file_size eviction score correctly
captures the long-run "value per byte stored" ranking but has one
pathology under budget pressure: on a saturated cache the new cold-save
is the lowest-scoring entry the moment it is written, so the eviction
loop reaps it on the same wall-clock second — before its session's next
turn can issue its first lookup. The cold-save infrastructure
(cold-prefix boundary alignment, BPE boundary trim via
--kv-cache-boundary-trim-tokens, SHA-keyed text lookup on subsequent
requests) is then effectively dead under any real workload that uses
the default 4 GB budget.

This PR adds a narrow guard: a never-hit entry whose last_used is
within KV_EVICTION_FRESH_GUARD_SECONDS (60s) gets a +INF eviction
score, so the eviction loop cannot pick it until either it ages out
without being hit, or it is hit at least once (after which the existing
(hits + 1) factor protects it).

Reproduction

Three-turn chat workload, ~2500-token initial prompt, default 8 GB
--kv-disk-space-mb saturated by old snapshots:

	turn 1 cold	turn 2	turn 3
before	7.1 s	6.5 s	6.5 s
after	7.1 s	2.1 s	2.0 s

Server log before, every turn:

kv cache stored tokens=2048 trimmed=… reason=cold size=49.78 MiB
kv cache evicted reason=disk-cache-full tokens=2048 hits=0 size=49.78 MiB

i.e. each cold-save is created and reaped in the same second.

Server log after:

kv cache hit text tokens=2048 text=… load=5-6 ms file=…

Turn 2+ now load the just-written 2048-token cold prefix from disk in
~6 ms and only prefill the tokens beyond it.

Why not a different fix

Three earlier candidates were considered and rejected on this code path:

(hits + K) * tokens / file_size with K=4: doesn't help. A
2048-token / 50 MB cold-save has ~39 t/B; a mature 90000-token / 1.3 GB
snapshot has ~71 t/B. The size-density gap is intrinsic (fixed-cost
header overhead amortizes better at larger sizes), and no bias on the
(hits + …) factor flips the ranking.
Capping hits at a small constant: same problem. With hits capped
at 4 and bias 4, the cold-save scores 156 while the mature entry scores
285. The dominant signal is tokens / file_size, not hits.
Time-decayed score everywhere: introduces exp() calls per entry
per eviction pass and a tuning constant for the decay timescale.
Strictly more code than the narrow guard, and changes the long-run
ranking among aged entries, which is the part the current formula gets
right.

Eviction loop change

When every entry is guard-protected (i.e. all victim_score == DBL_MAX),
the loop short-circuits with a break rather than evicting a guarded
entry. The cache then overruns budget transiently until guard windows
expire. That excursion is bounded by KV_EVICTION_FRESH_GUARD_SECONDS
and made explicit, rather than silently looping forever or evicting
through the guard.

Tests

New: test_kv_cache_eviction_guards_just_written_never_hit_entry —
reproduces the failure mode. An old high-density entry and a fresh
low-density entry compete for a one-entry budget. Under the old score
the fresh entry would be evicted; under the guard the fresh entry
survives and the old entry is evicted.
Existing: test_kv_cache_eviction_values_fresh_snapshots and
test_kv_cache_eviction_keeps_aligned_continued_frontiers continue to
pass — they use ancient last_used values (100, 200, 300) so the
guard window never applies and scoring reverts to the original formula.

Test plan

make ds4_test && ./ds4_test passes locally (the three eviction
tests in particular).
A multi-turn chat-completions workload against a budget-saturated
cache shows turn-2 disk-text hits in the server log instead of
kv cache evicted reason=disk-cache-full of the just-written
cold-save.
Default --kv-disk-space-mb 4096 workloads benefit too (the
pathology is worst at smaller budgets).

🤖 Generated with Claude Code

A fresh cold-save written at the start of a new conversation is, by construction, smaller and less token-dense per byte than a mature multi-thousand-token snapshot from a long-running session. The existing `(hits + 1) * tokens / file_size` eviction score correctly captures the long-run "value per byte stored" ranking, but it has one pathology under budget pressure: on a saturated cache the new cold-save is the lowest-scoring entry the moment it is written, so the eviction loop reaps it on the same wall-clock second before its session's next turn can issue its first lookup. The cold-save infrastructure (cold-prefix boundary alignment, BPE boundary trim, SHA-keyed text lookup on subsequent requests) is then effectively dead under any real workload that uses the default 4 GB budget. Observed before this change on a multi-turn workload at the default budget pressure (2500-token initial prompt + 3 small turns): turn 1 cold: 7.1s turn 2: 6.5s turn 3: 6.5s Log on every turn: kv cache stored tokens=2048 trimmed=… reason=cold size=49.78 MiB kv cache evicted reason=disk-cache-full tokens=2048 hits=0 size=49.78 MiB i.e. each cold-save is created and reaped in the same second. Turn 2 falls back to a full cold prefill. This commit adds a narrow guard. A never-hit entry whose last_used is within KV_EVICTION_FRESH_GUARD_SECONDS (60s) gets a +INF eviction score, so the eviction loop cannot pick it until either: 1. it ages past the guard window without ever being hit, in which case it returns to normal scoring and is freely evictable; or 2. it is hit at least once, in which case the existing (hits + 1) factor already protects it. The eviction loop additionally short-circuits when victim_score == DBL_MAX: if every entry in the cache is guard-protected, no entry is evicted and the cache overruns budget transiently until the guard window expires. That excursion is bounded by the guard duration and made explicit rather than silently looping forever or evicting a guarded entry. After this change on the same workload at the same 8 GB budget that was already saturated by old snapshots: turn 1 cold: 7.1s turn 2: 2.1s turn 3: 2.0s (3.2x speedup) Log on every turn: kv cache hit text tokens=2048 text=… load=5-6 ms file=… Turns 2+ now load the just-written cold-save's 2048-token prefix from disk in ~6 ms and only prefill the new tokens beyond it, instead of re-running the full prefill from token zero. The new test test_kv_cache_eviction_guards_just_written_never_hit_entry reproduces the failure mode: an old high-density entry and a fresh low-density entry compete for a one-entry budget. Under the old score the fresh entry would be evicted; under the guard the fresh entry survives and the old entry is evicted, matching the empirical fix. The two existing eviction tests (values_fresh_snapshots, keeps_aligned_continued_frontiers) continue to pass: they use ancient last_used timestamps (100, 200, 300) so the guard window never applies and the score reverts to the original formula. Honest scope statement: this fixes single-session multi-turn under budget pressure where fresh cold-saves would otherwise self-evict before their first lookup. Concurrent-session pathologies (many guard-protected entries piling up at once, briefly exceeding budget until their windows expire) are bounded by the guard duration and accepted as a transient over-budget excursion, not an unbounded leak. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR against antirez/ds4: antirez#159 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ensureSupportCheckout previously cloned audreyt/ds4 main with --depth 1 on first install and never touched the checkout again. ensureBuilt likewise built ds4-server once and never rebuilt. That meant once a user had a working pi-ds4 install, audreyt/ds4 commits we shipped afterward (e.g. the KV-cache fresh-cold-save guard at 484f37a) could not reach them without a manual `rm -rf ~/.pi/ds4/support`. Add SUPPORT_PIN — the exact audreyt/ds4 commit each pi-ds4 release was tested against. On every launch, syncToSupportPin: - reads HEAD with `git rev-parse`, - if it doesn't match SUPPORT_PIN, fetches the pin and hard-resets to it, - deletes the cached ds4-server binary so ensureBuilt rebuilds against the new source. Three guard rails so this can't surprise users: 1. Symlinked SUPPORT_DIR (dev workflow pointing at a working tree like /Users/au/w/ds4) is detected via lstat and skipped entirely. 2. A working tree with uncommitted changes (git status --porcelain non-empty) refuses the reset and surfaces a clear error pointing at DS4_SUPPORT_PIN= as the disable escape hatch. 3. DS4_SUPPORT_REPO or DS4_SUPPORT_BRANCH set in env disables the pin — the user has explicitly opted into a different upstream. Bump cadence: every pi-ds4 release that wants existing users to pick up a new ds4-server fix bumps SUPPORT_PIN to the audreyt/ds4 commit being shipped. The pin lives next to SUPPORT_REPO/BRANCH so the relationship is visible at the top of the file. This commit ships SUPPORT_PIN=484f37a66dd4b13dc2c968674eb94821109a7c4f, the audreyt/ds4 main that includes the KV-cache fresh-cold-save guard (antirez/ds4#159, merged locally as 484f37a). Existing pi-ds4 installs will pay one git fetch + make ds4-server on their next launch and then their multi-turn turn-2 latency drops from ~6.5s to ~2s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

antirez · 2026-05-15T18:32:27Z

Already fixed when I saw the issue. Sorry!

audreyt added a commit to audreyt/ds4 that referenced this pull request May 15, 2026

Merge branch 'feat/kv-cache-guard-fresh-cold-saves'

484f37a

PR against antirez/ds4: antirez#159 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

antirez closed this May 15, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(kv-cache): guard never-hit entries from immediate self-eviction#159

fix(kv-cache): guard never-hit entries from immediate self-eviction#159
audreyt wants to merge 1 commit into
antirez:mainfrom
audreyt:feat/kv-cache-guard-fresh-cold-saves

audreyt commented May 15, 2026

Uh oh!

antirez commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

audreyt commented May 15, 2026

Summary

Reproduction

Why not a different fix

Eviction loop change

Tests

Test plan

Uh oh!

antirez commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants