
Disk KV cache: stores then immediately evicts the same file when budget is full #157

@unsaltedbutter-ai

Description


When the disk KV cache is at its budget, a single chat request writes a 291 MiB snapshot to disk and then unlinks it on the very next operation, twice in a row. Across one request, the same SHA is stored and evicted three times.

23:33:08 kv cache evicted reason=disk-cache-full tokens=20480 hits=0 size=291.80 MiB file=.../7400e69e....kv
23:33:08 chat ctx=0..23029:23029 TOOLS prompt start
...
23:34:06 chunk 20480/23029 (88.9%)
23:34:06 kv cache stored tokens=20480 trimmed=0 reason=continued size=291.80 MiB save=42.6 ms
23:34:06 kv cache evicted reason=disk-cache-full tokens=20480 hits=0 size=291.80 MiB file=.../7400e69e....kv
23:34:06 kv cache stored tokens=20480 trimmed=2549 reason=cold size=291.80 MiB save=55.2 ms
23:34:06 kv cache evicted reason=disk-cache-full tokens=20480 hits=0 size=291.80 MiB file=.../7400e69e....kv

The 7400e69e file is stored, evicted, stored again, and evicted again, all within the same second. Net useful work: zero. Wasted I/O: ~98 ms of save time (42.6 + 55.2 ms) and ~584 MiB written.

Why it happens

Two independent issues compound. Either alone is mild; together they thrash.

1. Two stores at the same prefix length. During prefill (ds4_server.c:10354-10372) the cold path syncs to cold_store_len. That sync triggers the continued-interval callback at ds4_server.c:9892, which stores the snapshot once with reason continued; ds4_server.c:10368 then stores the same prefix again with reason cold. Same tokens, same rendered text, same SHA, two writes. This fires whenever cold_store_len lands on a multiple of the continued step (10240 by default, after alignment), which covers most prompts above ~10 k tokens.

2. Eviction picks the file that was just written. Every successful store calls kv_cache_evict at ds4_server.c:9010. The score at ds4_server.c:8627-8639 is (hits + 1) * tokens / file_size. For hits == 0 entries this reduces to roughly 1 / bytes_per_token, so a just-written entry has no advantage over older hits == 0 entries, and the last_used tiebreaker only applies on exact float equality. In practice the just-written file is the largest low-score entry, so evicting it alone satisfies the budget in a single pass, and the loop picks it.

The comment at ds4_server.c:8636 says hits + 1 exists so a fresh checkpoint is not deleted because its hit counter is 0. The + 1 only defends against multiply-by-zero; it does not rank a fresh entry above other fresh entries.

How to reproduce

  1. Fill the kv-cache directory close to budget (a handful of chats, or set --kv-cache-budget-mb low for testing).
  2. Send any chat whose prompt tokens are a multiple of the continued step (10240 with defaults), or fall just above one.
  3. The stored ... reason=continued / evicted ... <same hash> / stored ... reason=cold / evicted ... <same hash> pattern appears in the log.

The command line I was using:
./ds4-server --ctx 512000 --kv-disk-dir ./kv-cache --kv-cache-boundary-align-tokens 2048 --kv-cache-min-tokens 512 --kv-disk-space-mb 8192 --mtp gguf/DeepSeek-V4-Flash-MTP-Q4K-Q8_0-F32.gguf --mtp-draft 2 --host 0.0.0.0 --port 8085 --kv-cache-boundary-trim-tokens 1000

Smallest fix I can see

Either of these on its own removes most of the waste:

  • In kv_cache_evict, exclude the file the current store just wrote from the candidate set. If the budget cannot be satisfied without it, return that fact and let the store path skip the write.
  • Before the cold sync at ds4_server.c:10354-10372, set kc->continued_last_store_tokens = cold_store_len so the in-prefill callback does not also fire at that boundary.

Happy to send a patch if useful.
