Guard KV cache against page-cache pressure #134

Open
Ghatage wants to merge 1 commit into antirez:main from Ghatage:guardPageCache

Ghatage (Contributor) commented May 14, 2026

The disk KV cache uses plain read/write to avoid mapping more VM into a process that already maps a large GGUF, but the bytes still land in the Linux page cache, where they compete with the mmapped weights for resident memory. A 30k-token cold save leaves hundreds of MiB sitting in Cached: (as reported by /proc/meminfo) after the save returns, which is exactly the RAM pressure the no-mmap decision was meant to avoid.

Hint the kernel to invalidate those pages with posix_fadvise(POSIX_FADV_DONTNEED) right after each full payload write and read. Header-only scans are deliberately untouched, since repeated small header reads still benefit from page-cache reuse. Expose DS4_KV_KEEP_PAGES=1 as an escape hatch for diagnostic comparisons, mirroring the existing cuda_model_drop_file_pages and its DS4_CUDA_KEEP_MODEL_PAGES toggle.
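
For readers unfamiliar with the hint, here is a minimal sketch of what such a helper could look like. It is not the code from this PR: the names kv_keep_pages and kv_drop_file_pages are hypothetical, and the fdatasync call before the hint is an assumption, added because POSIX_FADV_DONTNEED leaves pages that have not yet been written back untouched.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 when DS4_KV_KEEP_PAGES=1 is set,
 * meaning the caller asked to keep KV pages in the page cache. */
static int kv_keep_pages(void) {
    const char *v = getenv("DS4_KV_KEEP_PAGES");
    return v != NULL && strcmp(v, "1") == 0;
}

/* Hypothetical helper: ask the kernel to drop cached pages for the
 * byte range [offset, offset+len) of fd. A len of 0 means "to the end
 * of the file". Returns 0 on success or when the hint is skipped,
 * otherwise an errno-style error number. */
static int kv_drop_file_pages(int fd, off_t offset, off_t len) {
    assert(fd >= 0);
    if (kv_keep_pages()) return 0;

    /* Assumption: flush first, since dirty pages are not discarded
     * by POSIX_FADV_DONTNEED. */
    if (fdatasync(fd) != 0) return errno;

    /* posix_fadvise returns the error number directly, not via errno. */
    return posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
}
```

A caller would invoke kv_drop_file_pages(fd, 0, 0) once a full payload write or read has finished, and skip it on header-only scans. The call is advisory, so a zero return does not guarantee that every page was evicted.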

Correctness: make test; ./ds4_test --server.
Two new mincore-based tests assert that the resident pages of a 4 MiB temp file drop from nearly all to under 25% after the hint, and that DS4_KV_KEEP_PAGES=1 keeps them resident.
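
As an illustration of how a mincore-based residency check can work, the sketch below maps a file read-only and counts how many of its pages the kernel reports as resident. It is not the test code from this PR. The function name resident_fraction is hypothetical, and the sketch assumes Linux with glibc, where mincore is exposed under _DEFAULT_SOURCE.

```c
#define _DEFAULT_SOURCE /* mincore is not part of POSIX */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fraction of the first len bytes of fd whose
 * pages are resident in memory, in the range [0, 1]. Returns -1.0
 * on error. */
static double resident_fraction(int fd, size_t len) {
    assert(fd >= 0 && len > 0);

    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0) return -1.0;
    size_t pages = (len + (size_t)ps - 1) / (size_t)ps;

    /* Mapping the file does not fault its pages in, so the mapping
     * itself does not change what mincore reports. */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) return -1.0;

    double frac = -1.0;
    unsigned char *vec = malloc(pages);
    if (vec != NULL && mincore(map, len, vec) == 0) {
        size_t resident = 0;
        /* Only the least significant bit of each entry is defined. */
        for (size_t i = 0; i < pages; i++) resident += vec[i] & 1u;
        frac = (double)resident / (double)pages;
    }

    free(vec);
    munmap(map, len);
    return frac;
}
```

A test in this style would write the temp file, record resident_fraction, issue the DONTNEED hint, and compare the second reading against a threshold such as the 25% mentioned above. A threshold is more robust than expecting zero because the hint is advisory.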

antirez (Owner) commented May 14, 2026

Like this. Will take care.
