Skip to content

Defer eviction of checkpoint blocks to reduce post-checkpoint cold reads#55

Open
krleonid wants to merge 1 commit into
mainfrom
feature/defer-eviction-during-checkpoint
Open

Defer eviction of checkpoint blocks to reduce post-checkpoint cold reads#55
krleonid wants to merge 1 commit into
mainfrom
feature/defer-eviction-during-checkpoint

Conversation

@krleonid
Copy link
Copy Markdown
Owner

Summary

  • During checkpoint, newly-written persistent blocks are pinned in the buffer pool instead of being immediately added to the eviction queue
  • This prevents the checkpoint process from cannibalizing its own freshly-written blocks when allocating memory for subsequent writes
  • After checkpoint completes, all pins are released and blocks become eligible for normal LRU eviction

Problem

Each ConvertToPersistent call writes a block to disk, then adds it to the eviction queue. The next block write needs a buffer, so the pool evicts the oldest queued block — which is the one just written. This creates a cycle where checkpoint evicts its own output, forcing the first post-checkpoint query to re-read everything from disk.

Measured improvement

67MB dataset, 80MB memory limit:

  • Before: 23 blocks re-read from disk after checkpoint
  • After: 5 blocks re-read (78% reduction)

When data fully fits in memory, post-checkpoint cold reads drop to zero.

Test plan

  • Unit tests pass (build/reldebug/test/unittest)
  • Verify with SET memory_limit + checkpoint + query pattern
  • Verify no OOM when dataset exceeds memory limit (deferred pins are bounded by available memory)

🤖 Generated with Claude Code

During checkpoint, ConvertToPersistent writes blocks to disk and
immediately adds them to the eviction queue. Subsequent block writes
need memory, so the buffer pool evicts the freshly-written blocks to
reuse their buffers. This creates a self-inflicted cold cache: the
first query after checkpoint must re-read blocks that were just in
memory moments ago.

Fix: pin blocks during checkpoint writes so they cannot be evicted.
After all checkpoint data is flushed, release the pins. This ensures
that when data fits in the buffer pool, the first post-checkpoint
query finds all blocks already cached.

Measured improvement (67MB dataset, 80MB memory limit):
- Before: 23 blocks re-read from disk after checkpoint
- After: 5 blocks re-read (78% reduction)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant