Fix eviction queue dead_nodes counter underflow causing excessive purge iterations #40

Draft
krleonid wants to merge 1 commit into main from fix/eviction-queue-dead-nodes-underflow

@krleonid
Owner

Summary

  • Change total_dead_nodes from atomic<idx_t> (unsigned) to atomic<int64_t> (signed) to prevent underflow
  • Fix Purge() loop early-out condition to treat negative dead_nodes as zero
  • Remove unnecessary UnsafeNumericCast in duckdb_eviction_queues output
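
The first two bullets above can be sketched in isolation as follows. This is a minimal illustration, not the PR's actual diff: `DeadNodeCounter`, `ApproxDeadNodes`, and `SimulateCounter` are hypothetical names, and `idx_t` is assumed to be a 64-bit unsigned type.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>

using idx_t = uint64_t; // assumption: stand-in for DuckDB's unsigned index type

// Hypothetical wrapper; in the PR the counter is a member of the eviction queue.
struct DeadNodeCounter {
    // Signed, so one decrement too many yields -1 instead of wrapping to 2^64 - 1.
    std::atomic<int64_t> total_dead_nodes{0};

    void Increment() { total_dead_nodes++; }
    void Decrement() { total_dead_nodes--; }

    // Treat a negative counter as zero, then clamp to the queue size.
    idx_t ApproxDeadNodes(idx_t approx_q_size) const {
        int64_t raw = total_dead_nodes.load();
        idx_t dead = raw < 0 ? 0 : static_cast<idx_t>(raw);
        return std::min(dead, approx_q_size);
    }
};

// Hypothetical driver: apply some increments and decrements, then read the estimate.
idx_t SimulateCounter(int increments, int decrements, idx_t approx_q_size) {
    DeadNodeCounter counter;
    for (int i = 0; i < increments; i++) {
        counter.Increment();
    }
    for (int i = 0; i < decrements; i++) {
        counter.Decrement();
    }
    return counter.ApproxDeadNodes(approx_q_size);
}
```

With a signed counter, a mismatched decrement stays visible as a negative value in diagnostics, while the purge logic only ever sees a value between zero and the queue size.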

Problem

The total_dead_nodes counter underflows because DecrementDeadNodes() is called for nodes that were never counted by IncrementDeadNodes() — specifically, nodes that die (weak_ptr expires or block is pinned) without ever being superseded by a newer version.

When the unsigned counter wraps to a massive value, the Purge() loop's condition 2.2 never triggers:

idx_t approx_dead_nodes = total_dead_nodes;  // underflowed to huge unsigned value
approx_dead_nodes = approx_dead_nodes > approx_q_size ? approx_q_size : approx_dead_nodes;
// ^ clamped to approx_q_size

idx_t approx_alive_nodes = approx_q_size - approx_dead_nodes;
// ^ = approx_q_size - approx_q_size = 0

if (approx_alive_nodes * (ALIVE_NODE_MULTIPLIER - 1) > approx_dead_nodes) {
    break;  // early-out according to condition 2.2
}
// ^ 0 * 3 > approx_dead_nodes → always false! Loop never exits via condition 2.2

This causes the purge loop to run all max_purges iterations (up to approx_q_size / 8192) unnecessarily, holding purge_lock the entire time.
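
The arithmetic above can be reproduced outside DuckDB. The sketch below is illustrative only: `EarlyOutFires` and `UnderflowedCounter` are hypothetical helpers, `idx_t` is assumed to be `uint64_t`, and `ALIVE_NODE_MULTIPLIER = 4` is inferred from the `0 * 3` in the snippet rather than taken from the source.

```cpp
#include <cstdint>

using idx_t = uint64_t; // assumption: stand-in for DuckDB's unsigned index type

// Assumed value, inferred from the "0 * 3" comment in the PR description.
constexpr idx_t ALIVE_NODE_MULTIPLIER = 4;

// Returns true if condition 2.2 (the early-out) would fire.
// Mirrors the pre-fix arithmetic quoted in the description.
bool EarlyOutFires(idx_t total_dead_nodes, idx_t approx_q_size) {
    idx_t approx_dead_nodes = total_dead_nodes;
    // Clamp the dead-node estimate to the queue size.
    approx_dead_nodes = approx_dead_nodes > approx_q_size ? approx_q_size : approx_dead_nodes;
    idx_t approx_alive_nodes = approx_q_size - approx_dead_nodes;
    return approx_alive_nodes * (ALIVE_NODE_MULTIPLIER - 1) > approx_dead_nodes;
}

// Simulates one decrement too many on an unsigned counter.
idx_t UnderflowedCounter() {
    idx_t counter = 0;
    counter--; // unsigned wraparound is well-defined: the result is 2^64 - 1
    return counter;
}
```

Calling `EarlyOutFires(UnderflowedCounter(), 29715)` returns `false`, whereas `EarlyOutFires(0, 29715)` returns `true`: a queue with no dead nodes at all is treated as if it were entirely dead.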

Observed output from duckdb_eviction_queues

{
    "data": [
        {"approximate_size": 5098963, "dead_nodes": 3873616, "queue_index": 0, "queue_type": "BLOCK_AND_EXTERNAL_FILE", "total_insertions": 605198908},
        {"approximate_size": 224455537, "dead_nodes": 163476417, "queue_index": 1, "queue_type": "MANAGED_BUFFER", "total_insertions": 1977873276},
        {"approximate_size": 29715, "dead_nodes": -107922767, "queue_index": 2, "queue_type": "MANAGED_BUFFER", "total_insertions": 108300237},
        {"approximate_size": 32337, "dead_nodes": -134953305, "queue_index": 3, "queue_type": "MANAGED_BUFFER", "total_insertions": 135275557},
        {"approximate_size": 30044, "dead_nodes": -73130179, "queue_index": 4, "queue_type": "MANAGED_BUFFER", "total_insertions": 73253127},
        {"approximate_size": 28361, "dead_nodes": -32992668, "queue_index": 5, "queue_type": "MANAGED_BUFFER", "total_insertions": 33069332},
        {"approximate_size": 33218, "dead_nodes": -171896118, "queue_index": 6, "queue_type": "MANAGED_BUFFER", "total_insertions": 172673075},
        {"approximate_size": 0, "dead_nodes": 0, "queue_index": 7, "queue_type": "TINY_BUFFER", "total_insertions": 0}
    ]
}

Queues 2-6 show negative dead_nodes — these are the underflowed counters causing excessive purge iterations on every trigger.

Test plan

  • All eviction queue tests pass (*eviction* — 10 test cases, 4095 assertions)
  • Buffer storage tests pass (test/sql/storage/buffer* — 4 test cases)
  • Parallelism tests pass (test/sql/parallelism/* — 68 test cases, 412018 assertions)
  • CI full test suite

…er underflow

When total_dead_nodes underflows (decremented more than incremented due to
accounting mismatch), the wrapped unsigned value is always greater than
approx_q_size. The old code clamped it to approx_q_size, making
approx_alive_nodes=0, which prevented condition 2.2 from ever triggering.
This caused the loop to run all max_purges iterations unnecessarily.

Fix: if approx_dead_nodes > approx_q_size, break immediately — the counter
is unreliable and there is no meaningful dead node pressure to address.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
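
The alternative described in this commit message, which keeps the counter unsigned and bails out when it exceeds the queue size, might look roughly like the sketch below. `ShouldStopPurging` is a hypothetical helper, `idx_t` is assumed to be `uint64_t`, and `ALIVE_NODE_MULTIPLIER = 4` is an assumed value.

```cpp
#include <cstdint>

using idx_t = uint64_t; // assumption: stand-in for DuckDB's unsigned index type

constexpr idx_t ALIVE_NODE_MULTIPLIER = 4; // assumed value

// Hypothetical helper: returns true if the purge loop should stop iterating.
bool ShouldStopPurging(idx_t approx_dead_nodes, idx_t approx_q_size) {
    if (approx_dead_nodes > approx_q_size) {
        // More dead nodes than nodes in the queue: the counter is unreliable
        // (most likely wrapped), so stop instead of clamping it.
        return true;
    }
    idx_t approx_alive_nodes = approx_q_size - approx_dead_nodes;
    // Condition 2.2: enough alive nodes relative to dead ones.
    return approx_alive_nodes * (ALIVE_NODE_MULTIPLIER - 1) > approx_dead_nodes;
}
```

One trade-off of this variant is that both values are approximations, so a legitimate dead-node count that briefly exceeds the sampled queue size would also stop the loop.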
@krleonid force-pushed the fix/eviction-queue-dead-nodes-underflow branch from 60dc284 to 1d1cec7 on May 10, 2026 09:17
@krleonid marked this pull request as draft on May 12, 2026 07:56