
fix(trie): generational GC to end the prune/commit missing-node race#798

Merged
github-actions[bot] merged 1 commit into main from fix/trie-generational-gc on Jun 6, 2026


@satyakwok (Member) commented Jun 6, 2026

Problem

The periodic trie prune (maybe_prune_trie, every 5000 blocks) runs on a background thread whose live-set is a snapshot frozen at spawn time. Blocks committed during the multi-minute GC walk write new nodes/values that the snapshot never saw, so the old gc_table deleted them as "orphans". #791 split the node/value passes and re-augmented the live-set to narrow the window, but (per its own comment) the complete fix is "one RW txn around walk+delete".
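A toy model shows why a frozen live-set is unsafe when deletion is immediate. In it, an in-memory `HashMap` stands in for the MDBX node table. `Store`, `gc_immediate` and `replay_race` are hypothetical names, not Sentrix code. The key point is that a node committed after the snapshot looks exactly like a real orphan.

```rust
use std::collections::{HashMap, HashSet};

/// 32-byte content hash of a trie node (toy model, not Sentrix's type).
type Hash = [u8; 32];

/// In-memory stand-in for the trie node table.
pub struct Store {
    pub nodes: HashMap<Hash, Vec<u8>>,
}

impl Store {
    /// Immediate GC: every entry absent from the live-set is deleted at once.
    /// Returns the number of entries deleted.
    pub fn gc_immediate(&mut self, live_snapshot: &HashSet<Hash>) -> usize {
        let before = self.nodes.len();
        self.nodes.retain(|h, _| live_snapshot.contains(h));
        before - self.nodes.len()
    }
}

/// Replays the interleaving from the PR. Returns (entries deleted,
/// whether the node committed mid-walk survived).
pub fn replay_race() -> (usize, bool) {
    let old_root: Hash = [1; 32];
    let committed_mid_walk: Hash = [2; 32];
    let mut store = Store { nodes: HashMap::new() };
    store.nodes.insert(old_root, b"old root".to_vec());

    // 1. The prune thread freezes its live-set when it is spawned.
    let snapshot: HashSet<Hash> = store.nodes.keys().copied().collect();

    // 2. Block apply keeps committing during the multi-minute walk.
    store.nodes.insert(committed_mid_walk, b"new root".to_vec());

    // 3. The delete pass runs against the stale snapshot and removes
    //    the live node it never saw.
    let deleted = store.gc_immediate(&snapshot);
    (deleted, store.nodes.contains_key(&committed_mid_walk))
}
```

`replay_race()` returns `(1, false)`. The freshly committed node is deleted, and any later read of it fails the way the "missing node" error did on val3.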

The window persisted: testnet val3 hit the "trie: missing node" error in create_block's pre-apply step. In every round where val3 was the BFT proposer, the round ate a 20s propose timeout. The chain crawled to 0.13 blk/s until a restart reloaded the trie from MDBX. The failure recurred across the fleet.

Fix — generational (deferred) GC

Instead of deleting an orphan immediately, the prune tombstones it (TABLE_TRIE_TOMBSTONES, keyed disc‖hash → version) and only deletes it on a LATER prune if it's still orphan then.
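The tombstone record layout can be sketched from the storage.rs excerpt quoted later in review. There, keys are 33 bytes with `k[0] == disc` and the hash in `k[1..]`, and values are decoded with `u64::from_be_bytes`. The helper names below are hypothetical; only the byte layout comes from that excerpt.

```rust
/// Length of a tombstone key: 1 discriminator byte followed by a 32-byte hash.
pub const TOMB_KEY_LEN: usize = 33;

/// Builds the `disc‖hash` key.
pub fn tomb_key(disc: u8, hash: &[u8; 32]) -> [u8; TOMB_KEY_LEN] {
    let mut k = [0u8; TOMB_KEY_LEN];
    k[0] = disc;
    k[1..].copy_from_slice(hash);
    k
}

/// Splits a key back into (disc, hash); returns None for any other length.
pub fn split_tomb_key(k: &[u8]) -> Option<(u8, [u8; 32])> {
    if k.len() != TOMB_KEY_LEN {
        return None;
    }
    let mut h = [0u8; 32];
    h.copy_from_slice(&k[1..]);
    Some((k[0], h))
}

/// Value: the trie version at which the entry was tombstoned, as 8 big-endian bytes.
pub fn tomb_value(version: u64) -> [u8; 8] {
    version.to_be_bytes()
}
```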

A hash committed during this prune is orphan relative to the frozen snapshot, so it is tombstoned this cycle. By the next cycle it is a recent live node, so its tombstone is dropped and the entry is kept. Race closed. The worst-case failure mode is benign: under-deletion (storage grows for one extra cycle), never deletion of a live entry. Deletion lags one prune interval (~5000 blocks).
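One generational cycle can be sketched with an in-memory model. Here `data` stands in for the node/value table and `tombstones` for TABLE_TRIE_TOMBSTONES; all names are hypothetical. The sketch follows the clear/reap split visible in the reviewed storage.rs excerpt:

- A tombstoned hash that is live again is cleared.
- A hash still orphaned since an earlier version is reaped.
- New orphans are only marked.

```rust
use std::collections::{HashMap, HashSet};

type Hash = [u8; 32];

#[derive(Default)]
pub struct GenStore {
    pub data: HashMap<Hash, Vec<u8>>,
    /// hash -> version at which it was tombstoned (toy TABLE_TRIE_TOMBSTONES).
    pub tombstones: HashMap<Hash, u64>,
}

impl GenStore {
    /// Runs one prune cycle at `version`. Returns how many entries were deleted.
    pub fn gc_generational(&mut self, live: &HashSet<Hash>, version: u64) -> usize {
        let mut clear = Vec::new();
        let mut reap = Vec::new();
        for (h, &tv) in &self.tombstones {
            if live.contains(h) {
                clear.push(*h); // live again: spare it
            } else if tv < version {
                reap.push(*h); // orphaned since an earlier cycle: delete now
            }
        }
        for h in clear.iter().chain(reap.iter()) {
            self.tombstones.remove(h);
        }
        for h in &reap {
            self.data.remove(h);
        }
        // Entries orphaned for the first time are only marked, never deleted.
        let new_orphans: Vec<Hash> = self
            .data
            .keys()
            .filter(|h| !live.contains(*h) && !self.tombstones.contains_key(*h))
            .copied()
            .collect();
        for h in new_orphans {
            self.tombstones.insert(h, version);
        }
        reap.len()
    }
}
```

This replays the race from the PR. In cycle 1 the snapshot only sees `a`, while `c` is committed mid-walk. As a result, `c` is tombstoned rather than deleted. In cycle 2 `c` is live, so its tombstone is cleared, while the genuinely dead `b` is reaped.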

Safety

  • No consensus/state change: trie content and state_root are identical; only the timing of dead-node deletion moves. No fork gate needed.
  • New tombstone table auto-creates on env open (ALL_TABLES) → existing chain.dbs need no migration.
  • Not the eventual single-RW-txn fix, but race-free without a 10–20 min write lock (which would stall the chain).

Tests

  • test_generational_gc_defers_then_reaps — orphan survives cycle 1, reaped cycle 2.
  • test_generational_gc_spares_node_committed_during_prune — orphan-this-cycle / live-next-cycle is never deleted (the race repro; fails against the old gc_table).
  • Updated 3 tree prune tests to the deferred contract (tombstone → advance version → reap).

cargo test -p sentrix-trie: 82 passed. cargo check --workspace -D warnings clean.

Summary by CodeRabbit

  • Chores

    • Version bumped to 2.2.38
  • Bug Fixes

    • Trie garbage collection now defers deleting orphaned nodes/values by one prune cycle (tombstone first, delete on a later prune only if still orphaned), so entries committed while a prune is running are no longer deleted.

The periodic trie prune runs on a background thread whose live-set is a
snapshot frozen at spawn time. Blocks committed during the multi-minute GC
walk write new nodes/values the snapshot never saw; the old gc deleted them
as "orphans". #791 split the node/value passes and re-augmented the live-set
to narrow the window, but its own comment notes the complete fix is still
"one RW txn around walk+delete". The window persisted: testnet val3 hit a
"missing node" mid-traversal in create_block, so every time it was proposer
the round ate a 20s timeout and the chain crawled to 0.13 blk/s until a
restart reloaded the trie.

Generational GC: instead of deleting an orphan immediately, the prune
TOMBSTONES it (TABLE_TRIE_TOMBSTONES, keyed `disc||hash` -> version) and only
deletes it on a LATER prune if it is STILL orphan then. A hash committed
during this prune is orphan vs the snapshot, so it's tombstoned this cycle;
but by the next cycle it's a recent live node, so its tombstone is dropped
and the entry is kept. The race is closed; the worst-case failure mode is benign
(under-deletion = storage grows one extra cycle, never deletion of a live
entry). Deletion lags one prune interval (~5000 blocks).

The new tombstone table auto-creates on env open (ALL_TABLES), so existing
chain.dbs need no migration. No consensus/state change: trie content and
state_root are identical; only dead-node deletion TIMING moves. Not the
complete single-RW-txn fix, but race-free without holding a 10-20 min write
lock (which would stall the chain).

Tests: gc_table_generational defers-then-reaps; a node orphan-this-cycle /
live-next-cycle is never deleted (the race repro, fails against the old
gc_table). Updated three tree prune tests to the deferred contract
(tombstone, advance version, reap).
@github-actions github-actions Bot enabled auto-merge (squash) June 6, 2026 08:02
codecov Bot commented Jun 6, 2026

Codecov Report

❌ Patch coverage is 99.28058% with 1 line in your changes missing coverage. Please review.

| File with missing lines | Patch % | Lines |
| --- | --- | --- |
| crates/sentrix-trie/src/storage.rs | 99.14% | 1 Missing ⚠️ |

coderabbitai Bot commented Jun 6, 2026

📝 Walkthrough

This PR implements a generational garbage collection mechanism for the Sentrix trie to eliminate race conditions during node and value deletion. Instead of immediately removing orphaned entries, the system now defers deletion across two prune cycles using a tombstone table. The first prune records each orphan as a tombstone at the current version. A later prune then handles each tombstoned entry in one of two ways. If the entry is still orphaned and its tombstone predates the current version, the prune deletes both the entry and its tombstone. If the entry has become live again, it drops only the tombstone. The prune() method now calls gc_nodes_generational, augments the live set to the latest on-disk state, and calls gc_values_generational. Existing tests are updated to account for the one-cycle deferral by running prune() twice with an intervening commit() to advance the version counter.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • sentrix-labs/sentrix#791: This PR builds on the node-then-value race fix introduced in #791, adding the generational tombstone mechanism on top of its augmented live-set approach.
  • sentrix-labs/sentrix#584: Background pruning dispatches to the updated SentrixTrie::prune() method, so concurrent commits now interact with the new generational GC logic.
  • sentrix-labs/sentrix#711: Both PRs modify the prune race-avoidance strategy; this PR uses generational tombstones, while #711 augments live roots from the latest on-disk version.
🚥 Pre-merge checks | ✅ 5 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: implementing generational GC to fix the prune/commit race condition affecting trie node deletion. |
| Description check | ✅ Passed | The description is comprehensive and addresses the template's key sections: Problem, Fix, Safety, and Tests; however, the Scope and Checks sections of the template are not filled out with checkboxes. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/sentrix-trie/src/storage.rs`:
- Around line 404-436: The current reap loop (using tombstoned/live_hashes ->
reap -> later deletes via self.mdbx.delete on tables::TABLE_TRIE_TOMBSTONES and
data_table) can delete hashes that were reintroduced by a concurrent writer; fix
by rechecking liveness under a writer-stable view immediately before performing
deletes: just before iterating clear/reap to call self.mdbx.delete, obtain a
writer-locked/stable snapshot (or re-run the equivalent live_hashes check under
transactional/lock semantics provided by mdbx) and filter the reap list against
that fresh snapshot (i.e., recompute whether each h in reap is still absent from
live_hashes), only then perform the deletes on TABLE_TRIE_TOMBSTONES and
data_table; alternatively, defer deletes of prior-cycle tombstones until they
can be executed inside the same writer-synchronized transaction that confirms
absence.
- Around line 413-415: The current logic maps any non-8-byte tombstone payload
to tv = 0 via v.try_into().map(u64::from_be_bytes).unwrap_or(0), which silently
makes malformed tombstones immediately reapable; instead, detect malformed
tombstone payloads and surface a storage-corruption error to stop GC. Replace
the unwrap_or(0) path with error handling that returns or propagates a
StorageCorruption (or appropriate Result::Err) from the enclosing function (the
prune/scan routine that owns variables tv, v, version, reap, and h), include
contextual info (e.g., key/header h and payload length) in the error, and only
push h into reap when the tombstone value successfully parses and tv < version.
Ensure callers handle the Result so GC aborts on corrupt tombstone payloads.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: d3a6f22c-215a-4c22-9bd1-96cb1eb4461a

📥 Commits

Reviewing files that changed from the base of the PR and between 10b4874 and 71025c6.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • Cargo.toml
  • crates/sentrix-storage/src/tables.rs
  • crates/sentrix-trie/src/storage.rs
  • crates/sentrix-trie/src/tree.rs

Comment on lines +404 to +436
```rust
// Scan this discriminator's tombstones: clear those that are live again,
// reap those still orphaned since an earlier version.
self.mdbx
    .iter_from(tables::TABLE_TRIE_TOMBSTONES, &[], |k, v| {
        if k.len() == 33 && k[0] == disc {
            let mut h = [0u8; 32];
            h.copy_from_slice(&k[1..]);
            tombstoned.insert(h);
            if live_hashes.contains(&h) {
                clear.push(h);
            } else {
                let tv = v.try_into().map(u64::from_be_bytes).unwrap_or(0);
                if tv < version {
                    reap.push(h);
                }
            }
        }
        true
    })
    .map_err(|e| SentrixError::StorageError(e.to_string()))?;

// Drop the tombstones of both cleared and reaped hashes.
let mut tomb_key = [0u8; 33];
tomb_key[0] = disc;
for h in clear.iter().chain(reap.iter()) {
    tomb_key[1..].copy_from_slice(h);
    self.mdbx
        .delete(tables::TABLE_TRIE_TOMBSTONES, &tomb_key)
        .map_err(|e| SentrixError::StorageError(e.to_string()))?;
}
// Delete the reaped entries themselves from the node/value table.
let deleted = reap.len();
for h in &reap {
    self.mdbx
        .delete(data_table, h)
        .map_err(|e| SentrixError::StorageError(e.to_string()))?;
}
```

⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Reaping still races with a same-hash commit.

reap is decided from the frozen live_hashes snapshot and then deleted later in separate write transactions. If a concurrent commit reintroduces the same content-addressed node/value hash after Lines 410-415 but before Line 434 runs, this cycle still deletes it and the newly committed root now points at a missing node/value. The one-cycle tombstone deferral only protects hashes that become live before the next prune starts; it does not protect resurrection during the reaping cycle itself. Please re-check liveness against a writer-stable view immediately before delete, or keep prior-cycle tombstones deferred until reap is coupled to writer synchronization. As per coding guidelines, crates/sentrix-trie/**: CONSENSUS-CRITICAL — suggestions only, no destructive rewrites.

Source: Coding guidelines
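One way to act on this suggestion is sketched below under the same in-memory assumptions; `reap_with_recheck` and `load_latest_live` are hypothetical names. The idea is to re-derive the live-set immediately before the delete loop and drop any reap candidate that has become live again. As the follow-up #799 notes, this only narrows the window to the gap between the re-check and the delete. Closing it entirely requires the delete to run inside the same writer-synchronized RW transaction.

```rust
use std::collections::{HashMap, HashSet};

type Hash = [u8; 32];

/// Deletes reap candidates only after re-checking them against a live-set
/// loaded immediately before the delete loop. Candidates that became live
/// again (e.g. a content-addressed hash re-committed mid-cycle) are spared.
/// Returns the hashes actually deleted.
pub fn reap_with_recheck<F>(
    data: &mut HashMap<Hash, Vec<u8>>,
    reap: Vec<Hash>,
    load_latest_live: F,
) -> Vec<Hash>
where
    F: FnOnce() -> HashSet<Hash>,
{
    // Re-read liveness as late as possible. This shrinks the race window to
    // the gap between this call and the deletes below; it does not remove it.
    let fresh_live = load_latest_live();
    let confirmed: Vec<Hash> = reap
        .into_iter()
        .filter(|h| !fresh_live.contains(h))
        .collect();
    for h in &confirmed {
        data.remove(h);
    }
    confirmed
}
```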

Comment on lines +413 to +415
```rust
let tv = v.try_into().map(u64::from_be_bytes).unwrap_or(0);
if tv < version {
    reap.push(h);
```

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail closed on malformed tombstone payloads.

Line 413 turns any non-8-byte tombstone value into tv = 0, which makes that entry immediately reapable on the next prune. In this path, corrupt tombstone state should stop GC, not silently authorize deletion of trie data. Please surface a storage-corruption error instead of defaulting. As per coding guidelines, crates/sentrix-trie/**: CONSENSUS-CRITICAL — suggestions only, no destructive rewrites.

Source: Coding guidelines
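A fail-closed decode of the tombstone payload can be sketched as a hypothetical helper; `decide` and `TombAction` are illustrative names. Any payload that is not exactly 8 big-endian bytes yields an error instead of version 0. The review asks for that error to abort GC, while the follow-up #799 instead skips reaping the entry (keeping it) and warns. A caller chooses between these policies by how it handles `Err`.

```rust
use std::convert::TryFrom;

/// Outcome for one tombstone whose hash is not live in the current cycle.
#[derive(Debug, PartialEq)]
pub enum TombAction {
    /// Orphaned since an earlier version: safe to delete.
    Reap,
    /// Tombstoned at the current version (or later): wait another cycle.
    Keep,
}

/// Decodes a tombstone payload fail-closed: anything other than exactly
/// 8 bytes is reported as corruption rather than defaulting to version 0.
pub fn decide(payload: &[u8], version: u64) -> Result<TombAction, String> {
    let bytes = <[u8; 8]>::try_from(payload).map_err(|_| {
        format!("corrupt tombstone payload: {} bytes, expected 8", payload.len())
    })?;
    let tv = u64::from_be_bytes(bytes);
    Ok(if tv < version { TombAction::Reap } else { TombAction::Keep })
}
```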

@github-actions github-actions Bot merged commit d93bbfe into main Jun 6, 2026
20 of 21 checks passed
github-actions Bot pushed a commit that referenced this pull request Jun 6, 2026
…residual (#799)

Follow-up to #798 addressing two CodeRabbit findings on the generational GC:

- Major: a non-8-byte tombstone payload mapped to tv=0, making a corrupt
  entry immediately reapable (could delete live trie data). Now fail closed —
  skip reaping malformed tombstones (keep the entry) and warn, instead of
  defaulting to 0.
- Critical (documentation): the reap still has a narrow content-addressed
  resurrection window inside its own scan+delete. Documented the caller
  contract (tree.rs re-augments `live` to the latest committed root
  immediately before each gc pass, collapsing the window from the old
  multi-minute walk to this method's ms-scale scan+delete) and that the
  complete elimination needs writer-coupling (walk+delete in one RW txn) —
  the tracked fix this PR-series deliberately defers to avoid a chain-blocking
  write lock. This series strictly narrows the race; verify_integrity remains
  the backstop.

Regression test: a malformed tombstone must not authorise deletion
(data survives). sentrix-trie: 83 passed.
@satyakwok satyakwok deleted the fix/trie-generational-gc branch June 6, 2026 09:50
github-actions Bot pushed a commit that referenced this pull request Jun 6, 2026
…800)

* fix(trie): race-free offline prune; background prune off by default

The durable fix for the recurring trie "missing node" stalls. Root cause
(confirmed): the BACKGROUND prune runs on a thread cloning the trie at a
frozen version while block apply keeps committing; a node committed or
content-addressed-resurfaced during the multi-minute live-set walk is absent
from the frozen snapshot and gets deleted as an orphan. Five partial fixes
(#711 reload-before-gc, #714 collect_reachable depth, #791 split passes,
#798 generational defer) each narrowed but never closed the window — #798
deleted live nodes again in production (val3, h=6300000 prune → "missing
node" → propose stall). A truly race-free background prune needs walk+delete
in one MDBX RW txn (a chain-blocking write lock for the 10-20 min walk) or
refcounting — both deferred.

This takes the mechanism-agnostic safe route: eliminate the concurrency.

- `SentrixTrie::prune_offline(keep)` — same walk + keep-window as `prune`,
  minus the racy augment and the generational deferral, using the combined
  immediate `gc_orphaned_nodes`. Correct only with no concurrent commits,
  which is guaranteed by running it on a STOPPED node.
- `sentrix chain prune [--keep N]` — operator runs it during a maintenance
  halt (same model as `chain reset-trie` / `verify-deep`). Safe on a single
  peer: deleting unreachable nodes does not change the state_root (which only
  commits reachable nodes), so no fork risk — unlike reset-trie.
- Background prune is now OFF by default. It only runs with
  SENTRIX_ENABLE_BACKGROUND_TRIE_PRUNE=1 (and the legacy
  SENTRIX_DISABLE_TRIE_PRUNE=1 still force-disables). maybe_prune_trie
  early-returns by default, so the apply path no longer even spawns the
  racy thread.
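The offline prune amounts to a stop-the-world mark-and-sweep. The sketch below works over an assumed in-memory node map. `Node` and this `prune_offline` are illustrative and do not reproduce the real `SentrixTrie::prune_offline(keep)` signature; `kept_roots` stands in for the roots inside the keep window. The function marks everything reachable from the kept roots, then deletes the rest immediately. Marking uses an explicit stack rather than recursion, so a deep trie cannot overflow the call stack.

```rust
use std::collections::{HashMap, HashSet};

type Hash = [u8; 32];

/// Toy trie node: only the hashes of its children matter for GC.
pub struct Node {
    pub children: Vec<Hash>,
}

/// Marks every node reachable from `kept_roots`, then deletes all other nodes
/// at once. Only correct when nothing commits concurrently.
/// Returns the number of nodes deleted.
pub fn prune_offline(nodes: &mut HashMap<Hash, Node>, kept_roots: &[Hash]) -> usize {
    let mut reachable: HashSet<Hash> = HashSet::new();
    let mut stack: Vec<Hash> = kept_roots.to_vec();
    while let Some(h) = stack.pop() {
        if !reachable.insert(h) {
            continue; // already marked via a shared subtree
        }
        if let Some(node) = nodes.get(&h) {
            stack.extend(node.children.iter().copied());
        }
    }
    let before = nodes.len();
    nodes.retain(|h, _| reachable.contains(h));
    before - nodes.len()
}
```

Marking and deleting see the same, unchanging node set, so there is no snapshot to go stale. That only holds if nothing commits between the two phases, which is why the node must be stopped.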

Trade-off: storage grows between maintenance prunes. Acceptable; correctness
over convenience after five automatic-prune failures.

Tests: prune_offline reclaims orphans AND the current value survives the
prune (the regression guard #798 lacked — a live-node deletion would make
the post-prune `get` fail with "missing node"); background_prune_enabled
gate is off by default + opt-in semantics. cargo test -p sentrix-trie: 84
passed; -p sentrix-core: 263 passed; cargo check --workspace -D warnings clean.

* fix(trie): clippy nonminimal-bool + chain prune fail-closed guards (CodeRabbit)

- Clippy: background_prune_enabled uses is_none_or instead of !..is_some_and
  (nonminimal_bool, -D warnings in CI; cargo check didn't catch it).
- chain prune Major guards (CodeRabbit on #800):
  - Refuse if no persisted trie root at the current height — never let the
    maintenance prune trigger an init_trie backfill rebuild (different node
    shape = reset-trie fork class). New Storage::has_persisted_trie_root.
  - Enforce the offline precondition instead of only printing it: detect a
    running node via height-stability (sample across >5s poll-persist
    interval) and fail closed if the chain advanced; override with
    SENTRIX_ALLOW_ONLINE_PRUNE=1 for rare recovery.
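The two run-gating rules in this commit can be sketched as pure functions; the names and signatures are hypothetical. The env gate assumes each variable must equal exactly "1", which is how the commit message writes them. The height-stability probe assumes a caller-supplied `read_height` that reads the persisted chain height, and a `wait` longer than the node's ~5s persist interval.

```rust
use std::thread;
use std::time::Duration;

/// Background prune runs only if the opt-in var is "1" and the legacy
/// kill switch is not "1". Values are passed in so the logic is testable.
pub fn background_prune_enabled(enable: Option<&str>, legacy_disable: Option<&str>) -> bool {
    enable == Some("1") && legacy_disable != Some("1")
}

/// Offline precondition: sample the persisted height, wait longer than the
/// persist interval, and sample again. If the height moved, a node is running.
pub fn chain_appears_running<F: FnMut() -> u64>(mut read_height: F, wait: Duration) -> bool {
    let first = read_height();
    thread::sleep(wait);
    read_height() != first
}
```

In a CLI, the gate arguments would come from `std::env::var(...).ok()` (via `as_deref()`). A `true` from `chain_appears_running` would abort the prune unless SENTRIX_ALLOW_ONLINE_PRUNE=1 is set.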
github-actions Bot pushed a commit that referenced this pull request Jun 6, 2026
…802)

test_c03_pass2_failure_rolls_back_state credits v1 to u64::MAX - reward and
relies on the coinbase credit overflowing in Pass 2. It read get_block_reward()
(which depends on the reward-fork env vars: VOYAGER_REWARD_V2_HEIGHT /
TOKENOMICS_V2_HEIGHT / halving) WITHOUT holding env_test_lock, while sibling
tests set those vars under that lock. Under the parallel `cargo test` run (e.g.
the report-only coverage job) a concurrent mutation changed `reward` mid-test,
the overflow didn't trigger, add_block returned Ok, and unwrap_err panicked —
the recurring CI flake that needed re-runs on #795/#798/#800/#801.

Acquire env_test_lock at the top of the test so it serializes against the
env-mutating tests. Test-only change. sentrix-core suite + clippy --all-targets
-D warnings clean.
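The locking pattern behind this fix is sketched below, with a process-global atomic standing in for the reward-fork env vars. All names here, including `block_reward` and its fork rule, are hypothetical. Every test that reads or writes the shared config takes the same static mutex first, so a reader cannot observe a change mid-test. Recovering from a poisoned lock keeps one failed test from cascading into every later one.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Mutex, MutexGuard};

/// Stand-in for the process-global fork config read by the reward logic.
static REWARD_FORK_HEIGHT: AtomicU64 = AtomicU64::new(0);

/// One lock shared by every test that reads OR writes that config.
static ENV_TEST_LOCK: Mutex<()> = Mutex::new(());

/// Acquires the shared lock; a poisoned lock (a test panicked while
/// holding it) is recovered rather than propagated.
pub fn env_test_lock() -> MutexGuard<'static, ()> {
    ENV_TEST_LOCK.lock().unwrap_or_else(|e| e.into_inner())
}

/// Hypothetical reward rule that depends on the global config.
pub fn block_reward(height: u64) -> u64 {
    if height >= REWARD_FORK_HEIGHT.load(Ordering::SeqCst) { 50 } else { 100 }
}

/// Reader test in the fixed style: lock first, then read the reward, so no
/// sibling test can change the config before later steps rely on it.
pub fn reader_test_body() -> u64 {
    let _guard = env_test_lock();
    let reward = block_reward(10);
    // Later steps assume `reward` is still current; the lock guarantees it.
    assert_eq!(reward, block_reward(10));
    reward
}

/// Writer test: mutates the global config under the same lock.
pub fn writer_test_body(fork_height: u64) {
    let _guard = env_test_lock();
    REWARD_FORK_HEIGHT.store(fork_height, Ordering::SeqCst);
}
```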

Labels

None yet

Projects

None yet

1 participant