fix(trie): generational GC to end the prune/commit missing-node race#798
Conversation
The periodic trie prune runs on a background thread whose live-set is a snapshot frozen at spawn time. Blocks committed during the multi-minute GC walk write new nodes/values the snapshot never saw; the old gc deleted them as "orphans". #791 split the node/value passes and re-augmented the live-set to narrow the window, but its own comment notes the complete fix is still "one RW txn around walk+delete". The window persisted: testnet val3 hit a "missing node" mid-traversal in create_block, so every time it was proposer the round ate a 20s timeout and the chain crawled to 0.13 blk/s until a restart reloaded the trie. Generational GC: instead of deleting an orphan immediately, the prune TOMBSTONES it (TABLE_TRIE_TOMBSTONES, keyed `disc||hash` -> version) and only deletes it on a LATER prune if it is STILL orphan then. A hash committed during this prune is orphan vs the snapshot, so it's tombstoned this cycle — but next cycle it's a recent live node, so its tombstone is dropped instead of deleting it. The race is closed; the worst-case failure mode is benign (under-deletion = storage grows one extra cycle, never deletion of a live entry). Deletion lags one prune interval (~5000 blocks). The new tombstone table auto-creates on env open (ALL_TABLES), so existing chain.dbs need no migration. No consensus/state change: trie content and state_root are identical; only dead-node deletion TIMING moves. Not the complete single-RW-txn fix, but race-free without holding a 10-20 min write lock (which would stall the chain). Tests: gc_table_generational defers-then-reaps; a node orphan-this-cycle / live-next-cycle is never deleted (the race repro, fails against the old gc_table). Updated three tree prune tests to the deferred contract (tombstone, advance version, reap).
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
Wondering what really moved? Review this PR in Change Stack to inspect semantic changes, definitions, and references. 📝 WalkthroughWalkthroughThis PR implements a generational garbage collection mechanism for the Sentrix trie to eliminate race conditions during node and value deletion. Instead of immediately removing orphaned entries, the system now defers deletion across two prune cycles using a tombstone table. The first prune marks orphans as tombstones at the current version; the second prune deletes stale tombstones if the entries remain orphaned. The Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/sentrix-trie/src/storage.rs`:
- Around line 404-436: The current reap loop (using tombstoned/live_hashes ->
reap -> later deletes via self.mdbx.delete on tables::TABLE_TRIE_TOMBSTONES and
data_table) can delete hashes that were reintroduced by a concurrent writer; fix
by rechecking liveness under a writer-stable view immediately before performing
deletes: just before iterating clear/reap to call self.mdbx.delete, obtain a
writer-locked/stable snapshot (or re-run the equivalent live_hashes check under
transactional/lock semantics provided by mdbx) and filter the reap list against
that fresh snapshot (i.e., recompute whether each h in reap is still absent from
live_hashes), only then perform the deletes on TABLE_TRIE_TOMBSTONES and
data_table; alternatively, defer deletes of prior-cycle tombstones until they
can be executed inside the same writer-synchronized transaction that confirms
absence.
- Around line 413-415: The current logic maps any non-8-byte tombstone payload
to tv = 0 via v.try_into().map(u64::from_be_bytes).unwrap_or(0), which silently
makes malformed tombstones immediately reapable; instead, detect malformed
tombstone payloads and surface a storage-corruption error to stop GC. Replace
the unwrap_or(0) path with error handling that returns or propagates a
StorageCorruption (or appropriate Result::Err) from the enclosing function (the
prune/scan routine that owns variables tv, v, version, reap, and h), include
contextual info (e.g., key/header h and payload length) in the error, and only
push h into reap when the tombstone value successfully parses and tv < version.
Ensure callers handle the Result so GC aborts on corrupt tombstone payloads.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: d3a6f22c-215a-4c22-9bd1-96cb1eb4461a
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock,!**/*.lock
📒 Files selected for processing (4)
Cargo.tomlcrates/sentrix-storage/src/tables.rscrates/sentrix-trie/src/storage.rscrates/sentrix-trie/src/tree.rs
| self.mdbx | ||
| .iter_from(tables::TABLE_TRIE_TOMBSTONES, &[], |k, v| { | ||
| if k.len() == 33 && k[0] == disc { | ||
| let mut h = [0u8; 32]; | ||
| h.copy_from_slice(&k[1..]); | ||
| tombstoned.insert(h); | ||
| if live_hashes.contains(&h) { | ||
| clear.push(h); | ||
| } else { | ||
| let tv = v.try_into().map(u64::from_be_bytes).unwrap_or(0); | ||
| if tv < version { | ||
| reap.push(h); | ||
| } | ||
| } | ||
| } | ||
| true | ||
| }) | ||
| .map_err(|e| SentrixError::StorageError(e.to_string()))?; | ||
|
|
||
| let mut tomb_key = [0u8; 33]; | ||
| tomb_key[0] = disc; | ||
| for h in clear.iter().chain(reap.iter()) { | ||
| tomb_key[1..].copy_from_slice(h); | ||
| self.mdbx | ||
| .delete(tables::TABLE_TRIE_TOMBSTONES, &tomb_key) | ||
| .map_err(|e| SentrixError::StorageError(e.to_string()))?; | ||
| } | ||
| let deleted = reap.len(); | ||
| for h in &reap { | ||
| self.mdbx | ||
| .delete(data_table, h) | ||
| .map_err(|e| SentrixError::StorageError(e.to_string()))?; | ||
| } |
There was a problem hiding this comment.
Reaping still races with a same-hash commit.
reap is decided from the frozen live_hashes snapshot and then deleted later in separate write transactions. If a concurrent commit reintroduces the same content-addressed node/value hash after Lines 410-415 but before Line 434 runs, this cycle still deletes it and the newly committed root now points at a missing node/value. The one-cycle tombstone deferral only protects hashes that become live before the next prune starts; it does not protect resurrection during the reaping cycle itself. Please re-check liveness against a writer-stable view immediately before delete, or keep prior-cycle tombstones deferred until reap is coupled to writer synchronization. As per coding guidelines, crates/sentrix-trie/**: CONSENSUS-CRITICAL — suggestions only, no destructive rewrites.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/sentrix-trie/src/storage.rs` around lines 404 - 436, The current reap
loop (using tombstoned/live_hashes -> reap -> later deletes via self.mdbx.delete
on tables::TABLE_TRIE_TOMBSTONES and data_table) can delete hashes that were
reintroduced by a concurrent writer; fix by rechecking liveness under a
writer-stable view immediately before performing deletes: just before iterating
clear/reap to call self.mdbx.delete, obtain a writer-locked/stable snapshot (or
re-run the equivalent live_hashes check under transactional/lock semantics
provided by mdbx) and filter the reap list against that fresh snapshot (i.e.,
recompute whether each h in reap is still absent from live_hashes), only then
perform the deletes on TABLE_TRIE_TOMBSTONES and data_table; alternatively,
defer deletes of prior-cycle tombstones until they can be executed inside the
same writer-synchronized transaction that confirms absence.
Source: Coding guidelines
| let tv = v.try_into().map(u64::from_be_bytes).unwrap_or(0); | ||
| if tv < version { | ||
| reap.push(h); |
There was a problem hiding this comment.
Fail closed on malformed tombstone payloads.
Line 413 turns any non-8-byte tombstone value into tv = 0, which makes that entry immediately reapable on the next prune. In this path, corrupt tombstone state should stop GC, not silently authorize deletion of trie data. Please surface a storage-corruption error instead of defaulting. As per coding guidelines, crates/sentrix-trie/**: CONSENSUS-CRITICAL — suggestions only, no destructive rewrites.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/sentrix-trie/src/storage.rs` around lines 413 - 415, The current logic
maps any non-8-byte tombstone payload to tv = 0 via
v.try_into().map(u64::from_be_bytes).unwrap_or(0), which silently makes
malformed tombstones immediately reapable; instead, detect malformed tombstone
payloads and surface a storage-corruption error to stop GC. Replace the
unwrap_or(0) path with error handling that returns or propagates a
StorageCorruption (or appropriate Result::Err) from the enclosing function (the
prune/scan routine that owns variables tv, v, version, reap, and h), include
contextual info (e.g., key/header h and payload length) in the error, and only
push h into reap when the tombstone value successfully parses and tv < version.
Ensure callers handle the Result so GC aborts on corrupt tombstone payloads.
Source: Coding guidelines
…residual (#799) Follow-up to #798 addressing two CodeRabbit findings on the generational GC: - Major: a non-8-byte tombstone payload mapped to tv=0, making a corrupt entry immediately reapable (could delete live trie data). Now fail closed — skip reaping malformed tombstones (keep the entry) and warn, instead of defaulting to 0. - Critical (documentation): the reap still has a narrow content-addressed resurrection window inside its own scan+delete. Documented the caller contract (tree.rs re-augments `live` to the latest committed root immediately before each gc pass, collapsing the window from the old multi-minute walk to this method's ms-scale scan+delete) and that the complete elimination needs writer-coupling (walk+delete in one RW txn) — the tracked fix this PR-series deliberately defers to avoid a chain-blocking write lock. This series strictly narrows the race; verify_integrity remains the backstop. Regression test: a malformed tombstone must not authorise deletion (data survives). sentrix-trie: 83 passed.
…800) * fix(trie): race-free offline prune; background prune off by default The durable fix for the recurring trie "missing node" stalls. Root cause (confirmed): the BACKGROUND prune runs on a thread cloning the trie at a frozen version while block apply keeps committing; a node committed or content-addressed-resurfaced during the multi-minute live-set walk is absent from the frozen snapshot and gets deleted as an orphan. Five partial fixes (#711 reload-before-gc, #714 collect_reachable depth, #791 split passes, #798 generational defer) each narrowed but never closed the window — #798 deleted live nodes again in production (val3, h=6300000 prune → "missing node" → propose stall). A truly race-free background prune needs walk+delete in one MDBX RW txn (a chain-blocking write lock for the 10-20 min walk) or refcounting — both deferred. This takes the mechanism-agnostic safe route: eliminate the concurrency. - `SentrixTrie::prune_offline(keep)` — same walk + keep-window as `prune`, minus the racy augment and the generational deferral, using the combined immediate `gc_orphaned_nodes`. Correct only with no concurrent commits, which is guaranteed by running it on a STOPPED node. - `sentrix chain prune [--keep N]` — operator runs it during a maintenance halt (same model as `chain reset-trie` / `verify-deep`). Safe on a single peer: deleting unreachable nodes does not change the state_root (which only commits reachable nodes), so no fork risk — unlike reset-trie. - Background prune is now OFF by default. It only runs with SENTRIX_ENABLE_BACKGROUND_TRIE_PRUNE=1 (and the legacy SENTRIX_DISABLE_TRIE_PRUNE=1 still force-disables). maybe_prune_trie early-returns by default, so the apply path no longer even spawns the racy thread. Trade-off: storage grows between maintenance prunes. Acceptable; correctness over convenience after five automatic-prune failures. Tests: prune_offline reclaims orphans AND the current value survives the prune (the regression guard #798 lacked — a live-node deletion would make the post-prune `get` fail with "missing node"); background_prune_enabled gate is off by default + opt-in semantics. cargo test -p sentrix-trie: 84 passed; -p sentrix-core: 263 passed; cargo check --workspace -D warnings clean. * fix(trie): clippy nonminimal-bool + chain prune fail-closed guards (CodeRabbit) - Clippy: background_prune_enabled uses is_none_or instead of !..is_some_and (nonminimal_bool, -D warnings in CI; cargo check didn't catch it). - chain prune Major guards (CodeRabbit on #800): - Refuse if no persisted trie root at the current height — never let the maintenance prune trigger an init_trie backfill rebuild (different node shape = reset-trie fork class). New Storage::has_persisted_trie_root. - Enforce the offline precondition instead of only printing it: detect a running node via height-stability (sample across >5s poll-persist interval) and fail closed if the chain advanced; override with SENTRIX_ALLOW_ONLINE_PRUNE=1 for rare recovery.
…802) test_c03_pass2_failure_rolls_back_state credits v1 to u64::MAX - reward and relies on the coinbase credit overflowing in Pass 2. It read get_block_reward() (which depends on the reward-fork env vars: VOYAGER_REWARD_V2_HEIGHT / TOKENOMICS_V2_HEIGHT / halving) WITHOUT holding env_test_lock, while sibling tests set those vars under that lock. Under the parallel `cargo test` run (e.g. the report-only coverage job) a concurrent mutation changed `reward` mid-test, the overflow didn't trigger, add_block returned Ok, and unwrap_err panicked — the recurring CI flake that needed re-runs on #795/#798/#800/#801. Acquire env_test_lock at the top of the test so it serializes against the env-mutating tests. Test-only change. sentrix-core suite + clippy --all-targets -D warnings clean.
Problem
The periodic trie prune (
maybe_prune_trie, every 5000 blocks) runs on a background thread whose live-set is a snapshot frozen at spawn time. Blocks committed during the multi-minute GC walk write new nodes/values the snapshot never saw → the oldgc_tabledeleted them as "orphans". #791 split the node/value passes + re-augmented the live-set to narrow the window, but (per its own comment) the complete fix is "one RW txn around walk+delete".The window persisted: testnet val3 hit
trie: missing nodeincreate_blockpre-apply, so every time it was BFT proposer the round ate a 20s propose timeout → chain crawled to 0.13 blk/s until a restart reloaded the trie from MDBX. Recurring across the fleet.Fix — generational (deferred) GC
Instead of deleting an orphan immediately, the prune tombstones it (
TABLE_TRIE_TOMBSTONES, keyeddisc‖hash→ version) and only deletes it on a LATER prune if it's still orphan then.A hash committed during this prune is orphan vs the frozen snapshot → tombstoned this cycle — but next cycle it's a recent live node → its tombstone is dropped instead of deleting it. Race closed. Worst-case failure mode is benign: under-deletion (storage grows one extra cycle), never deletion of a live entry. Deletion lags one prune interval (~5000 blocks).
Safety
state_rootidentical; only dead-node deletion timing moves. No fork gate needed.ALL_TABLES) → existing chain.dbs need no migration.Tests
test_generational_gc_defers_then_reaps— orphan survives cycle 1, reaped cycle 2.test_generational_gc_spares_node_committed_during_prune— orphan-this-cycle / live-next-cycle is never deleted (the race repro; fails against the oldgc_table).treeprune tests to the deferred contract (tombstone → advance version → reap).cargo test -p sentrix-trie: 82 passed.cargo check --workspace -D warningsclean.Summary by CodeRabbit
Chores
Bug Fixes