fix(trie): refresh prune live-set between node and value GC passes#791
Conversation
The periodic prune runs on a background-thread clone whose version is frozen. gc_orphaned_nodes GC'd nodes then values off a single live snapshot, but the nodes pass alone runs 10-20 min on a big chain.db. Blocks committed during that pass write new leaf values the snapshot never saw, so the values pass deleted them — testnet halt at h=6,239,526 (orphan value for a hot account leaf, account 0xd98d..., nonce 54). Split GC into gc_nodes + gc_values and re-augment the live set with roots committed during the nodes pass before running the values pass. Shrinks the race window from the full nodes-pass duration to the gap before the values cursor opens its txn. Full atomicity (walk+delete in one RW txn) is the eventual complete fix; boot-time verify_integrity is the backstop. Bumps workspace to 2.2.30.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Plus Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (3)
📝 WalkthroughWalkthroughThis PR splits trie garbage collection into separate node-only and value-only phases to fix a race condition in Sequence DiagramsequenceDiagram
participant Prune as SentrixTrie::prune
participant Augment as augment_live_to_latest
participant GCNodes as TrieStorage::gc_nodes
participant GCValues as TrieStorage::gc_values
Prune->>Augment: augment live to latest (initial)
Augment-->>Prune: updated live_hashes
Prune->>GCNodes: gc_nodes(live_hashes)
Note over GCNodes: concurrent roots may commit here
GCNodes-->>Prune: nodes_count
Prune->>Augment: augment live to latest (re-sync)
Augment-->>Prune: live_hashes with newly committed roots
Prune->>GCValues: gc_values(live_hashes)
GCValues-->>Prune: values_count
Prune-->>Prune: return (roots_pruned, nodes_count + values_count)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…798) The periodic trie prune runs on a background thread whose live-set is a snapshot frozen at spawn time. Blocks committed during the multi-minute GC walk write new nodes/values the snapshot never saw; the old gc deleted them as "orphans". #791 split the node/value passes and re-augmented the live-set to narrow the window, but its own comment notes the complete fix is still "one RW txn around walk+delete". The window persisted: testnet val3 hit a "missing node" mid-traversal in create_block, so every time it was proposer the round ate a 20s timeout and the chain crawled to 0.13 blk/s until a restart reloaded the trie. Generational GC: instead of deleting an orphan immediately, the prune TOMBSTONES it (TABLE_TRIE_TOMBSTONES, keyed `disc||hash` -> version) and only deletes it on a LATER prune if it is STILL orphan then. A hash committed during this prune is orphan vs the snapshot, so it's tombstoned this cycle — but next cycle it's a recent live node, so its tombstone is dropped instead of deleting it. The race is closed; the worst-case failure mode is benign (under-deletion = storage grows one extra cycle, never deletion of a live entry). Deletion lags one prune interval (~5000 blocks). The new tombstone table auto-creates on env open (ALL_TABLES), so existing chain.dbs need no migration. No consensus/state change: trie content and state_root are identical; only dead-node deletion TIMING moves. Not the complete single-RW-txn fix, but race-free without holding a 10-20 min write lock (which would stall the chain). Tests: gc_table_generational defers-then-reaps; a node orphan-this-cycle / live-next-cycle is never deleted (the race repro, fails against the old gc_table). Updated three tree prune tests to the deferred contract (tombstone, advance version, reap).
…800) * fix(trie): race-free offline prune; background prune off by default The durable fix for the recurring trie "missing node" stalls. Root cause (confirmed): the BACKGROUND prune runs on a thread cloning the trie at a frozen version while block apply keeps committing; a node committed or content-addressed-resurfaced during the multi-minute live-set walk is absent from the frozen snapshot and gets deleted as an orphan. Five partial fixes (#711 reload-before-gc, #714 collect_reachable depth, #791 split passes, #798 generational defer) each narrowed but never closed the window — #798 deleted live nodes again in production (val3, h=6300000 prune → "missing node" → propose stall). A truly race-free background prune needs walk+delete in one MDBX RW txn (a chain-blocking write lock for the 10-20 min walk) or refcounting — both deferred. This takes the mechanism-agnostic safe route: eliminate the concurrency. - `SentrixTrie::prune_offline(keep)` — same walk + keep-window as `prune`, minus the racy augment and the generational deferral, using the combined immediate `gc_orphaned_nodes`. Correct only with no concurrent commits, which is guaranteed by running it on a STOPPED node. - `sentrix chain prune [--keep N]` — operator runs it during a maintenance halt (same model as `chain reset-trie` / `verify-deep`). Safe on a single peer: deleting unreachable nodes does not change the state_root (which only commits reachable nodes), so no fork risk — unlike reset-trie. - Background prune is now OFF by default. It only runs with SENTRIX_ENABLE_BACKGROUND_TRIE_PRUNE=1 (and the legacy SENTRIX_DISABLE_TRIE_PRUNE=1 still force-disables). maybe_prune_trie early-returns by default, so the apply path no longer even spawns the racy thread. Trade-off: storage grows between maintenance prunes. Acceptable; correctness over convenience after five automatic-prune failures. Tests: prune_offline reclaims orphans AND the current value survives the prune (the regression guard #798 lacked — a live-node deletion would make the post-prune `get` fail with "missing node"); background_prune_enabled gate is off by default + opt-in semantics. cargo test -p sentrix-trie: 84 passed; -p sentrix-core: 263 passed; cargo check --workspace -D warnings clean. * fix(trie): clippy nonminimal-bool + chain prune fail-closed guards (CodeRabbit) - Clippy: background_prune_enabled uses is_none_or instead of !..is_some_and (nonminimal_bool, -D warnings in CI; cargo check didn't catch it). - chain prune Major guards (CodeRabbit on #800): - Refuse if no persisted trie root at the current height — never let the maintenance prune trigger an init_trie backfill rebuild (different node shape = reset-trie fork class). New Storage::has_persisted_trie_root. - Enforce the offline precondition instead of only printing it: detect a running node via height-stability (sample across >5s poll-persist interval) and fail closed if the chain advanced; override with SENTRIX_ALLOW_ONLINE_PRUNE=1 for rare recovery.
Symptom
Testnet halted at h=6,239,526 — every node failed to boot with
trie integrity: orphan value reference 48aa4f6c… from leaf at depth 6 — value missing from trie_values.Root cause (traced, not guessed)
The orphan value belongs to a regular EOA account leaf (
0xd98d63ea…, balance 0.0994 SRX, nonce 54). Recomputinghash_leaf(address_to_key(addr), account_value_bytes(99460000, 54))from AccountDB reproduces48aa4f6cexactly — so AccountDB was correct and the trie root (69a51e68) canonical and identical across val1/3/4. The corruption was a lost value blob, not a state divergence.How it got lost:
maybe_prune_triespawns the prune on a background-thread clone whoseversionis frozen.gc_orphaned_nodesGC'd nodes then values off onelivesnapshot, but the nodes pass alone runs 10–20 min on a big chain.db. Blocks committed during that pass write new leaf values the snapshot never saw; the values pass then deletes those freshly-committed, still-live values.0xd98dis a hot account (nonce 54) so it was updated during a prune window. The 2026-05-24 single-augment closer didn't cover commits that land during the long nodes pass.Fix
gc_orphaned_nodesintogc_nodes+gc_values(combined method kept for non-racy callers/tests).prune(), re-augmentlivewith roots committed during the nodes pass before the values pass (newaugment_live_to_latesthelper).Shrinks the window from the entire nodes-pass duration to the sub-ms gap before the values cursor opens its txn. The complete race-free fix (walk+delete in one RW txn) is a larger TrieStorage refactor tracked separately; boot-time
verify_integrityremains the backstop.Test
test_values_gc_keeps_leaf_committed_during_nodes_pass— asserts both directions: the stale live set omits a value committed during the nodes pass (the bug), and the re-augmented live set keeps it through the values GC. Fullsentrix-triesuite: 80 passed. Built-D warningsclean across sentrix-trie + sentrix-core.Ops
Testnet dbs already recovered out-of-band (surgical value-restore on val3/val4, cp to val1/val2/fullnode); all 5 boot clean at
69a51e68. This binary (2.2.30) is what should be deployed before the fleet is restarted, so the race can't recur.Summary by CodeRabbit
Chores
Refactor