Skip to content

fix(trie): refresh prune live-set between node and value GC passes#791

Merged
github-actions[bot] merged 1 commit into
mainfrom
fix/trie-prune-gc-value-race
Jun 4, 2026
Merged

fix(trie): refresh prune live-set between node and value GC passes#791
github-actions[bot] merged 1 commit into
mainfrom
fix/trie-prune-gc-value-race

Conversation

@satyakwok
Copy link
Copy Markdown
Member

@satyakwok satyakwok commented Jun 4, 2026

Symptom

Testnet halted at h=6,239,526 — every node failed to boot with trie integrity: orphan value reference 48aa4f6c… from leaf at depth 6 — value missing from trie_values.

Root cause (traced, not guessed)

The orphan value belongs to a regular EOA account leaf (0xd98d63ea…, balance 0.0994 SRX, nonce 54). Recomputing hash_leaf(address_to_key(addr), account_value_bytes(99460000, 54)) from AccountDB reproduces 48aa4f6c exactly — so AccountDB was correct and the trie root (69a51e68) canonical and identical across val1/3/4. The corruption was a lost value blob, not a state divergence.

How it got lost: maybe_prune_trie spawns the prune on a background-thread clone whose version is frozen. gc_orphaned_nodes GC'd nodes then values off one live snapshot, but the nodes pass alone runs 10–20 min on a big chain.db. Blocks committed during that pass write new leaf values the snapshot never saw; the values pass then deletes those freshly-committed, still-live values. 0xd98d is a hot account (nonce 54) so it was updated during a prune window. The 2026-05-24 single-augment closer didn't cover commits that land during the long nodes pass.

Fix

  • Split gc_orphaned_nodes into gc_nodes + gc_values (combined method kept for non-racy callers/tests).
  • In prune(), re-augment live with roots committed during the nodes pass before the values pass (new augment_live_to_latest helper).

Shrinks the window from the entire nodes-pass duration to the sub-ms gap before the values cursor opens its txn. The complete race-free fix (walk+delete in one RW txn) is a larger TrieStorage refactor tracked separately; boot-time verify_integrity remains the backstop.

Test

test_values_gc_keeps_leaf_committed_during_nodes_pass — asserts both directions: the stale live set omits a value committed during the nodes pass (the bug), and the re-augmented live set keeps it through the values GC. Full sentrix-trie suite: 80 passed. Built -D warnings clean across sentrix-trie + sentrix-core.

Ops

Testnet dbs already recovered out-of-band (surgical value-restore on val3/val4, cp to val1/val2/fullnode); all 5 boot clean at 69a51e68. This binary (2.2.30) is what should be deployed before the fleet is restarted, so the race can't recur.

Summary by CodeRabbit

  • Chores

    • Version bumped to 2.2.30
  • Refactor

    • Improved garbage collection with split-phase processing for trie node and value cleanup
    • Enhanced trie pruning logic for better operational reliability

The periodic prune runs on a background-thread clone whose version is
frozen. gc_orphaned_nodes GC'd nodes then values off a single live
snapshot, but the nodes pass alone runs 10-20 min on a big chain.db.
Blocks committed during that pass write new leaf values the snapshot
never saw, so the values pass deleted them — testnet halt at h=6,239,526
(orphan value for a hot account leaf, account 0xd98d..., nonce 54).

Split GC into gc_nodes + gc_values and re-augment the live set with
roots committed during the nodes pass before running the values pass.
Shrinks the race window from the full nodes-pass duration to the gap
before the values cursor opens its txn. Full atomicity (walk+delete in
one RW txn) is the eventual complete fix; boot-time verify_integrity is
the backstop. Bumps workspace to 2.2.30.
@github-actions github-actions Bot enabled auto-merge (squash) June 4, 2026 17:17
@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 4, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: a0568a11-3e38-454e-9b83-735cc1e56bce

📥 Commits

Reviewing files that changed from the base of the PR and between 6416766 and d1137ab.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock, !**/*.lock
📒 Files selected for processing (3)
  • Cargo.toml
  • crates/sentrix-trie/src/storage.rs
  • crates/sentrix-trie/src/tree.rs

📝 Walkthrough

Walkthrough

This PR splits trie garbage collection into separate node-only and value-only phases to fix a race condition in SentrixTrie::prune. Previously, when the trie was cloned at a frozen version while the main loop continued committing newer roots, the live-hash set could become stale, causing values committed during GC to be incorrectly deleted. The fix introduces TrieStorage::gc_nodes and TrieStorage::gc_values public methods, refactors gc_orphaned_nodes to call them in sequence, and modifies SentrixTrie::prune to re-augment the live-hash set between the two phases. A regression test validates that newly-committed leaf values survive the values-GC pass. The workspace version is also bumped to 2.2.30.

Sequence Diagram

sequenceDiagram
  participant Prune as SentrixTrie::prune
  participant Augment as augment_live_to_latest
  participant GCNodes as TrieStorage::gc_nodes
  participant GCValues as TrieStorage::gc_values
  Prune->>Augment: augment live to latest (initial)
  Augment-->>Prune: updated live_hashes
  Prune->>GCNodes: gc_nodes(live_hashes)
  Note over GCNodes: concurrent roots may commit here
  GCNodes-->>Prune: nodes_count
  Prune->>Augment: augment live to latest (re-sync)
  Augment-->>Prune: live_hashes with newly committed roots
  Prune->>GCValues: gc_values(live_hashes)
  GCValues-->>Prune: values_count
  Prune-->>Prune: return (roots_pruned, nodes_count + values_count)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • sentrix-labs/sentrix#714: Modifies prune-time reachability logic in collect_reachable to handle depth-aware empty-subtree pruning, affecting the same live-set input to GC as this PR's augmentation logic.
  • sentrix-labs/sentrix#711: Also mitigates the prune race by augmenting the live-hash set before GC, though this PR further splits GC into node-then-value phases with re-augmentation between passes.
  • sentrix-labs/sentrix#584: Runs SentrixTrie::prune concurrently in a background thread, directly exercising the split-phase GC race-handling logic introduced in this PR.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is comprehensive and covers symptom, root cause, fix, tests, and ops; however, the description template requires specific checklist sections (Scope, Checks, Deploy impact) which are not addressed. Add the required template sections (Scope, Checks, Deploy impact) with appropriate checkboxes marked to indicate test-only vs contract changes, build/test status, and deployment requirements.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title accurately summarizes the main fix: refreshing the live-set between GC passes to prevent the race condition that lost value blobs.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/trie-prune-gc-value-race

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot merged commit a132baf into main Jun 4, 2026
18 checks passed
github-actions Bot pushed a commit that referenced this pull request Jun 6, 2026
…798)

The periodic trie prune runs on a background thread whose live-set is a
snapshot frozen at spawn time. Blocks committed during the multi-minute GC
walk write new nodes/values the snapshot never saw; the old gc deleted them
as "orphans". #791 split the node/value passes and re-augmented the live-set
to narrow the window, but its own comment notes the complete fix is still
"one RW txn around walk+delete". The window persisted: testnet val3 hit a
"missing node" mid-traversal in create_block, so every time it was proposer
the round ate a 20s timeout and the chain crawled to 0.13 blk/s until a
restart reloaded the trie.

Generational GC: instead of deleting an orphan immediately, the prune
TOMBSTONES it (TABLE_TRIE_TOMBSTONES, keyed `disc||hash` -> version) and only
deletes it on a LATER prune if it is STILL orphan then. A hash committed
during this prune is orphan vs the snapshot, so it's tombstoned this cycle —
but next cycle it's a recent live node, so its tombstone is dropped instead
of deleting it. The race is closed; the worst-case failure mode is benign
(under-deletion = storage grows one extra cycle, never deletion of a live
entry). Deletion lags one prune interval (~5000 blocks).

The new tombstone table auto-creates on env open (ALL_TABLES), so existing
chain.dbs need no migration. No consensus/state change: trie content and
state_root are identical; only dead-node deletion TIMING moves. Not the
complete single-RW-txn fix, but race-free without holding a 10-20 min write
lock (which would stall the chain).

Tests: gc_table_generational defers-then-reaps; a node orphan-this-cycle /
live-next-cycle is never deleted (the race repro, fails against the old
gc_table). Updated three tree prune tests to the deferred contract
(tombstone, advance version, reap).
github-actions Bot pushed a commit that referenced this pull request Jun 6, 2026
…800)

* fix(trie): race-free offline prune; background prune off by default

The durable fix for the recurring trie "missing node" stalls. Root cause
(confirmed): the BACKGROUND prune runs on a thread cloning the trie at a
frozen version while block apply keeps committing; a node committed or
content-addressed-resurfaced during the multi-minute live-set walk is absent
from the frozen snapshot and gets deleted as an orphan. Five partial fixes
(#711 reload-before-gc, #714 collect_reachable depth, #791 split passes,
#798 generational defer) each narrowed but never closed the window — #798
deleted live nodes again in production (val3, h=6300000 prune → "missing
node" → propose stall). A truly race-free background prune needs walk+delete
in one MDBX RW txn (a chain-blocking write lock for the 10-20 min walk) or
refcounting — both deferred.

This takes the mechanism-agnostic safe route: eliminate the concurrency.

- `SentrixTrie::prune_offline(keep)` — same walk + keep-window as `prune`,
  minus the racy augment and the generational deferral, using the combined
  immediate `gc_orphaned_nodes`. Correct only with no concurrent commits,
  which is guaranteed by running it on a STOPPED node.
- `sentrix chain prune [--keep N]` — operator runs it during a maintenance
  halt (same model as `chain reset-trie` / `verify-deep`). Safe on a single
  peer: deleting unreachable nodes does not change the state_root (which only
  commits reachable nodes), so no fork risk — unlike reset-trie.
- Background prune is now OFF by default. It only runs with
  SENTRIX_ENABLE_BACKGROUND_TRIE_PRUNE=1 (and the legacy
  SENTRIX_DISABLE_TRIE_PRUNE=1 still force-disables). maybe_prune_trie
  early-returns by default, so the apply path no longer even spawns the
  racy thread.

Trade-off: storage grows between maintenance prunes. Acceptable; correctness
over convenience after five automatic-prune failures.

Tests: prune_offline reclaims orphans AND the current value survives the
prune (the regression guard #798 lacked — a live-node deletion would make
the post-prune `get` fail with "missing node"); background_prune_enabled
gate is off by default + opt-in semantics. cargo test -p sentrix-trie: 84
passed; -p sentrix-core: 263 passed; cargo check --workspace -D warnings clean.

* fix(trie): clippy nonminimal-bool + chain prune fail-closed guards (CodeRabbit)

- Clippy: background_prune_enabled uses is_none_or instead of !..is_some_and
  (nonminimal_bool, -D warnings in CI; cargo check didn't catch it).
- chain prune Major guards (CodeRabbit on #800):
  - Refuse if no persisted trie root at the current height — never let the
    maintenance prune trigger an init_trie backfill rebuild (different node
    shape = reset-trie fork class). New Storage::has_persisted_trie_root.
  - Enforce the offline precondition instead of only printing it: detect a
    running node via height-stability (sample across >5s poll-persist
    interval) and fail closed if the chain advanced; override with
    SENTRIX_ALLOW_ONLINE_PRUNE=1 for rare recovery.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant