FlatDb computes a wrong state root and rejects a canonical block (consensus divergence) — chiado, intermittent #11993

Description

@stdevMac

Summary

With FlatDb.Enabled=true on master, Nethermind intermittently computes a wrong state root and rejects a canonical block with InvalidStateRoot. A non-flat (patricia/HalfPath) control node accepts the same block. This is a consensus divergence specific to the flat-state backend.

Evidence

  • Smoke run 27314819957 (chiado, FlatDb.Enabled=true) rejected canonical block 21574961 (0x0ba5905c2295b4b12ed035f8eace55f4ae9c288f55afcdcc7eaf7d6c6b9b9394, 0 txs, 8 non-zero withdrawals): computed root 0x78374a9a… vs canonical 0x12173c61….
  • A patricia control node passed the same block.
  • The failure is intermittent. An offline full FlatTrieVerifier cross-check on a synced chiado DB (head 21576781, which had processed past the bad block cleanly on a different run) came back clean: 539,247 accounts / 35,390,325 slots, 0 mismatched, 0 missing. A 24h forward-processing soak on two fresh chiado nodes did not re-trigger it. So a synced DB can be divergence-free, and the fault can only be caught on a fresh sync that hits the right window.

Where it likely is

Flat-backend reads are not hash-verified the way trie-node reads are, so a wrong flat value surfaces only as a silent wrong state root. Static analysis points at the SnapshotCompactor self-destruct / storage-clear merge when folding per-block snapshots into a compacted snapshot:

  • the storage-slot clear keys on Address, while the storage-trie-node clear keys on the account-path hash and is gated on !isNewAccount, with asymmetric self-destruct-marker semantics (TryAdd(addr, true) for new-account vs [addr] = false for non-new).
  • the under-tested case is self-destruct-then-recreate within a single compaction window (e.g. a CREATE2 redeploy), for which there is no regression test — SnapshotCompactorTests only covers destroy-with-storage and destroy-new-account.
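To make the suspected failure class concrete, here is a minimal toy model of folding per-block snapshots into one compacted snapshot. It is not Nethermind's SnapshotCompactor: `BlockSnapshot`, `compact`, and `compact_without_clear` are hypothetical names, and the data layout is an assumption chosen for illustration. It only shows why a self-destruct followed by a recreate inside one compaction window needs the clear to be applied in block order.

```python
from dataclasses import dataclass, field


@dataclass
class BlockSnapshot:
    # Addresses self-destructed in this block (hypothetical layout).
    self_destructed: set = field(default_factory=set)
    # (address, slot) -> value written in this block, after any self-destruct.
    slots: dict = field(default_factory=dict)


def compact(snapshots):
    """Fold snapshots oldest-first, applying each clear before that block's writes."""
    merged = BlockSnapshot()
    for snap in snapshots:
        for addr in snap.self_destructed:
            # Drop every slot accumulated from older blocks for this address.
            merged.slots = {k: v for k, v in merged.slots.items() if k[0] != addr}
            merged.self_destructed.add(addr)
        merged.slots.update(snap.slots)
    return merged


def compact_without_clear(snapshots):
    """Deliberately wrong variant: records the marker but never clears old slots."""
    merged = BlockSnapshot()
    for snap in snapshots:
        merged.self_destructed |= snap.self_destructed
        merged.slots.update(snap.slots)
    return merged


# Block N writes slot 1, block N+1 self-destructs the contract,
# block N+2 recreates it (e.g. CREATE2 redeploy) and writes slot 2.
window = [
    BlockSnapshot(slots={("A", 1): 7}),
    BlockSnapshot(self_destructed={"A"}),
    BlockSnapshot(slots={("A", 2): 9}),
]
good = compact(window)
bad = compact_without_clear(window)
```

In this model `good.slots` keeps only the post-recreate slot, while `bad.slots` still carries the stale pre-destroy slot, which would feed a wrong value into the storage root. A regression test for the real compactor would presumably assert the same property on this destroy-then-recreate sequence.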

The per-instance persisted compaction offset (#11756) is likely a trigger, not the cause: it makes compaction-window boundaries node-specific, so the buggy merge path is exercised at block alignments that a deterministic schedule / the patricia control never hit — consistent with "one flat node diverged, the control didn't."
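The boundary argument can be sketched with a toy calculation. The formula `(block + offset) // window_size` and the window size are assumptions for illustration only, not the actual scheduling logic from #11756; the point is just that two adjacent blocks can land in the same or in different compaction windows depending on a per-node offset.

```python
def window_id(block, offset, window_size):
    """Hypothetical mapping from block number to compaction window."""
    return (block + offset) // window_size


def same_window(block_a, block_b, offset, window_size):
    """True if both blocks would be folded into the same compacted snapshot."""
    return window_id(block_a, offset, window_size) == window_id(
        block_b, offset, window_size
    )


# A contract is destroyed in block 7 and recreated in block 8 (toy numbers).
# With offset 0 the two blocks fall in different windows; with offset 1 they
# share a window, so only that node exercises the destroy-then-recreate merge.
node_a_shares_window = same_window(7, 8, offset=0, window_size=4)
node_b_shares_window = same_window(7, 8, offset=1, window_size=4)
```

Under these assumptions only the second node merges the destroy and the recreate in one compaction, which matches the observation that one flat node diverged while another run processed the same block cleanly.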

How to reproduce / confirm

  • Run a chiado forward sync with --FlatDb.Enabled true --FlatDb.VerifyWithTrie true. This converts the silent InvalidStateRoot into a precise per-account/per-slot TrieException naming the diverging address/slot. Look for FlatStorageTree "Get slot got wrong value … Self destruct it {idx}" and FlatWorldStateScope "Incorrect account …".
  • Pin the schedule with --FlatDb.CompactionOffset 0 and run --FlatDb.InlineCompaction true to remove compactor/persist timing as a variable.
  • Signature: a slot-level throw naming a recently self-destructed/recreated contract near the diverging block.
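When checking a long sync log for the messages named above, a small filter can help. This is a sketch that assumes the log is plain text with one entry per line and that the two message fragments appear verbatim; `find_divergence_lines` is a hypothetical helper, not part of Nethermind.

```python
# Message fragments quoted in the reproduction steps above.
SIGNATURES = ("Get slot got wrong value", "Incorrect account")


def find_divergence_lines(lines):
    """Return (line_number, text) for every line containing a known signature."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(signature in line for signature in SIGNATURES):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log_file(path):
    """Scan a log file on disk; the path is supplied by the caller."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_divergence_lines(handle)
```

Calling `scan_log_file` on the node's log and comparing the reported addresses against contracts that were recently self-destructed or recreated near the diverging block is one way to check for the signature described above.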

Note: --FlatDb.VerifyWithTrie true cannot be used for a post-snap live soak. The live trie-comparison store can't reload the snap-sync state root once it is evicted, and the node wedges on a TrieNodeException (unrelated to this bug). Use it on a forward sync from genesis or an early block, or rely on the natural InvalidStateRoot on a snap+forward soak.

Related (NOT duplicates)

Do not allowlist

Real consensus divergence on the flat backend; must be fixed before FlatDb ships. The smoke SyncCNWSF / Sync*Flat tests correctly catch it.
