trie/archiver: streaming archival + pathdb journal consistency by tellabg · Pull Request #567 · gballet/go-ethereum

tellabg · 2026-02-22T13:22:40Z

Summary

Two fixes for archive generate:

1. Streaming subtree archival (fix OOM)

The previous recursive approach loaded the entire trie into memory before archiving any subtree. For large storage tries (contracts with millions of slots), this caused OOM kills.

Fix: NodeIterator-based streaming approach:

- probeHeight(): bounded raw DB reads, discards decoded nodes
- collectSubtree(): materializes only the bounded height-3 subtree
- Memory: O(iterator_stack + current_subtree) vs O(entire_trie)
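The probe-then-collect idea above can be sketched on a toy node store. This is a minimal illustration, not the code in trie/archiver.go: the `node` and `store` types, the function signatures, and `archiveSubtrees` are hypothetical, and probing with a cap of `target+1` to tell "exactly target" from "taller" is a choice made for this sketch only.

```go
package main

// node is a toy stand-in for a decoded trie node; it only records child keys.
// (Hypothetical type: the real archiver decodes nodes read from the raw DB.)
type node struct {
	children []string
}

// store is a hypothetical raw key-value view of the node database.
type store map[string]node

// probeHeight returns the height of the subtree rooted at key, capped at limit.
// Each node is read, inspected, and dropped, so nothing accumulates in memory
// beyond the recursion stack (at most limit+1 frames).
func probeHeight(db store, key string, limit int) int {
	n, ok := db[key]
	if !ok || len(n.children) == 0 || limit == 0 {
		return 0
	}
	h := 0
	for _, c := range n.children {
		if ch := 1 + probeHeight(db, c, limit-1); ch > h {
			h = ch
		}
		if h == limit {
			break // cap reached, no need to read further siblings
		}
	}
	return h
}

// collectSubtree materializes the keys of the subtree rooted at key, descending
// at most height levels. This is the only place where nodes are accumulated.
func collectSubtree(db store, key string, height int) []string {
	n, ok := db[key]
	if !ok {
		return nil
	}
	keys := []string{key}
	if height == 0 {
		return keys
	}
	for _, c := range n.children {
		keys = append(keys, collectSubtree(db, c, height-1)...)
	}
	return keys
}

// archiveSubtrees walks from key and calls emit for each subtree whose height
// is exactly target. Taller subtrees are descended into; shallower ones are skipped.
func archiveSubtrees(db store, key string, target int, emit func(root string, keys []string)) {
	switch h := probeHeight(db, key, target+1); {
	case h == target:
		emit(key, collectSubtree(db, key, target))
	case h > target:
		for _, c := range db[key].children {
			archiveSubtrees(db, c, target, emit)
		}
	}
}
```

In this sketch the walker re-probes nodes while descending, trading extra reads for a memory footprint bounded by the recursion stack plus the one subtree handed to `emit`.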

Result: RSS stable at 2 MB (vs OOM at 22 GB). ~3h on Hoodi.

2. Flush diff layers + re-journal (fix block import)

After archiving, geth import failed with:

- unknown ancestor: the journal was deleted, so the chain head was lost
- out-order append: gaps in the state history freezer
- triedb layer missing: the re-journal used a stale root

Fix:

- Flush diff layers via Commit() before archiving
- Re-open fresh triedb after archiving for clean journal
- No journal deletion, no DisableStateHistory()
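The ordering matters more than the individual calls: flush, then archive, then re-open. The toy model below illustrates why, under loud assumptions: `layeredDB`, `flush`, `archiveFrom`, and `reopen` are hypothetical names invented for this sketch and are not the pathdb or triedb API.

```go
package main

import "errors"

// layeredDB is a toy model of a layered state database: a persistent disk
// layer plus in-memory diff layers stacked on top (oldest first).
// All names here are hypothetical.
type layeredDB struct {
	disk  map[string]string
	diffs []map[string]string
}

// flush merges every diff layer into the disk layer, oldest first,
// so that later writes win, then drops the diff layers.
func (db *layeredDB) flush() {
	for _, d := range db.diffs {
		for k, v := range d {
			db.disk[k] = v
		}
	}
	db.diffs = nil
}

// archiveFrom moves the entries selected by pick out of the disk layer.
// It refuses to run while diff layers are pending, because the disk layer
// alone would not reflect the current head state.
func (db *layeredDB) archiveFrom(pick func(key string) bool) (map[string]string, error) {
	if len(db.diffs) != 0 {
		return nil, errors.New("pending diff layers: flush first")
	}
	out := make(map[string]string)
	for k, v := range db.disk {
		if pick(k) {
			out[k] = v
			delete(db.disk, k) // deleting during range is safe in Go
		}
	}
	return out, nil
}

// reopen returns a fresh handle over the same disk layer with no diff layers,
// mirroring the "re-open fresh triedb after archiving" step.
func reopen(db *layeredDB) *layeredDB {
	return &layeredDB{disk: db.disk}
}
```

Archiving before the flush would read stale values from the disk layer; in the model that mistake is surfaced as an error rather than silently archiving outdated data.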

Result: 1020 blocks imported successfully after archiving (~265 mgasps).

Test results (Hoodi, 22 GB RAM)

Generate: 105,292 subtrees, 221,455 leaves, 15 MB archive, 3h16m, RSS 2 MB
Import: 1020 blocks in 2m04s, expired nodes resolved from archive

Replace the recursive approach that loaded the entire trie into memory with a streaming NodeIterator-based approach:

- processTrie now uses NodeIterator to walk the trie node-by-node
- probeHeight reads nodes from raw DB, computes height bounded at 3, and discards decoded nodes immediately (no in-memory trie buildup)
- collectSubtree only materializes the bounded height-3 subtree being archived (at most ~4096 nodes)
- Memory usage: O(iterator_stack) + O(current_subtree) instead of O(entire_trie)

This fixes OOM kills on large storage tries (e.g. contracts with millions of storage slots) where the previous approach would load all nodes and subtreeInfo into memory before archiving any of them.

Instead of deleting the pathdb journal after archive generation (which breaks the chain head and prevents block imports), properly integrate with pathdb:

1. Open triedb with pathdb support (not just raw KV)
2. Disable state history freezer (avoid append gaps)
3. Flush all diff layers to disk via Commit() before archiving
4. After archiving, re-journal the pathdb state (disk layer only)

This ensures geth can restart cleanly after archiving and continue importing blocks without 'unknown ancestor' errors.

gballet · 2026-03-11T20:13:03Z

+		// height == 3: collect and archive this subtree immediately.
+		info := a.collectSubtree(owner, path, hash)
+		if info == nil {
+			continue


kinda weird that info is nil and we just silently skip it

tellabg added 2 commits February 21, 2026 06:13

tellabg requested a review from gballet as a code owner February 22, 2026 13:22

tellabg force-pushed the archival-streaming-fix branch from 2e4cbf9 to 2dce6ab Compare February 26, 2026 20:29

trie: add resurrection timing and depth metrics to expired node resolution

8447505

tellabg force-pushed the archival-streaming-fix branch from 2dce6ab to 8447505 Compare February 26, 2026 20:34

gballet reviewed Mar 11, 2026

Update trie/archiver.go

3158ea5

gballet merged commit a37814d into gballet:archival-command Mar 11, 2026

gballet merged 4 commits into gballet:archival-command from tellabg:archival-streaming-fix
