
perf(l1): move SST files into temp DB during snap sync ingest#6532

Open
iovoid wants to merge 2 commits into main from perf/snap-sync-move-files-ingest

Conversation

@iovoid
Contributor

@iovoid iovoid commented Apr 27, 2026

Motivation

During storage insertion in snap sync, we use substantially more disk than necessary: SST files are copied into the temporary ingestion database, so the leaf data exists on disk twice at peak.

Description

Uses RocksDB move-based ingestion (set_move_files(true)) so SST files are renamed into the temp DB instead of copied, reducing peak disk usage.

@iovoid iovoid requested a review from a team as a code owner April 27, 2026 12:28
@github-actions github-actions Bot added L1 Ethereum client performance Block execution throughput and performance in general labels Apr 27, 2026
@github-actions

🤖 Kimi Code Review

The changes look correct and represent a sensible performance optimization for snap sync operations.

Summary

  • File: crates/networking/p2p/sync/snap_sync.rs
  • Change: Replacing SST file copying with atomic moves during account and storage ingestion

Review Points

  1. Performance: Using set_move_files(true) eliminates redundant disk I/O by avoiding copies of large SST files during ingestion. This is significant for snap sync where datasets can be hundreds of GB.

  2. Safety: The approach is safe because:

    • The comment correctly documents the prerequisite (same datadir/filesystem) for atomic rename to work
    • Both insert_accounts (line 1199) and insert_storages (line 1309) operate on temporary ingestion databases where the SST files are ephemeral
    • If ingestion fails, RocksDB maintains consistency and the files remain in their original location
  3. Error Handling: Properly preserved - the map_err conversion to SyncError::RocksDBError remains intact in both locations.

  4. Code Quality: The comments explaining why this is safe (same datadir assumption) are valuable for future maintainers.

Minor Suggestion
Consider adding a defensive debug assertion or log warning if the ingest operation fails with a cross-device error, which would indicate the same-datadir assumption was violated:

db.ingest_external_file_opts(&ingest_opts, file_paths)
    .map_err(|err| {
        // Log filesystem boundary violations for debugging
        if err.to_string().contains("cross-device") {
            warn!("SST ingest failed: snapshot and temp DB not on same filesystem");
        }
        SyncError::RocksDBError(err.into_string())
    })?;

Otherwise, LGTM - this is a straightforward disk space and I/O optimization.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions

🤖 Claude Code Review

Here is my review of PR #6532:


Review: perf(l1): move SST files into temp DB during snap sync ingest

Summary

A minimal, focused change: swaps ingest_external_file for ingest_external_file_opts with set_move_files(true) in both insert_accounts and insert_storages. The intent is to use a rename(2) syscall instead of a full file copy, halving peak disk usage during snap sync ingestion.


Correctness & Safety Concerns

Filesystem assumption is load-bearing but unenforced

The comment states: "the snapshot dir and the temp DB live under the same datadir, so rename succeeds". This is a runtime assumption with no enforcement. If a user configures the snapshot directory on a separate mount point or filesystem from the data directory, rename(2) will fail with EXDEV. RocksDB's behavior on a cross-device rename depends on the version — some fall back to copy+delete, others surface the error.

This should either be validated at startup (assert same device via stat(2) / std::fs::metadata(...).st_dev()) or at minimum the assumption should be guarded somewhere visible, not just in a comment.
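A startup check along these lines could enforce the assumption. This is a minimal sketch, not the actual ethrex code: on Unix, two paths allow an atomic rename(2) between them exactly when they report the same device ID, which `std::fs::metadata` exposes via `MetadataExt::dev()`.

```rust
// Sketch of a same-filesystem startup check (Unix-only).
// The caller would pass the real snapshot dir and temp DB dir.
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Returns Ok(true) if both paths live on the same device, i.e. a
/// rename(2) between them cannot fail with EXDEV.
fn same_device(a: &Path, b: &Path) -> io::Result<bool> {
    Ok(std::fs::metadata(a)?.dev() == std::fs::metadata(b)?.dev())
}

fn main() -> io::Result<()> {
    let tmp = std::env::temp_dir();
    // A directory always shares a device with itself.
    println!("{}", same_device(&tmp, &tmp)?); // prints "true"
    Ok(())
}
```

Running this once at startup (and refusing move-based ingest if it returns false) would turn a hard-to-diagnose mid-sync failure into an immediate, explicit error.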

Idempotency / retry safety regression

With the original copy semantics, a partial failure (e.g., after some files are ingested but the process crashes) leaves all source SST files intact — the sync can be retried cleanly.

With move semantics, once RocksDB renames a file into the temp DB, it no longer exists at the source path. If ingestion then fails (RocksDB error mid-batch, disk full, etc.) and the temp DB is discarded during cleanup, those SST files are gone. The sync would need to re-request that data from peers from scratch. This is likely acceptable (snap sync already handles full restarts), but the error path should be confirmed to handle this gracefully — particularly that partial temp-DB state is cleaned up and the sync restarts from a consistent point.
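The retry-safety difference above can be illustrated with plain filesystem operations (this is an analogy, not the actual RocksDB ingest path): after a copy the source survives and the operation can be repeated, after a rename the source is consumed.

```rust
// Illustration of copy vs move semantics for retry safety.
use std::fs;

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("ingest-demo");
    fs::create_dir_all(&dir)?;
    let src = dir.join("data.sst");
    fs::write(&src, b"leaf data")?;

    // Copy semantics: source remains, so a failed ingest can be retried.
    fs::copy(&src, dir.join("copied.sst"))?;
    println!("after copy, source exists: {}", src.exists()); // true

    // Move semantics: source is consumed; a retry finds nothing to ingest.
    fs::rename(&src, dir.join("moved.sst"))?;
    println!("after move, source exists: {}", src.exists()); // false

    fs::remove_dir_all(&dir)?;
    Ok(())
}
```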

let mut is correct — set_move_files takes &mut self, so the binding needs to be mutable.


Minor Points

Duplicated option construction — Both insert_accounts and insert_storages contain identical boilerplate for constructing IngestExternalFileOptions. A one-liner helper (fn move_ingest_opts() -> rocksdb::IngestExternalFileOptions) would avoid drift if options ever need to change again. This is a minor style note, not a blocker.

Error message quality — SyncError::RocksDBError(err.into_string()) loses the call-site context. Already the case before this PR, but worth noting for future observability work: if the move fails for the EXDEV reason above, the error message will be opaque.


Verdict

The optimization is sound under the stated assumption, and the performance benefit (no double-copy of potentially large leaf data) is real. The main concern is the unvalidated same-filesystem assumption — if that can break in practice, it will produce a hard-to-diagnose sync failure. Before merging, confirm:

  • That the snapshot path and DB path are guaranteed to resolve to the same device in all supported configurations.
  • That the error/cleanup path after a failed ingest properly discards any partial temp DB state so sync can restart cleanly.

If both are confirmed, the change is good to merge.


Automated review by Claude (Anthropic) · sonnet · custom prompt

@greptile-apps

greptile-apps Bot commented Apr 27, 2026

Greptile Summary

This PR switches RocksDB SST file ingestion in insert_accounts and insert_storages from copy-based (ingest_external_file) to move-based (ingest_external_file_opts with set_move_files(true)) semantics, reducing peak disk usage during snap sync by avoiding a second on-disk copy of leaf data.

  • The move is destructive: once ingest_external_file_opts succeeds, the source SST files no longer exist in the snapshot directory. If the process crashes or errors after ingest but before trie generation completes, any retry will find an empty snapshot directory and silently compute an incorrect trie. If snap sync always rebuilds the snapshot directory from scratch on restart this is benign, but that assumption is not enforced or documented in these functions.

Confidence Score: 3/5

Safe to merge only if snap sync is guaranteed to fully rebuild the snapshot directory on every restart; otherwise risks silent trie corruption on crash.

Two P1 findings (one per ingest site) sharing the same root cause: move semantics make the operation non-idempotent, so a crash between ingest and trie completion could leave snap sync in an unrecoverable state. Score is below the P1 ceiling because both affected code paths are on the critical sync route.

crates/networking/p2p/sync/snap_sync.rs — both insert_accounts and insert_storages ingest sites

Important Files Changed

Filename Overview
crates/networking/p2p/sync/snap_sync.rs Switches RocksDB SST ingest from copy to move semantics in both insert_accounts and insert_storages, reducing peak disk usage; introduces a crash-safety concern where the snapshot source files are irreversibly consumed before trie generation completes.

Sequence Diagram

sequenceDiagram
    participant SD as Snapshot Dir
    participant RDB as Temp RocksDB
    participant Trie as Trie Builder

    Note over SD,Trie: insert_accounts / insert_storages
    SD->>RDB: ingest_external_file_opts (move_files=true)
    Note over SD: SST files now GONE from snapshot dir
    RDB->>Trie: full_iterator / snapshot.raw_iterator
    Trie->>Trie: trie_from_sorted_accounts_wrap
    Note over SD,RDB: If crash here → snapshot dir empty on retry
    RDB-->>SD: drop(db)
    SD->>SD: remove_dir_all(snapshot_dir)
    RDB->>RDB: remove_dir_all(temp_db_dir)

Reviews (1): Last reviewed commit: "docs: add changelog entry for #6532"

Comment on lines +1202 to 1205
let mut ingest_opts = rocksdb::IngestExternalFileOptions::default();
ingest_opts.set_move_files(true);
db.ingest_external_file_opts(&ingest_opts, file_paths)
.map_err(|err| SyncError::RocksDBError(err.into_string()))?;

P1 Move loses idempotency on crash/retry

With set_move_files(true), the SST files are destructively consumed from the snapshot directory during ingest. If the process crashes or returns an error anywhere after the ingest call but before the function completes (e.g., a RocksDB iterator failure on line 1208, a trie-generation error on line 1217, or even an OOM), the snapshot directory will be empty on the next attempt and ingest_external_file_opts will ingest nothing — silently producing an empty/incorrect trie. With the previous copy-based ingest, the source SST files remained intact and any restart could re-ingest correctly.

If snap sync is designed to always restart from scratch (i.e., rebuilding the snapshot directory before this function is called), this is not a problem. But if partial restarts are possible, this move breaks crash safety.


Contributor Author


We do always restart from scratch, so not a problem.

Comment on lines +1312 to 1315
let mut ingest_opts = rocksdb::IngestExternalFileOptions::default();
ingest_opts.set_move_files(true);
db.ingest_external_file_opts(&ingest_opts, file_paths)
.map_err(|err| SyncError::RocksDBError(err.into_string()))?;

P1 Same issue for storage ingest crash safety

Same crash-safety concern as in insert_accounts: once ingest_external_file_opts with set_move_files(true) succeeds, the SST files are gone from account_storages_snapshots_dir. Any failure between ingest and successful completion means the data is unrecoverable from the snapshot directory on retry.


Contributor Author


We do always restart from scratch, so not a problem.

@github-actions

Lines of code report

Total lines added: 4
Total lines removed: 0
Total lines changed: 4

Detailed view
+------------------------------------------------+-------+------+
| File                                           | Lines | Diff |
+------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync/snap_sync.rs | 1153  | +4   |
+------------------------------------------------+-------+------+

@github-actions

🤖 Codex Code Review

No findings.

The change in snap_sync.rs looks sound. set_move_files(true) only changes how temporary SST files are ingested into the ephemeral RocksDB, and the assumption behind it is supported by the path helpers in utils.rs: both snapshot dirs and temp DB dirs are siblings under the same datadir. I don't see any EVM/consensus/gas/state-validation impact from this diff; it should just reduce peak disk usage during snap sync.
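The sibling-directory layout described above can be sketched as follows. The helper names here are illustrative placeholders, not the actual utils.rs API: the point is that both directories resolve under one datadir, so a rename between them never crosses a filesystem boundary.

```rust
// Hypothetical sketch of the datadir layout (helper names are invented).
use std::path::{Path, PathBuf};

fn account_snapshots_dir(datadir: &Path) -> PathBuf {
    datadir.join("snap_accounts")
}

fn temp_ingest_db_dir(datadir: &Path) -> PathBuf {
    datadir.join("snap_temp_db")
}

fn main() {
    let datadir = Path::new("/var/lib/ethrex");
    let snap = account_snapshots_dir(datadir);
    let temp = temp_ingest_db_dir(datadir);
    // Both are siblings under datadir: the precondition for move ingest.
    assert_eq!(snap.parent(), temp.parent());
    println!("{} and {} share a parent", snap.display(), temp.display());
}
```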

Testing gap: I couldn’t run a targeted cargo check here because rustup tried to write under /home/runner/.rustup and hit the sandbox’s read-only restriction. If CI hasn’t already, a RocksDB-enabled snap-sync path is the one thing still worth validating.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@github-actions

Benchmark Block Execution Results Comparison Against Main

Command  Mean [s]         Min [s]  Max [s]  Relative
base     65.749 ± 0.452   65.355   66.805   1.00
head     66.257 ± 0.361   65.577   66.648   1.01 ± 0.01

