perf(l1): move SST files into temp DB during snap sync ingest#6532
🤖 Kimi Code Review
The changes look correct and represent a sensible performance optimization for snap sync operations.
Summary
Review Points
Minor Suggestion

    db.ingest_external_file_opts(&ingest_opts, file_paths)
        .map_err(|err| {
            // Log filesystem boundary violations for debugging
            if err.to_string().contains("cross-device") {
                warn!("SST ingest failed: snapshot and temp DB not on same filesystem");
            }
            SyncError::RocksDBError(err.into_string())
        })?;

Otherwise, LGTM - this is a straightforward disk space and I/O optimization.
Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt
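Matching on the text of an error message, as the suggestion above does, depends on wording that may change. A minimal sketch of an alternative, assuming a Unix target: compare the device ids of the two directories before ingesting. The helper name `same_filesystem` is hypothetical and not part of the PR, and how RocksDB itself behaves when the directories are on different filesystems is not established in this thread.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt; // Unix-only: exposes the device id via `dev()`
use std::path::Path;

/// Returns `Ok(true)` when both paths live on the same filesystem,
/// judged by comparing their device ids.
/// Hypothetical helper for illustration; it is not part of the PR.
pub fn same_filesystem(a: &Path, b: &Path) -> io::Result<bool> {
    let dev_a = fs::metadata(a)?.dev();
    let dev_b = fs::metadata(b)?.dev();
    Ok(dev_a == dev_b)
}
```

A caller could log a warning when this returns `false` for the snapshot directory and the temp DB directory, before the ingest call is made.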
🤖 Claude Code Review
Here is my review of PR #6532: Review:
Greptile Summary
This PR switches RocksDB SST file ingestion in
Confidence Score: 3/5
Safe to merge only if snap sync is guaranteed to fully rebuild the snapshot directory on every restart; otherwise risks silent trie corruption on crash.
Two P1 findings (one per ingest site) sharing the same root cause: move semantics make the operation non-idempotent, so a crash between ingest and trie completion could leave snap sync in an unrecoverable state. Score is below the P1 ceiling because both affected code paths are on the critical sync route.
crates/networking/p2p/sync/snap_sync.rs: both insert_accounts and insert_storages ingest sites
| Filename | Overview |
|---|---|
| crates/networking/p2p/sync/snap_sync.rs | Switches RocksDB SST ingest from copy to move semantics in both insert_accounts and insert_storages, reducing peak disk usage; introduces a crash-safety concern where the snapshot source files are irreversibly consumed before trie generation completes. |
Sequence Diagram
sequenceDiagram
participant SD as Snapshot Dir
participant RDB as Temp RocksDB
participant Trie as Trie Builder
Note over SD,Trie: insert_accounts / insert_storages
SD->>RDB: ingest_external_file_opts (move_files=true)
Note over SD: SST files now GONE from snapshot dir
RDB->>Trie: full_iterator / snapshot.raw_iterator
Trie->>Trie: trie_from_sorted_accounts_wrap
Note over SD,RDB: If crash here → snapshot dir empty on retry
RDB-->>SD: drop(db)
SD->>SD: remove_dir_all(snapshot_dir)
RDB->>RDB: remove_dir_all(temp_db_dir)
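The crash window in the diagram can be reproduced with plain files. The sketch below is an analogy only: it uses `std::fs::rename` and `std::fs::copy` as stand-ins for move-based and copy-based ingest and does not call RocksDB. The function names are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Lists the regular files directly inside `dir`, sorted for determinism.
fn list_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_file() {
            files.push(path);
        }
    }
    files.sort();
    Ok(files)
}

/// Stand-in for an ingest step: transfers every file from `src` into `dst`,
/// by rename when `move_files` is true and by copy otherwise.
/// Returns the number of files found in `src`.
pub fn transfer_all(src: &Path, dst: &Path, move_files: bool) -> io::Result<usize> {
    let files = list_files(src)?;
    for file in &files {
        let target = dst.join(file.file_name().expect("listed files have names"));
        if move_files {
            fs::rename(file, &target)?; // source file is gone afterwards
        } else {
            fs::copy(file, &target)?; // source file stays in place
        }
    }
    Ok(files.len())
}
```

Calling `transfer_all` twice on the same source with `move_files = true` finds no files the second time, which mirrors the retry case the diagram flags. With `move_files = false` the second call sees the same files again.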
Prompt To Fix All With AI
This is a comment left during a code review.
Path: crates/networking/p2p/sync/snap_sync.rs
Line: 1202-1205
Comment:
**Move loses idempotency on crash/retry**
With `set_move_files(true)`, the SST files are *destructively consumed* from the snapshot directory during ingest. If the process crashes or returns an error anywhere after the ingest call but before the function completes (e.g., a RocksDB iterator failure on line 1208, a trie-generation error on line 1217, or even an OOM), the snapshot directory will be empty on the next attempt and `ingest_external_file_opts` will ingest nothing — silently producing an empty/incorrect trie. With the previous copy-based ingest, the source SST files remained intact and any restart could re-ingest correctly.
If snap sync is designed to always restart from scratch (i.e., rebuilding the snapshot directory before this function is called), this is not a problem. But if partial restarts are possible, this move breaks crash safety.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: crates/networking/p2p/sync/snap_sync.rs
Line: 1312-1315
Comment:
**Same issue for storage ingest crash safety**
Same crash-safety concern as in `insert_accounts`: once `ingest_external_file_opts` with `set_move_files(true)` succeeds, the SST files are gone from `account_storages_snapshots_dir`. Any failure between ingest and successful completion means the data is unrecoverable from the snapshot directory on retry.
How can I resolve this? If you propose a fix, please make it concise.
Reviews (1): Last reviewed commit: "docs: add changelog entry for #6532"
    let mut ingest_opts = rocksdb::IngestExternalFileOptions::default();
    ingest_opts.set_move_files(true);
    db.ingest_external_file_opts(&ingest_opts, file_paths)
        .map_err(|err| SyncError::RocksDBError(err.into_string()))?;

Move loses idempotency on crash/retry
With set_move_files(true), the SST files are destructively consumed from the snapshot directory during ingest. If the process crashes or returns an error anywhere after the ingest call but before the function completes (e.g., a RocksDB iterator failure on line 1208, a trie-generation error on line 1217, or even an OOM), the snapshot directory will be empty on the next attempt and ingest_external_file_opts will ingest nothing — silently producing an empty/incorrect trie. With the previous copy-based ingest, the source SST files remained intact and any restart could re-ingest correctly.
If snap sync is designed to always restart from scratch (i.e., rebuilding the snapshot directory before this function is called), this is not a problem. But if partial restarts are possible, this move breaks crash safety.
We do always restart from scratch, so not a problem.
    let mut ingest_opts = rocksdb::IngestExternalFileOptions::default();
    ingest_opts.set_move_files(true);
    db.ingest_external_file_opts(&ingest_opts, file_paths)
        .map_err(|err| SyncError::RocksDBError(err.into_string()))?;

Same issue for storage ingest crash safety
Same crash-safety concern as in insert_accounts: once ingest_external_file_opts with set_move_files(true) succeeds, the SST files are gone from account_storages_snapshots_dir. Any failure between ingest and successful completion means the data is unrecoverable from the snapshot directory on retry.
We do always restart from scratch, so not a problem.
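Since the design relies on the snapshot directory always being rebuilt before ingest, one optional way to make that assumption explicit is to fail loudly when no files are found, rather than ingesting nothing. This is a minimal sketch, not code from the PR: the helper name `collect_sst_files` and the `.sst` file extension are assumptions made for illustration.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collects files with an `.sst` extension from `dir` and returns an error
/// if there are none, so an empty snapshot directory cannot pass silently.
/// Hypothetical helper; the `.sst` extension is an assumption.
pub fn collect_sst_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.extension().map_or(false, |ext| ext == "sst") {
            files.push(path);
        }
    }
    if files.is_empty() {
        return Err(io::Error::new(
            io::ErrorKind::NotFound,
            format!("no SST files found in {}", dir.display()),
        ));
    }
    files.sort();
    Ok(files)
}
```

The returned paths could then be passed to the ingest call at both sites, so a violated restart-from-scratch assumption surfaces as an error instead of an empty trie.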
🤖 Codex Code Review
No findings. The change in snap_sync.rs and snap_sync.rs looks sound. Testing gap: I couldn’t run a targeted
Automated review by OpenAI Codex · gpt-5.4 · custom prompt
Motivation
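During storage insertion in snap sync, ingesting SST files by copy leaves every file on disk twice (once in the snapshot directory, once in the temp DB), so peak disk usage is substantially higher than necessary.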
Description
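Uses RocksDB move-based ingestion (`set_move_files(true)`) so SST files are moved from the snapshot directory into the temp DB instead of being copied, which reduces peak disk usage.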
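The disk-usage effect can be illustrated with plain files. The sketch below is an analogy, not the PR's code: it measures the combined size of a source and a destination directory after a file is copied versus moved, using `std::fs` only. The helper names are hypothetical, and it does not model how RocksDB performs the move internally.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Sums the sizes of the regular files directly inside `dir`.
pub fn dir_size(dir: &Path) -> io::Result<u64> {
    let mut total = 0u64;
    for entry in fs::read_dir(dir)? {
        let meta = entry?.metadata()?;
        if meta.is_file() {
            total += meta.len();
        }
    }
    Ok(total)
}

/// Transfers `file` into `dst_dir` by move or by copy, then returns the
/// combined size of the file's original directory and `dst_dir`.
pub fn usage_after_transfer(file: &Path, dst_dir: &Path, move_file: bool) -> io::Result<u64> {
    let src_dir = file.parent().expect("file has a parent directory");
    let target = dst_dir.join(file.file_name().expect("file has a name"));
    if move_file {
        fs::rename(file, &target)?; // data exists once, in dst_dir
    } else {
        fs::copy(file, &target)?; // data exists twice, in both directories
    }
    Ok(dir_size(src_dir)? + dir_size(dst_dir)?)
}
```

For a single file of N bytes, the copy path reports 2N bytes across the two directories, while the move path reports N.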