feat(state-prune): support tail replay on existing output #3991
jolestar wants to merge 11 commits into
Conversation
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.
Snapshot warnings: ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.
Pull request overview
Adds operational tooling and plumbing for mainnet repair/recovery flows by enabling (1) tail replay against an existing replay output directory, (2) a finalize-only path for partially completed outputs, (3) optional skipping of final state-node compaction, and (4) a helper to backfill missing indexed L2 transaction history into a destination DB without initializing the destination indexer.
Changes:
- Extend `IncrementalReplayer` with `tail_replay_existing_output(...)` and `finalize_existing_output(...)`, plus a `skip_final_compact` replay option.
- Wire new `db state-prune tail-replay`/`finalize-replay-output` CLI flows and add `db import-indexed-transactions`.
- Add design docs describing tail replay and a selective-CF-copy replay redesign.
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| docs/dev-guide/mainnet_tail_replay_design_20260408.md | Tail replay design doc and proposed CLI/steps. |
| docs/dev-guide/mainnet_replay_cf_copy_redesign_20260407.md | Design doc for reducing replay output size via selective CF copy. |
| crates/rooch/src/commands/db/mod.rs | Registers the new ImportIndexedTransactions DB subcommand. |
| crates/rooch/src/commands/db/commands/state_prune/replay.rs | Adds TailReplayCommand and FinalizeReplayOutputCommand; exposes skip_final_compact. |
| crates/rooch/src/commands/db/commands/state_prune/command.rs | Adds new state-prune subcommands to the CLI enum dispatch. |
| crates/rooch/src/commands/db/commands/mod.rs | Exposes the new import_indexed_transactions command module. |
| crates/rooch/src/commands/db/commands/import_indexed_transactions.rs | Implements import/backfill of missing L2 tx + execution info referenced by the target indexer. |
| crates/rooch/Cargo.toml | Adds diesel dependency needed by the new import command. |
| crates/rooch-pruner/src/state_prune/incremental_replayer.rs | Implements tail replay + finalize flows, accumulator tail rebuild, and skip-final-compact support. |
| crates/rooch-pruner/Cargo.toml | Adds accumulator dependency for tail accumulator rebuild. |
| crates/rooch-config/src/state_prune.rs | Adds skip_final_compact to ReplayConfig. |
| Cargo.lock | Locks new dependencies (diesel, accumulator) into the build. |
Reviewed hunk (from the design doc):

```markdown
- [replay.rs](/Users/jolestar/opensource/src/github.com/rooch-network/rooch/crates/rooch/src/commands/db/commands/state_prune/replay.rs)
- [incremental_replayer.rs](/Users/jolestar/opensource/src/github.com/rooch-network/rooch/crates/rooch-pruner/src/state_prune/incremental_replayer.rs)
```
These markdown links point to an absolute local filesystem path (/Users/...). That won’t work for other developers or in GitHub; please switch to repo-relative links (e.g., crates/rooch/src/... or a GitHub permalink).
```rust
let mut metadata = StatePruneMetadata::new(
    crate::state_prune::OperationType::Replay {
        snapshot_path: PathBuf::new(),
        from_order: from_order.unwrap_or(0),
        to_order,
        output_dir: output_dir.to_path_buf(),
    },
    serde_json::json!({
        "mode": "tail_replay_existing_output",
        "from_order": from_order,
        "to_order": to_order,
        "output_dir": output_dir,
        "config": self.config
    }),
);

let source_moveos_store = self.source_moveos_store()?;
let (output_store, output_rooch_store) = self.load_output_stores(output_dir)?;
let output_startup_info = output_store
    .get_config_store()
    .get_startup_info()?
    .ok_or_else(|| anyhow::anyhow!("No startup info found in existing replay output"))?;
let output_sequencer_info = output_rooch_store
    .get_meta_store()
    .get_sequencer_info()?
    .ok_or_else(|| anyhow::anyhow!("No sequencer info found in existing replay output"))?;

let resolved_from_order = from_order.unwrap_or(output_sequencer_info.last_order + 1);
```
StatePruneMetadata::new is initialized with from_order: from_order.unwrap_or(0), but the actual start order is resolved later from the existing output’s sequencer_info. This makes the recorded operation type/metadata inaccurate (and can produce misleading filenames/diagnostics). Consider resolving from_order first (or updating metadata.operation_type) so it records the effective resolved_from_order.
Suggested change (resolve `from_order` first, then record both the raw and resolved values in the metadata):

```rust
let source_moveos_store = self.source_moveos_store()?;
let (output_store, output_rooch_store) = self.load_output_stores(output_dir)?;
let output_startup_info = output_store
    .get_config_store()
    .get_startup_info()?
    .ok_or_else(|| anyhow::anyhow!("No startup info found in existing replay output"))?;
let output_sequencer_info = output_rooch_store
    .get_meta_store()
    .get_sequencer_info()?
    .ok_or_else(|| anyhow::anyhow!("No sequencer info found in existing replay output"))?;
let resolved_from_order = from_order.unwrap_or(output_sequencer_info.last_order + 1);
let mut metadata = StatePruneMetadata::new(
    crate::state_prune::OperationType::Replay {
        snapshot_path: PathBuf::new(),
        from_order: resolved_from_order,
        to_order,
        output_dir: output_dir.to_path_buf(),
    },
    serde_json::json!({
        "mode": "tail_replay_existing_output",
        "from_order": from_order,
        "resolved_from_order": resolved_from_order,
        "to_order": to_order,
        "output_dir": output_dir,
        "config": self.config
    }),
);
```
```rust
metadata.mark_in_progress(
    format!(
        "Loading canonical tail entries [{}..={}]",
        resolved_from_order, to_order
    ),
    20.0,
);
let tail_entries = self.load_tail_entries(
    &source_moveos_store,
    resolved_from_order,
    to_order,
    &mut report,
)?;
self.progress_tracker.set_total(tail_entries.len() as u64);

metadata.mark_in_progress("Applying tail changesets".to_string(), 40.0);
let (actual_state_root, expected_state_root, expected_global_size) = self
    .tail_replay_entries_batched(
        tail_entries,
        &output_store,
        &output_rooch_store,
        output_startup_info.state_root,
        output_startup_info.size,
        output_sequencer_info.last_accumulator_info.clone(),
        &mut report,
        &mut metadata,
    )
    .await?;
```
load_tail_entries materializes the entire [resolved_from_order..=to_order] range into memory before any apply happens. For large tail ranges, this can cause high peak memory usage and longer time-to-first-progress. Consider processing in streaming/chunked fashion (load a batch of changesets + tx metadata, apply, then drop) instead of building a single Vec<TailReplayEntry>.
Suggested change (load and apply the tail in fixed-size chunks instead of materializing the whole range):

```rust
let total_tail_entries = to_order - resolved_from_order + 1;
let tail_chunk_size: u64 = 1_000;
self.progress_tracker.set_total(total_tail_entries);

metadata.mark_in_progress("Applying tail changesets".to_string(), 40.0);
let mut current_from_order = resolved_from_order;
let mut current_state_root = output_startup_info.state_root;
let mut current_global_size = output_startup_info.size;
let mut current_accumulator_info = output_sequencer_info.last_accumulator_info.clone();
let mut actual_state_root = current_state_root;
let mut expected_state_root = current_state_root;
let mut expected_global_size = current_global_size;
while current_from_order <= to_order {
    let current_to_order =
        std::cmp::min(current_from_order + tail_chunk_size - 1, to_order);
    metadata.mark_in_progress(
        format!(
            "Loading and applying canonical tail entries [{}..={}]",
            current_from_order, current_to_order
        ),
        40.0,
    );
    let tail_entries = self.load_tail_entries(
        &source_moveos_store,
        current_from_order,
        current_to_order,
        &mut report,
    )?;
    (actual_state_root, expected_state_root, expected_global_size) = self
        .tail_replay_entries_batched(
            tail_entries,
            &output_store,
            &output_rooch_store,
            current_state_root,
            current_global_size,
            current_accumulator_info.clone(),
            &mut report,
            &mut metadata,
        )
        .await?;
    current_state_root = actual_state_root;
    current_global_size = expected_global_size;
    current_accumulator_info = output_rooch_store
        .get_meta_store()
        .get_sequencer_info()?
        .ok_or_else(|| {
            anyhow::anyhow!("No sequencer info found after tail replay chunk")
        })?
        .last_accumulator_info;
    current_from_order = current_to_order + 1;
}
```
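The chunk-bound arithmetic in the loop above is easy to get wrong at the range edges. A minimal standalone sketch of the same bounds, using a hypothetical `chunk_ranges` helper (not part of this PR), for checking the behavior on exact multiples, single-element ranges, and empty ranges:

```rust
/// Hypothetical helper mirroring the loop bounds above: split the inclusive
/// range [from..=to] into chunks of at most `chunk_size` orders.
fn chunk_ranges(from: u64, to: u64, chunk_size: u64) -> Vec<(u64, u64)> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut ranges = Vec::new();
    let mut current = from;
    while current <= to {
        // Same bound as `min(current_from_order + tail_chunk_size - 1, to_order)`,
        // saturating so the addition cannot overflow near u64::MAX.
        let end = current.saturating_add(chunk_size - 1).min(to);
        ranges.push((current, end));
        current = end + 1;
    }
    ranges
}
```

For example, `chunk_ranges(1, 2500, 1000)` yields `[(1, 1000), (1001, 2000), (2001, 2500)]`, and an inverted range (`from > to`) yields no chunks, matching the `while current_from_order <= to_order` guard.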
```rust
let mut entries = Vec::with_capacity(changesets.len());

for (tx_order, changeset_ext) in changesets {
    let tx_hash = self
        .rooch_store
        .transaction_store
        .get_tx_hashes(vec![tx_order])?
        .pop()
        .flatten()
```
Inside the loop, get_tx_hashes(vec![tx_order]) issues a separate DB query per order (N+1) and allocates a Vec each time. Since you already know the full order range, consider fetching tx hashes in bulk (e.g., one get_tx_hashes call for the whole range or per-batch) and then multi-get transactions / execution infos to reduce IO overhead.
Suggested change (fetch all hashes for the batch in one call, then zip them back to the changesets):

```rust
let tx_orders: Vec<u64> = changesets
    .iter()
    .map(|(tx_order, _)| *tx_order)
    .collect();
let tx_hashes = self.rooch_store.transaction_store.get_tx_hashes(tx_orders)?;
if tx_hashes.len() != changesets.len() {
    return Err(anyhow::anyhow!(
        "Mismatched tx hash count for range {}..={}: expected {}, got {}",
        from_order,
        to_order,
        changesets.len(),
        tx_hashes.len()
    ));
}
let mut entries = Vec::with_capacity(changesets.len());
for ((tx_order, changeset_ext), tx_hash_opt) in
    changesets.into_iter().zip(tx_hashes.into_iter())
{
    let tx_hash = tx_hash_opt
```
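The bulk-fetch-then-zip pattern can be sketched against a hypothetical in-memory store (`MockTxStore` and `resolve_batch_hashes` are illustrative names, not APIs from this codebase); the point is one lookup per batch plus a fail-fast check when any order has no hash:

```rust
use std::cell::Cell;
use std::collections::HashMap;

/// Illustrative stand-in for the transaction store: tx_order -> tx_hash.
struct MockTxStore {
    hashes: HashMap<u64, [u8; 32]>,
    queries: Cell<usize>,
}

impl MockTxStore {
    /// One bulk lookup for many orders, analogous to a single
    /// `get_tx_hashes(tx_orders)` call instead of one call per order.
    fn get_tx_hashes(&self, orders: &[u64]) -> Vec<Option<[u8; 32]>> {
        self.queries.set(self.queries.get() + 1);
        orders.iter().map(|o| self.hashes.get(o).copied()).collect()
    }
}

/// Resolve hashes for a whole batch with a single store query, failing
/// fast when any order has no recorded hash.
fn resolve_batch_hashes(store: &MockTxStore, orders: &[u64]) -> Result<Vec<[u8; 32]>, String> {
    let hashes = store.get_tx_hashes(orders);
    if hashes.len() != orders.len() {
        return Err(format!("expected {} hashes, got {}", orders.len(), hashes.len()));
    }
    orders
        .iter()
        .zip(hashes)
        .map(|(o, h)| h.ok_or_else(|| format!("missing tx hash for order {}", o)))
        .collect()
}
```

With the real store, the same shape applies per batch, so the number of round-trips drops from N (one per order) to one.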
```rust
let tx_db_path = target_indexer_dir.join("transactions");
let tx_db_url = tx_db_path
    .to_str()
    .ok_or_else(|| anyhow::anyhow!("Invalid target transactions indexer path"))?;
let mut conn = SqliteConnection::establish(tx_db_url)
```
load_indexed_transaction_batch opens a new SqliteConnection on every call, and the import loop calls this once per batch. Reusing a single connection (or using Diesel’s existing connection pool helper like the indexer does) would avoid repeated connection setup and improve import throughput for large backfills.
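The reuse pattern can be sketched generically; `Conn`, `establish`, and `BatchImporter` below are illustrative stand-ins for diesel's `SqliteConnection::establish` and the import loop, not APIs from this PR. The importer establishes the connection once in its constructor and threads the same handle through every batch:

```rust
use std::cell::Cell;

thread_local! {
    // Counts how many times a "connection" is established.
    static OPENS: Cell<usize> = Cell::new(0);
}

/// Illustrative stand-in for a SQLite connection handle.
struct Conn;

fn establish(_db_url: &str) -> Conn {
    OPENS.with(|c| c.set(c.get() + 1));
    Conn
}

/// Importer that opens the connection once and reuses it for every batch,
/// instead of reconnecting inside each batch-load call.
struct BatchImporter {
    conn: Conn,
}

impl BatchImporter {
    fn new(db_url: &str) -> Self {
        Self { conn: establish(db_url) }
    }

    /// All queries go through the long-lived `self.conn`.
    fn load_batch(&mut self, offset: u64, limit: u64) -> Vec<u64> {
        let _ = &mut self.conn; // query via the reused connection here
        (offset..offset + limit).collect()
    }
}

fn connections_opened() -> usize {
    OPENS.with(|c| c.get())
}
```

After any number of `load_batch` calls, `connections_opened()` stays at 1; the per-batch version would pay the connection setup cost on every call.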
Summary
- `finalize-replay-output` flow

Included changes
- `IncrementalReplayer` can now finalize an existing output store without reopening the same RocksDB handle in-place
- `db state-prune` adds the existing-output tail replay/finalize path
- `db import-indexed-transactions` imports missing indexed L2 transactions and avoids initializing the target indexer on the destination DB

Testing