
fix(l1): pre-check sync head height to skip snap on low-block networks #6536

Open

edg-l wants to merge 1 commit into main from fix/snap-sync-low-block-network

Conversation

@edg-l
Contributor

edg-l commented Apr 27, 2026

Summary

  • When joining a fresh network whose head is only a few blocks deep, snap sync stalls in the header download loop before ever reaching the existing head_close_to_0 fallback in sync_cycle_snap (peers are barely synced themselves and request_block_headers returns no usable batch).
  • Adds an upfront probe in Syncer::sync_cycle: fetch the sync head's header by hash; if header.number < MIN_FULL_BLOCKS, flip snap_enabled to false and dispatch directly to sync_cycle_full (sketched after this list).
  • If the probe can't reach a peer after 3 attempts (2s apart), falls through to the existing snap path unchanged — no behavior regression on healthy mainnet/testnet.
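
A minimal standalone sketch of that decision, assuming MIN_FULL_BLOCKS is the 10 000-block threshold cited in the reviews below; the real Syncer, PeerHandler, and sync_cycle_* signatures in ethrex differ:

```rust
const MIN_FULL_BLOCKS: u64 = 10_000;

#[derive(Debug, PartialEq)]
enum SyncMode {
    Full,
    Snap,
}

/// Hypothetical pure form of the pre-check: `probed_head_number` is the
/// block number the probe recovered for the sync head, or `None` if no
/// peer answered within the allotted attempts.
fn choose_sync_mode(probed_head_number: Option<u64>) -> SyncMode {
    match probed_head_number {
        // Head is only a few blocks deep: snap sync would stall, go full.
        Some(n) if n < MIN_FULL_BLOCKS => SyncMode::Full,
        // Healthy chain, or probe failure: keep the existing snap path.
        _ => SyncMode::Snap,
    }
}

fn main() {
    assert_eq!(choose_sync_mode(Some(3)), SyncMode::Full);
    assert_eq!(choose_sync_mode(Some(1_000_000)), SyncMode::Snap);
    assert_eq!(choose_sync_mode(None), SyncMode::Snap);
}
```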

Test plan

  • Bring up a fresh local devnet of multiple ethrex nodes; verify a newly started node skips snap and full-syncs from genesis without stalling.
  • Confirm normal mainnet/testnet snap sync is unaffected (probe returns >= MIN_FULL_BLOCKS, snap path runs as before).
  • Confirm probe-failure case (no peers): snap path runs as before, no immediate fallback to full.

When joining a fresh network whose head is only a few blocks deep, snap
sync would stall in the header download loop before reaching the
existing `head_close_to_0` fallback. Probe the sync head's header
up-front; if its number is below MIN_FULL_BLOCKS, switch to full sync
immediately. Falls through to the existing snap path if the probe can't
reach any peer.
edg-l requested a review from a team as a code owner April 27, 2026 14:21
github-actions bot added the L1 Ethereum client label Apr 27, 2026
@github-actions

🤖 Kimi Code Review

This PR adds a pre-check to avoid snap sync on fresh devnets where the chain head is below MIN_FULL_BLOCKS. The implementation is correct and handles the edge case properly.

Highlights

  • Correct fallback semantics: Returning None from probe_sync_head_number when peers don't respond (lines 243-245, 269) correctly falls through to snap sync, delegating to the existing (potentially slower) in-loop guard rather than stalling indefinitely.
  • Bounded retry logic: The 3-attempt limit with 2-second delays (lines 236-237) prevents indefinite hangs while tolerating transient network issues (see the retry sketch after this list).
  • Proper structured logging: Use of info! with field shorthand (line 218) and appropriate log levels (debug for probes, warn for errors).
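
A self-contained sketch of that bounded-retry shape, assuming a tokio runtime as used elsewhere in ethrex; the names and the generic closure are placeholders, not the ethrex PeerHandler API:

```rust
use std::future::Future;
use std::time::Duration;

const PROBE_SYNC_HEAD_ATTEMPTS: u32 = 3;
const PROBE_RETRY_DELAY: Duration = Duration::from_secs(2);

/// Run the probe up to PROBE_SYNC_HEAD_ATTEMPTS times, sleeping between
/// attempts but never after the final one, and give up with None so the
/// caller can fall through to the existing snap path.
async fn probe_with_retries<F, Fut>(mut probe_once: F) -> Option<u64>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Option<u64>>,
{
    for attempt in 1..=PROBE_SYNC_HEAD_ATTEMPTS {
        if let Some(number) = probe_once().await {
            return Some(number);
        }
        // Only sleep if another attempt will follow.
        if attempt < PROBE_SYNC_HEAD_ATTEMPTS {
            tokio::time::sleep(PROBE_RETRY_DELAY).await;
        }
    }
    None
}

#[tokio::main]
async fn main() {
    // A probe that never reaches a peer: returns None after two 2 s sleeps.
    let result = probe_with_retries(|| async { Option::<u64>::None }).await;
    assert_eq!(result, None);
}
```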

Minor suggestions

  1. Line 256-257: Consider if request_block_headers_from_hash could be replaced with a single-header request API (if available) to avoid iterating through a batch response, though the current find logic is correct.

  2. Line 262, 265: For consistency with line 218, consider using structured fields instead of inline formatting:

    debug!(
        attempt,
        max_attempts = PROBE_SYNC_HEAD_ATTEMPTS,
        "Sync head probe: no peer response"
    );
  3. Documentation: The comment on line 207 references head_close_to_0—ensure this internal detail remains synchronized if the snap sync module is refactored later.

Security/Performance: No concerns. The probe adds minimal latency (max 6s) and the early full-sync return avoids expensive snap sync setup for short chains.

LGTM.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@greptile-apps

greptile-apps Bot commented Apr 27, 2026

Greptile Summary

This PR adds an upfront probe in Syncer::sync_cycle that fetches the sync head's block header by hash before committing to snap sync. If the block number is below MIN_FULL_BLOCKS (10 000), snap_enabled is permanently set to false and the node dispatches directly to full::sync_cycle_full, bypassing the stall that previously occurred when peers on a fresh devnet couldn't serve a full header batch. The probe retries up to 3 times (2 s apart) and falls through to the original snap path on failure, so there is no regression on healthy mainnet/testnet nodes.

Confidence Score: 4/5

Safe to merge — fix is correct and consistent with existing snap/full transition logic; only P2 style notes remain.

No P0 or P1 issues found. The probe logic is sound: the header hash is cryptographically bound so a single-peer response is trustworthy, single-element header responses pass the chaining check (vacuous windows(2).all), and the fallback on probe failure is conservative. The two P2 comments are minor — one is a missing attempt number in a debug log, the other is a documentation gap around the permanent nature of the snap-disable decision.
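
A tiny standalone illustration of the "vacuous windows(2).all" point: a single element yields no size-2 windows, so the chaining predicate is never evaluated and the check passes.

```rust
fn main() {
    let single_header_number = [42u64];
    // No windows of length 2 exist, so .all(..) is vacuously true,
    // mirroring why a single-header response passes the chaining check.
    assert!(single_header_number.windows(2).all(|w| w[0] + 1 == w[1]));
    println!("single-element response passes the chaining check");
}
```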

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/networking/p2p/sync.rs | Adds a pre-check probe before snap sync that fetches the sync head header by hash; if block number < MIN_FULL_BLOCKS it permanently disables snap and dispatches directly to full sync. Logic is correct and consistent with existing snap_sync.rs behaviour; two minor style/documentation P2s noted. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[sync_cycle called] --> B{snap_enabled?}
    B -- No --> F[full::sync_cycle_full]
    B -- Yes --> C[probe_sync_head_number\nup to 3 attempts]
    C -- None / probe failed --> E[snap::sync_cycle_snap]
    C -- Some number --> D{number < MIN_FULL_BLOCKS\n10 000}
    D -- No --> E
    D -- Yes --> G[snap_enabled = false\nfull::sync_cycle_full]
    E --> H{head_close_to_0\nor sync_head found?}
    H -- Yes --> I[snap_enabled = false\nfull::sync_cycle_full]
    H -- No --> J[continue snap sync loop]

Comment on lines +274 to +278
Ok(Some(headers)) => {
    if let Some(header) = headers.iter().find(|h| h.hash() == sync_head) {
        return Some(header.number);
    }
    debug!("Sync head probe: response did not contain target header");

P2 Missing attempt number in debug log

When Ok(Some(headers)) is returned but the target header isn't found, the debug message doesn't include the attempt number, unlike the other two branches. In a scenario where all 3 attempts hit this case (e.g. a peer consistently returns a response anchored to a different fork root), the three identical log lines with no attempt context make it harder to diagnose.

Suggested change
  Ok(Some(headers)) => {
      if let Some(header) = headers.iter().find(|h| h.hash() == sync_head) {
          return Some(header.number);
      }
-     debug!("Sync head probe: response did not contain target header");
+     debug!("Sync head probe attempt {attempt}/{PROBE_SYNC_HEAD_ATTEMPTS}: response did not contain target header");

Comment on line +219

    sync_head_number,
    "Sync head below MIN_FULL_BLOCKS ({MIN_FULL_BLOCKS}), using full sync"
);
self.snap_enabled.store(false, Ordering::Relaxed);

P2 Permanent snap-disable on first eligible sync_head

snap_enabled is set to false here before dispatching to full sync, and it is never reset to true. If the node is started while the devnet chain is short (< MIN_FULL_BLOCKS) but then the network advances past that threshold before the first full sync cycle completes, subsequent sync_cycle calls will still skip the snap path entirely.

This matches the existing behaviour in snap_sync.rs (line 225 sets the same flag permanently on head_close_to_0), so it is consistent — but worth documenting explicitly here since the pre-check fires much earlier and at a point where the "permanent" decision may be harder to notice.


@github-actions

Lines of code report

Total lines added: 46
Total lines removed: 0
Total lines changed: 46

Detailed view
+--------------------------------------+-------+------+
| File                                 | Lines | Diff |
+--------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync.rs | 358   | +46  |
+--------------------------------------+-------+------+

@github-actions

🤖 Claude Code Review

PR #6536 Review: fix(l1): pre-check sync head height to skip snap on low-block networks

Overall: The fix is well-motivated and targets a real stall condition. The implementation is mostly correct. A few things worth addressing.


Correctness

Permanent snap_enabled = false on short chains

sync.rs:219: self.snap_enabled.store(false, Ordering::Relaxed) is a permanent write to a shared Arc<AtomicBool>. There is no re-enable path — once this fires, every subsequent sync_cycle call goes to the full-sync branch for the lifetime of the Syncer. For a devnet that stays short this is fine, but if a node is started while the chain is only a few blocks deep and the network later grows past MIN_FULL_BLOCKS, it stays locked in full sync. This matches the behavior of the existing head_close_to_0 guard in snap_sync.rs, so it's consistent — but the PR description should mention this as intentional.

find() iterates and hashes all returned headers

sync.rs:274:

if let Some(header) = headers.iter().find(|h| h.hash() == sync_head) {

h.hash() runs a keccak256 per element. Since the request is made with NewToOld ordering starting from sync_head, the first header in the response should be sync_head. Iterating the whole batch (which may be hundreds of headers by default) and hashing each one wastes CPU. Prefer:

if let Some(header) = headers.first().filter(|h| h.hash() == sync_head) {
    return Some(header.number);
}

This keeps the security check (validates peer didn't send a wrong chain) while touching only one hash.

Full batch request for a single-header probe

request_block_headers_from_hash sends a standard GetBlockHeaders RLP message, which returns a full batch of headers (potentially up to the peer's limit). The probe only needs one header. If PeerHandler doesn't expose a single-header variant, this is acceptable as-is, but it's worth a comment explaining why a batch function is used for a probe.


Rust Idioms

if let chain syntax (MSRV)

sync.rs:209:

if let Some(sync_head_number) = probe_sync_head_number(&mut self.peers, sync_head).await
    && sync_head_number < MIN_FULL_BLOCKS

let chains (if let … && condition) were stabilized in Rust 1.88.0 (May 2025). Confirm the project's MSRV is ≥ 1.88, or there will be a compiler error on older toolchains.
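
If the MSRV turns out to be below 1.88, the same check can be written with nested if-lets. A toy standalone equivalent, where probe() and the constant stand in for the real async probe and threshold:

```rust
const MIN_FULL_BLOCKS: u64 = 10_000;

fn probe() -> Option<u64> {
    Some(3)
}

fn main() {
    // Pre-1.88 equivalent of `if let Some(n) = probe() && n < MIN_FULL_BLOCKS`:
    if let Some(n) = probe() {
        if n < MIN_FULL_BLOCKS {
            println!("sync head {n} is below MIN_FULL_BLOCKS, switch to full sync");
        }
    }
}
```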


Minor Issues

Sleep on last failed attempt is correctly avoided: the if attempt < PROBE_SYNC_HEAD_ATTEMPTS guard takes care of it. Good.

warn! vs debug! log level on probe failure — the Err branch uses warn!, which is appropriate for unexpected network errors. The Ok(None) branch uses debug!, which is correct for an expected "no peer available" case. Consistent.

METRICS.enable() not called before early full-sync dispatch (sync.rs:218–225): When the probe triggers the full-sync path, METRICS.enable() (called at sync.rs:222 in the normal snap path) is skipped. The existing head_close_to_0 guard in snap_sync.rs calls METRICS.disable() before handing off, implying metrics was already enabled at that point. Here metrics was never enabled, so no disable is needed — but verify that sync_cycle_full doesn't assume metrics are in a particular state.

No automated tests — the test plan is entirely manual. Given the async/network nature this is understandable, but a unit test that mocks PeerHandler to verify the pre-check logic (below threshold → full sync, above threshold → snap, no response → snap) would catch regressions.
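
A hedged sketch of what such a test could look like if the threshold decision were factored behind a small probe trait; HeadProbe, MockPeers, and should_skip_snap are hypothetical names, not the ethrex API:

```rust
const MIN_FULL_BLOCKS: u64 = 10_000;

trait HeadProbe {
    /// Returns the sync head's block number, or None if no peer answered.
    fn probe_head_number(&mut self) -> Option<u64>;
}

/// true = skip snap and fall back to full sync; false = keep the snap path.
fn should_skip_snap(probe: &mut impl HeadProbe) -> bool {
    matches!(probe.probe_head_number(), Some(n) if n < MIN_FULL_BLOCKS)
}

#[cfg(test)]
mod tests {
    use super::*;

    struct MockPeers(Option<u64>);

    impl HeadProbe for MockPeers {
        fn probe_head_number(&mut self) -> Option<u64> {
            self.0
        }
    }

    #[test]
    fn below_threshold_falls_back_to_full_sync() {
        assert!(should_skip_snap(&mut MockPeers(Some(3))));
    }

    #[test]
    fn healthy_chain_keeps_snap_sync() {
        assert!(!should_skip_snap(&mut MockPeers(Some(MIN_FULL_BLOCKS))));
    }

    #[test]
    fn probe_failure_keeps_snap_sync() {
        assert!(!should_skip_snap(&mut MockPeers(None)));
    }
}
```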


Summary

The fix is correct in its primary goal and the fallback behavior (no-peer → snap path unchanged) is safe. Address the find()-vs-first() hashing inefficiency and confirm the MSRV supports let chains. Everything else is minor polish.


Automated review by Claude (Anthropic) · sonnet · custom prompt

@github-actions

🤖 Codex Code Review

Findings

  1. sync.rs:219: this new early-return path disables snap and jumps straight into full sync without clearing any persisted snap checkpoint. SyncManager keeps looping while get_header_download_checkpoint() is still set (sync_manager.rs:223), and the only cleanup hook for that state is clear_snap_state() (store.rs:834). If this branch is hit while resuming a partial snap sync, the node can keep re-entering full sync with a stale checkpoint and never satisfy the manager’s exit condition. This branch should clear snap state before returning, or the manager loop should also stop once snap has been disabled. A toy sketch of that ordering follows the findings list.

  2. sync.rs:268: probe_sync_head_number uses request_block_headers_from_hash(), which sends GetBlockHeaders with limit: BLOCK_HEADER_LIMIT (peer_handler.rs:437, blocks.rs:84) and retries it 3 times. With PEER_REPLY_TIMEOUT = 5s, this can add about 19s before the existing snap-sync retry loop even starts, and it does so just to learn one block number. There is already a single-header path in peer_handler.rs:51; reusing/exposing that would avoid the extra payload and peer-scoring churn on every snap attempt.
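
For finding 1, a toy, self-contained sketch of the ordering being asked for; the Store type and clear_snap_state body below are stand-ins, not the ethrex implementations:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Toy stand-in for the parts of the store the finding mentions.
struct Store {
    header_download_checkpoint: Option<u64>,
}

impl Store {
    fn clear_snap_state(&mut self) {
        // The real clear_snap_state also drops other snap-sync progress;
        // only the checkpoint that keeps SyncManager looping is modeled here.
        self.header_download_checkpoint = None;
    }
}

/// Disable snap, clear the stale checkpoint, then hand off to full sync,
/// so the manager's exit condition can actually be satisfied.
fn switch_to_full_sync(snap_enabled: &AtomicBool, store: &mut Store) {
    snap_enabled.store(false, Ordering::Relaxed);
    store.clear_snap_state();
    // full::sync_cycle_full(..) would run from here in the real code path.
}

fn main() {
    let snap_enabled = AtomicBool::new(true);
    let mut store = Store { header_download_checkpoint: Some(1_234) };
    switch_to_full_sync(&snap_enabled, &mut store);
    assert!(!snap_enabled.load(Ordering::Relaxed));
    assert!(store.header_download_checkpoint.is_none());
}
```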

Otherwise the change is small and I don’t see EVM, gas-accounting, trie, or consensus-rule issues in the touched logic.

I couldn’t run cargo check here because the sandboxed environment blocks rustup from writing under /home/runner/.rustup.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt
