
fix(network): clear ENR nfd field when no next fork is scheduled during runtime transitions #9131

Open
Alleysira wants to merge 4 commits into sigp:unstable from Alleysira:fix/nfd-non-zero-after-fork-trans

Conversation

@Alleysira

Issue Addressed

No. But related to #9009 and #8996

Motivation

Hi there,

In PR #9009, we fixed the ENR nfd initialization path to use zero-valued bytes when no next fork is scheduled. However, the same bug pattern remains in the runtime fork transition path (update_next_fork_digest). After the final scheduled fork activates, the nfd ENR field is set to the current fork digest instead of being cleared to 0x00000000.

These are the test results from a Kurtosis testnet. With Fulu at epoch 0 and Gloas as the final fork at epoch 1, after Gloas activates, only Lighthouse has a wrong nfd:

| Client     | fork_digest | nfd        | Correct? |
|------------|-------------|------------|----------|
| Lighthouse | `7f11a203`  | `7f11a203` | NO       |
| Teku       | `7f11a203`  | `00000000` | YES      |
| Nimbus     | `7f11a203`  | `00000000` | YES      |
| Lodestar   | `7f11a203`  | `00000000` | YES      |
| Grandine   | `7f11a203`  | `00000000` | YES      |

Proposed Changes

  • Change ForkContext::next_fork_digest() to return [u8; 4] (returning [0u8; 4] when no next fork is scheduled).
  • Update the initialization path and runtime fork transition path accordingly.

Added tests:

  • test_next_fork_digest — existing test passes with non-Option return type
  • test_next_fork_digest_returns_zero_when_no_next_fork — init at last BPO fork returns [0u8; 4]
  • test_next_fork_digest_zero_after_runtime_transition_to_last_fork — simulates update_current_fork to last fork, then verifies zero
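The core of the proposed change can be sketched as follows. This is a minimal standalone sketch, not Lighthouse's actual `ForkContext`: the `ForkSchedule` struct, its fields, and `update_current_fork` are illustrative stand-ins. The point is the return type: `[u8; 4]` with an all-zero digest for "no next fork", rather than an `Option`.

```rust
// Illustrative stand-in for a fork schedule; field layout is assumed,
// not Lighthouse's real ForkContext.
struct ForkSchedule {
    forks: Vec<(u64, [u8; 4])>, // (activation epoch, fork digest), sorted by epoch
    current_epoch: u64,
}

impl ForkSchedule {
    /// Returns the digest of the next scheduled fork, or [0u8; 4] when the
    /// current fork is the last one, matching the ENR `nfd` convention for
    /// "no next fork scheduled".
    fn next_fork_digest(&self) -> [u8; 4] {
        self.forks
            .iter()
            .find(|(epoch, _)| *epoch > self.current_epoch)
            .map(|(_, digest)| *digest)
            .unwrap_or([0u8; 4])
    }

    /// Simulates a runtime fork transition (as in the third test above).
    fn update_current_fork(&mut self, epoch: u64) {
        self.current_epoch = epoch;
    }
}

fn main() {
    let mut schedule = ForkSchedule {
        forks: vec![(0, [0x11; 4]), (1, [0x7f, 0x11, 0xa2, 0x03])],
        current_epoch: 0,
    };
    // Before the last fork activates, nfd is the next fork's digest.
    assert_eq!(schedule.next_fork_digest(), [0x7f, 0x11, 0xa2, 0x03]);

    // After transitioning to the final scheduled fork, nfd must be zeroed,
    // not left at the now-current fork digest.
    schedule.update_current_fork(1);
    assert_eq!(schedule.next_fork_digest(), [0u8; 4]);
}
```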

Additional Info

The following tests/checks were run and passed:

```shell
cargo nextest run -p types test_next_fork_digest
cargo check -p network -p lighthouse_network
cargo nextest run -p network
cargo fmt --all && make lint-fix
```

Found this bug while differential testing; details are in the Prysm issue.

Thanks for your time!

@Alleysira Alleysira requested a review from jxs as a code owner April 15, 2026 13:19
@jimmygchen
Member

Thanks for the PR!

Were you able to verify in kurtosis that this fixes persisted ENRs? I had a quick skim but I don't think we update nfd on startup, so it probably just uses what was persisted on disk.

@jimmygchen jimmygchen added bug Something isn't working waiting-on-author The reviewer has suggested changes and awaits thier implementation. labels Apr 15, 2026
@Alleysira
Author

Alleysira commented Apr 16, 2026

Thanks for your insight!

You're right: the former fix only covers the init and runtime transition paths. Startup can still reuse a persisted enr.dat because compare_enr() doesn't compare nfd. I've added nfd to the comparison, so a stale enr.dat gets rewritten with a bumped seq.
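The idea behind the compare_enr() change can be sketched like this. The struct, field names, and helper below are illustrative assumptions, not Lighthouse's real implementation: the persisted ENR is reused only if every compared field matches the freshly computed one, and including nfd in that comparison makes a stale on-disk value fail the check, so the ENR gets rebuilt on startup.

```rust
// Illustrative only: field names and this helper are assumptions, not
// Lighthouse's actual compare_enr().
#[derive(Debug, PartialEq)]
struct EnrFields {
    fork_digest: [u8; 4],
    nfd: [u8; 4],
}

/// Returns true if the persisted ENR matches the freshly computed fields
/// and can be reused as-is.
fn compare_enr(disk: &EnrFields, current: &EnrFields) -> bool {
    disk.fork_digest == current.fork_digest
        // New: nfd is now part of the comparison, so a persisted ENR
        // carrying a stale next-fork digest no longer matches.
        && disk.nfd == current.nfd
}

fn main() {
    // Persisted ENR written by the buggy version: nfd equals fork_digest.
    let disk = EnrFields {
        fork_digest: [0xd0, 0xbc, 0x15, 0x9b],
        nfd: [0xd0, 0xbc, 0x15, 0x9b],
    };
    // Freshly computed fields: no next fork scheduled, so nfd is zeroed.
    let current = EnrFields {
        fork_digest: [0xd0, 0xbc, 0x15, 0x9b],
        nfd: [0u8; 4],
    };
    // Mismatch: the stale ENR is rejected and rebuilt with a bumped seq.
    assert!(!compare_enr(&disk, &current));
}
```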

> Were you able to verify in kurtosis that this fixes persisted ENRs?

I built a custom image and reran Kurtosis to confirm the fix. The test verified that the compare_enr() fix correctly rejects a stale persisted enr.dat containing a wrong nfd value after upgrading to the patched Lighthouse image. The test uses a two-image approach: start with the buggy stock image to persist a wrong nfd, then swap to the fixed image and confirm the ENR is corrected on restart.

| Client     | Image                                                                | Role        |
|------------|----------------------------------------------------------------------|-------------|
| Lighthouse | `sigp/lighthouse:v8.1.3`, then `alleysira/lighthouse-nfd-fix:latest` | Test target |
| Teku       | `ethpandaops/teku:master`                                            | Reference   |
| Nimbus     | `statusim/nimbus-eth2:multiarch-v26.3.1`                             | Reference   |

Test Procedure

Phase 1: Reproduce the bug with stock image

  1. Started enclave with sigp/lighthouse:v8.1.3 (buggy, no compare_enr nfd check)
  2. Waited for Gloas activation at epoch 1 (~6 minutes)
  3. Queried ENRs via /eth/v1/node/identity on all clients

Phase 2: Swap image and restart

  1. `kurtosis service stop nfd-test2 cl-1-lighthouse-geth`
  2. `kurtosis service update nfd-test2 cl-1-lighthouse-geth --image alleysira/lighthouse-nfd-fix:latest`
  3. Service auto-started with the fixed image, same datadir/volume (persisted enr.dat retained)
  4. Queried ENRs again

Results

Phase 1 — Stock sigp/lighthouse:v8.1.3 (post-Gloas, buggy):

| Client     | fork_digest | nfd        | seq | Correct? |
|------------|-------------|------------|-----|----------|
| Lighthouse | `d0bc159b`  | `d0bc159b` | 9   | NO       |
| Teku       | `d0bc159b`  | `00000000` | 13  | YES      |
| Nimbus     | `d0bc159b`  | `00000000` | 38  | YES      |

Lighthouse's nfd equals its own fork_digest: the runtime transition path sets nfd to the current fork digest instead of clearing it when no next fork is scheduled.

Phase 2 — Swapped to alleysira/lighthouse-nfd-fix:latest (restarted, same datadir):

| Client     | fork_digest | nfd        | seq | Correct? |
|------------|-------------|------------|-----|----------|
| Lighthouse | `d0bc159b`  | `00000000` | 3   | YES      |
| Teku       | `d0bc159b`  | `00000000` | 13  | YES      |
| Nimbus     | `d0bc159b`  | `00000000` | 38  | YES      |

After swapping to the fixed image and restarting with the same persisted datadir, Lighthouse nfd is now 00000000.

@eserilev
Member

@Alleysira please dont respond with an LLM. it's not helpful to read your comment when its an obviously AI generated wall of text.

Can you please summarize what you wrote above in your own words

@Alleysira
Author

> @Alleysira please dont respond with an LLM. it's not helpful to read your comment when its an obviously AI generated wall of text.

Hi @eserilev, sorry about that. I used an LLM to format the test results; I thought it would make them easier to read 😭

> Can you please summarize what you wrote above in your own words

Basically, as @jimmygchen said, Lighthouse would reuse the persisted ENR on startup. To solve this, I added an nfd check in compare_enr().

To verify the fix works, I ran a Kurtosis testnet with the old image first, waited for the next fork, and confirmed that nfd was wrong. Then I swapped the CL to the fixed image, kept the same datadir, and restarted. After the restart, nfd was corrected.
