(WIP) feat: backfill sync v2#9238
Conversation
…ckfillStateRepository
There was a problem hiding this comment.
Code Review
This pull request introduces a new backfill synchronization implementation (BackfillSyncV2) and updates the database schema to track backfill progress using epochs. Key changes include the addition of BackfillStateRepository and the BackfilledRange singleton, while the legacy BackfillSync and BackfilledRanges are being deprecated or disabled. Feedback highlights a breaking database schema change due to bucket repurposing, the presence of non-functional legacy code, and performance concerns regarding block-by-block network fetching and sequential database lookups in the new implementation. Additionally, starting the sync process within the constructor is noted as a testing and control issue.
| // index_lightClientInitProof = 36, // DEPRECATED on v0.32.0 | ||
|
|
||
| backfilled_ranges = 42, // Backfilled From to To, inclusive of both From, To | ||
| backfill_state = 42, // Epoch -> EpochBackfillState |
There was a problem hiding this comment.
Repurposing bucket ID 42 from backfilled_ranges (which stored Slot -> Slot) to backfill_state (which stores Epoch -> BackfillStateWrapper) is a breaking change for the database schema. Existing databases will contain data in the old format, which will cause deserialization errors or logic bugs when accessed by the new code. A database migration should be provided, or a new bucket ID should be allocated for the new backfill state.
| const backfillRangeWrittenSlot = null as number | null; | ||
| const previousBackfilledRanges = [] as {key: number; value: number}[]; |
There was a problem hiding this comment.
The legacy BackfillSync implementation is being broken here by hardcoding backfillRangeWrittenSlot to null and previousBackfilledRanges to an empty array. If this class is intended to be deprecated and replaced by BackfillSyncV2, it should be formally deprecated or removed. Leaving it in a non-functional state with TODO comments can lead to confusion and bugs if it's still reachable in certain configurations.
| // try { | ||
| // const finalizedBlockFC = chain.forkChoice.getBlockHexDefaultStatus(finalized.rootHex); | ||
| // if (finalizedBlockFC && finalizedBlockFC.slot > chain.anchorStateLatestBlockSlot) { | ||
| // await db.backfilledRanges.put(finalizedBlockFC.slot, chain.anchorStateLatestBlockSlot); | ||
| // | ||
| // const filteredSeqs = await db.backfilledRanges.entries({ | ||
| // gt: chain.anchorStateLatestBlockSlot, | ||
| // lt: finalizedBlockFC.slot, | ||
| // }); | ||
| // logger.debug("updated backfilledRanges", { | ||
| // key: finalizedBlockFC.slot, | ||
| // value: chain.anchorStateLatestBlockSlot, | ||
| // }); | ||
| // if (filteredSeqs.length > 0) { | ||
| // await db.backfilledRanges.batchDelete(filteredSeqs.map((entry) => entry.key)); | ||
| // logger.debug( | ||
| // `Forward Sync - cleaned up backfilledRanges between ${finalizedBlockFC.slot},${chain.anchorStateLatestBlockSlot}`, | ||
| // {seqs: JSON.stringify(filteredSeqs)} | ||
| // ); | ||
| // } | ||
| // } | ||
| // } catch (e) { | ||
| // logger.error("Error updating backfilledRanges on finalization", {epoch: finalized.epoch}, e as Error); | ||
| // } |
There was a problem hiding this comment.
The logic for updating the backfill range during finalization has been commented out. This effectively disables progress tracking for backfill sync during normal node operation. If this logic is being replaced by the new backfillState and backfilledRange singleton, it should be updated to use them instead of being left as commented-out code.
| this.sync() | ||
| .then((oldestSlotSynced) => { | ||
| if (this.status !== BackfillSyncStatus.completed) { | ||
| throw new ErrorAborted(`Invalid BackfillSyncStatus at completion: status = ${this.status}`); | ||
| } | ||
| this.emit(BackfillSyncEvent.completed, oldestSlotSynced); | ||
| this.logger.info("BackfillSync completed", {oldestSlotSynced}); | ||
| this.close(); | ||
| }) | ||
| .catch((e) => { | ||
| if (!(e instanceof ErrorAborted)) { | ||
| this.logger.error("BackfillSync processor error", e); | ||
| } | ||
| this.status = BackfillSyncStatus.aborted; | ||
| this.close(); | ||
| }); |
There was a problem hiding this comment.
Starting a long-running asynchronous process like sync() directly in the constructor is generally considered a bad practice. It makes the object difficult to test, as the sync process starts immediately upon instantiation, and it doesn't allow the caller to control when the synchronization begins. Consider moving the call to this.sync() to an explicit start() method.
| } | ||
| try { | ||
| // how could they fetch a block via root? | ||
| const [block] = await this.network.sendBeaconBlocksByRoot(peer, [this.anchorRoot]); |
There was a problem hiding this comment.
Fetching blocks one by one using sendBeaconBlocksByRoot is highly inefficient due to network round-trip times (RTT). Even if backfill sync is considered low priority, fetching a single block per request will make the process extremely slow for a long chain. Consider fetching blocks in batches using sendBeaconBlocksByRange or requesting multiple roots at once if the protocol supports it.
| while (epoch > 0) { | ||
| const state = await this.db.backfillState.get(epoch); | ||
| if (!state?.hasBlock) break; | ||
| epoch--; |
There was a problem hiding this comment.
This while loop performs sequential database lookups for every epoch. If a node has already backfilled a large number of epochs, this loop could perform thousands of asynchronous DB calls, potentially blocking the event loop and causing performance issues. It would be more efficient to use a reverse stream or iterator (e.g., this.db.backfillState.keysStream({lt: startEpoch, reverse: true})) to find the first gap in the backfilled epochs.
- Introduced comprehensive tests for the backfillV2 synchronization process, covering scenarios such as chain walking, flushing blocks at epoch boundaries, and handling skipped slots.
- Added a check for the aborted signal in the BackfillSync class to ensure proper termination of the sync process when the signal is triggered.
I found the peer score should be not that harsh because some behaviors are legitimate according to the spec.
Motivation
reconstructing backfill sync feature in lodestar.
Description
Block
DB schema related changes
BackfillStateRepository: per-epoch state with fieldshasBlockhasBlobcolumnIndicesBackfilledRangesfrom repository -> singleton since it's always a single value, never a keyed collection.backfillV2.tsparentRoot, one block per request. (This might seems not efficient but IMO backfill sync has the low priority so making the thread light is important.)slots at boundaries correctly)
anchorRoot === ZERO_HASH(i.e. we've walked past the genesisblock's parent) (TODO: changing to
MIN_EPOCHS_FOR_BLOCK_REQUESTSas default for block,MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTSfor blobs)db.backfilledRangeDB schema refactor:
BackfillStateRepository(backfill_statebucket): per-epoch state(
hasBlock,hasBlobs,columnIndices) so future work can fill blobs/columns independently of blocks.
backfilledRangemoved fromRepository→single/(singleton pattern)since it's always a single value, never a keyed collection.
mockedBeaconDbfor the new/changed entries.Blobs
tests
Unit tests in
test/unit/sync/backfill/backfillV2.test.tscover:AI Assistance Disclosure
TODOs