Better storage for historical blobs #9214

@michaelsproul

Description

At the moment, historical blobs are stored in one big DB (the blob DB), which can reach several terabytes when running with an archive config (--complete-blob-backfill and --supernode).

The metrics from PR #8404 show that writes to the blob DB in this case can take 16s+ on decent hardware. That prevents a node that has fallen behind from ever catching up, because each write takes longer than a slot (16s > 12s slot time).
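
To make the 16s vs 12s arithmetic concrete, here is a rough back-of-the-envelope sketch. It assumes one block per slot, strictly sequential imports, and a constant import time dominated by the blob write. None of this comes from the metrics themselves, and `backlog_after` is a hypothetical helper, not a Lighthouse function.

```rust
/// Backlog (in slots) after `elapsed_secs`, under a deliberately coarse model:
/// one block per slot, sequential imports, constant import time.
/// Hypothetical helper for illustration only.
fn backlog_after(
    initial_backlog: i64,
    import_secs: i64,
    slot_secs: i64,
    elapsed_secs: i64,
) -> i64 {
    // Slots produced by the network while we were importing.
    let produced = elapsed_secs / slot_secs;
    // Blocks we managed to import in the same time.
    let imported = elapsed_secs / import_secs;
    // The backlog cannot go below zero once we are caught up.
    (initial_backlog + produced - imported).max(0)
}
```

Under these assumptions, `backlog_after(0, 16, 12, 3600)` gives 75: a node that starts in sync would be 75 slots behind after an hour, and the gap keeps growing for as long as the import time exceeds the slot time.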

Part of the problem is that the database is indexed by block root. Block roots are hashes, so consecutive blocks land at effectively random positions in the keyspace, and each write potentially moves around large parts of the tree. This is really dumb conceptually: we would be much better off with a slot-indexed tree for finalized blobs, OR flat files.
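
A small sketch of why the key choice matters. With big-endian slot keys, each new finalized entry sorts after all earlier ones, so writes only ever append to the end of the keyspace. Hash-like keys do not have that property. The encodings here are assumptions for illustration, not Lighthouse's actual schema, and `stand_in_root` is just a bit mixer standing in for a real block root.

```rust
/// Encode a slot as big-endian bytes so byte-wise key order matches slot order.
/// Hypothetical encoding, not the actual Lighthouse key format.
fn slot_key(slot: u64) -> [u8; 8] {
    slot.to_be_bytes()
}

/// Stand-in for (the first 8 bytes of) a block root: a simple bit mixer.
/// This is NOT a real tree hash root, it only mimics a hash's scattering.
fn stand_in_root(slot: u64) -> [u8; 8] {
    let mut z = slot.wrapping_add(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    (z ^ (z >> 31)).to_be_bytes()
}

/// True if every key sorts strictly after the previous one, i.e. each write
/// would land at the end of a sorted keyspace.
fn is_append_only<K: Ord>(keys: &[K]) -> bool {
    keys.windows(2).all(|w| w[0] < w[1])
}
```

Feeding slots `0..1000` through `slot_key` yields an append-only sequence, while the same slots through `stand_in_root` do not. Note that a little-endian slot encoding would also break the ordering, so the byte order is part of the design.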

Steps to resolve

Devise a new schema that is capable of storing terabytes of data without spiking the block import time. Some ideas:

  • Split hot blobs/columns vs finalized blobs. Perhaps hot blobs/columns could live in the hot DB (indexed by block root) while the blob DB directory could be repurposed for finalized blobs only (indexed by slot). If lookup by block root is needed we can use the DBColumn::BeaconBlock -> Slot mapping in the freezer DB. The block import path should no longer be blocked by the big blob DB, because the number of unfinalized hot blobs should be quite small and won't blow out the write time for the hot DB too much.
  • Optionally: for the finalized blobs/columns we could either use flat files or a LevelDB database. I think a slot-indexed database would likely be sufficient, but if we're in the business of making a change, maybe a flat file would be even better. Maybe even an era blob format? I think @dapplion played around with some flat file and era file stuff and may have good insights.
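
The hot/finalized split from the first idea could be modelled roughly like this. It is an in-memory sketch only: `BlobStore` and its methods are hypothetical names, the maps stand in for on-disk stores, and `root_to_slot` stands in for the block root to slot mapping in the freezer DB. It also assumes at most one canonical block per slot, so the slot is a unique key for finalized data.

```rust
use std::collections::{BTreeMap, HashMap};

type Slot = u64;
type BlockRoot = [u8; 32];
type BlobBytes = Vec<u8>;

/// Hypothetical in-memory model of the proposed split.
#[derive(Default)]
struct BlobStore {
    /// Unfinalized blobs, keyed by block root. Stays small.
    hot: HashMap<BlockRoot, (Slot, BlobBytes)>,
    /// Finalized blobs, keyed by slot, so new entries sort last.
    finalized: BTreeMap<Slot, BlobBytes>,
    /// Stand-in for the block root -> slot mapping in the freezer DB.
    root_to_slot: HashMap<BlockRoot, Slot>,
}

impl BlobStore {
    /// Block import path: only touches the small hot store.
    fn import(&mut self, root: BlockRoot, slot: Slot, blobs: BlobBytes) {
        self.hot.insert(root, (slot, blobs));
    }

    /// Move canonical blobs at or before `finalized_slot` into the
    /// slot-indexed store, then prune everything else that old.
    fn finalize(&mut self, canonical_roots: &[BlockRoot], finalized_slot: Slot) {
        for root in canonical_roots {
            let is_finalized = matches!(
                self.hot.get(root),
                Some((slot, _)) if *slot <= finalized_slot
            );
            if is_finalized {
                if let Some((slot, blobs)) = self.hot.remove(root) {
                    self.finalized.insert(slot, blobs);
                    self.root_to_slot.insert(*root, slot);
                }
            }
        }
        // Whatever is left at or before the finalized slot is non-canonical.
        self.hot.retain(|_, (slot, _)| *slot > finalized_slot);
    }

    /// Lookup by block root: hot store first, then via the slot mapping.
    fn get_by_root(&self, root: &BlockRoot) -> Option<&BlobBytes> {
        if let Some((_, blobs)) = self.hot.get(root) {
            return Some(blobs);
        }
        let slot = self.root_to_slot.get(root)?;
        self.finalized.get(slot)
    }
}
```

In this model the import path never touches the large finalized store. The expensive move happens in `finalize`, which could run off the critical path, and the writes it performs are in increasing slot order.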

Labels: database, optimization
