
switch ssv-node storage to Pebble DB #2709

Draft
nkryuchkov wants to merge 17 commits into stage from pebbledb-all


@nkryuchkov
Contributor

This PR switches both ssv-node and the exporter to Pebble DB. Previously, ssv-node used Badger DB and only the exporter used Pebble DB.

@nkryuchkov
Contributor Author

@greptileai please review this PR

@greptile-apps
Contributor

greptile-apps bot commented Mar 2, 2026

Greptile Summary

This PR completes the transition from BadgerDB to PebbleDB for all SSV node operators (not just the exporter), removing the entire Badger storage implementation from the main module and replacing it with a unified Pebble backend.

Key changes:

  • storage/migration/db_migration.go (new): ResolvePebbleDBPlan determines which Pebble path to use and whether a Badger→Pebble import is needed, with a fast path that skips opening BadgerDB on post-migration startups. MigrateBadgerToPebbleIfNeeded performs a resumable, context-aware key-by-key copy using marker files (.ssv-badger-import.inprogress.json / .ssv-badger-import.done.json).
  • storage/pebble/pebble.go: Adds defer batch.Close() to Update and SetMany to prevent batch resource leaks, adds cleanupPath for temp-directory cleanup, and nil-guards pdb.DB in Close().
  • storage/pebble/state.go (new): DirState probes a Pebble directory without the full storage/pebble.DB wrapper, used by ResolvePebbleDBPlan.
  • storage/pebble/testutils.go (new): NewTemporary replaces the removed kv.NewInMemory across the entire test suite; 40+ test files updated to use pebble.NewTemporary with t.Cleanup.
  • storage/basedb/storage.go: Removes the Badger-specific GCInterval and Reporting fields (operators should note that these env vars are now silently ignored; this was raised in a prior thread).
  • All Badger storage files deleted from the main module; the ssvsigner sub-module similarly drops Badger and gains the CockroachDB pebble dependency.

Confidence Score: 4/5

  • Safe to merge; migration is resumable and context-aware, with a fast path that avoids reopening Badger on every post-migration startup.
  • The migration logic is well-designed with marker files, context cancellation, and disk-space error handling. Test coverage is comprehensive. Minor deductions for the hasBadgerData/badgerNonEmpty semantic conflation in the exported MigrateBadgerToPebbleIfNeeded API, the unchecked defer close in one test, and the silent removal of GCInterval/Reporting env vars without operator-facing warnings (addressed in a prior review thread).
  • Pay attention to storage/migration/db_migration.go — specifically the badgerStateKnown=true, badgerNonEmpty=false code path in migrateBadgerToPebbleIfNeeded and the semantics of the exported API.

Important Files Changed

| Filename | Overview |
| --- | --- |
| storage/migration/db_migration.go | New file: core Badger→Pebble migration logic with resumable import, marker-file state machine, context cancellation, and disk-space error wrapping. Logic is sound; minor semantic ambiguity between hasBadgerData and badgerNonEmpty when state is pre-known. |
| storage/migration/db_migration_test.go | New file: comprehensive end-to-end tests for migration, including interruption/resume and context cancellation. Two tests are missing t.Parallel() (addressed in a prior thread). |
| storage/pebble/pebble.go | Adds cleanupPath for temp-dir cleanup, adds defer batch.Close() to Update and SetMany, and fixes the nil check on pdb.DB in Close(). All changes are correct; the deferred close after commit is a minor style point. |
| storage/pebble/state.go | New file: DirState probes whether a Pebble directory exists and is non-empty by opening the DB in read-only mode. Correctly checks iter.Error() after iter.First(). |
| storage/pebble/testutils.go | New file: NewTemporary creates a Pebble DB in a temp directory that is automatically removed on db.Close(). Clean implementation using cleanupPath. |
| cli/operator/node.go | Removes the Badger setup path; setupPebbleDB now calls ResolvePebbleDBPlan and MigrateBadgerToPebbleIfNeeded before opening Pebble. Properly closes the DB on setup errors via closeOnSetupError. |
| network/p2p/testutils.go | Switches from Badger in-memory to a Pebble temp DB; adds a p2pNetworkWithDB wrapper with onceCloser for the DB lifecycle, and registers t.Cleanup for guaranteed cleanup. The signature of CreateAndStartLocalNet gains a testing.TB parameter. |
| storage/basedb/storage.go | Removes the Badger-specific GCInterval, Reporting, and Engine fields from Options, leaving only Ctx and Path. Breaking change for operators relying on those env vars; a warning at startup for unrecognised env vars would help (noted in a prior thread). |
| ssvsigner/ekm/slashing_protector_test.go | Migrates to pebble.NewTemporary and getBaseStorage(t, logger). TestSlashingDBIntegrity correctly checks the Phase 1 close error but uses an unchecked defer db2.Close() in Phase 2. |
| storage/badger/badger.go | Deleted: the entire Badger storage implementation is removed from the main module. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Node Startup] --> B[ResolvePebbleDBPlan basePath]
    B --> C{Canonical Pebble\nnon-empty AND\nno legacy Pebble?}
    C -->|Yes| D{Import marker\nexists at basePath?}
    D -->|Yes - fast path| E[Plan: PebblePath=basePath\nBadgerImportPath=basePath]
    C -->|No| F{Legacy Pebble\nnon-empty AND\nno canonical Pebble?}
    F -->|Yes| G{Import marker\nexists at legacyPath?}
    G -->|Yes - fast path| H[Plan: PebblePath=legacyPebblePath\nBadgerImportPath=basePath]
    F -->|No| I[Call badgerDirState]
    D -->|No| I
    G -->|No| I
    I --> J{Multiple non-empty\nDBs detected?}
    J -->|Yes + marker| K[Resolve with marker]
    J -->|Yes no marker| L[Fatal: ambiguous state]
    J -->|No| M{Which DB\nhas data?}
    M -->|canonical Pebble| N[Plan: PebblePath=basePath]
    M -->|legacy Pebble| O[Plan: PebblePath=legacyPebblePath]
    M -->|Badger only| P[Plan: PebblePath=legacyPebblePath\nBadgerImportPath=basePath]
    M -->|None| Q[Plan: PebblePath=basePath fresh]

    E --> R[Open Pebble DB]
    H --> R
    N --> R
    O --> R
    P --> R
    Q --> R
    K --> R

    R --> S{BadgerImportPath set?}
    S -->|No| X[Apply SSV Migrations]
    S -->|Yes| T[MigrateBadgerToPebbleIfNeeded]
    T --> U{Done marker\nexists?}
    U -->|Yes + Pebble not empty| X
    U -->|No| V{Pebble empty\nor in-progress marker?}
    V -->|Resume| W[copyBadgerToPebble\nwith context checks\nevery key]
    V -->|Both non-empty\nno markers| L2[Fatal: partial migration]
    W --> Y[Write done marker\nRemove in-progress marker]
    Y --> X
    X --> Z[Node Running]

Comments Outside Diff (1)

  1. storage/pebble/pebble.go, lines 165-173

    defer batch.Close() also runs after a successful Commit (commit-then-close pattern)

    With the addition of defer func() { _ = batch.Close() }(), the successful path of Update now calls Commit and then the deferred Close. In CockroachDB pebble, closing an already-committed batch is safe (the close simply returns the batch to the pool), but this is a subtle pattern that differs from Badger's transaction model.

    The same pattern is applied to SetMany (line 177). Both are functionally correct, but closing an already-committed batch may confuse future contributors. Consider adding a comment explaining that the deferred Close is intentional and safe after Commit.


Last reviewed commit: d5e70ad

Contributor

@greptile-apps greptile-apps bot left a comment


62 files reviewed, 15 comments


@codecov

codecov bot commented Mar 2, 2026

Codecov Report

❌ Patch coverage is 51.80% with 241 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.6%. Comparing base (20ffb01) to head (d5e70ad).
⚠️ Report is 1 commit behind head on stage.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| storage/migration/db_migration.go | 57.9% | 105 Missing and 54 partials ⚠️ |
| cli/operator/node.go | 0.0% | 41 Missing ⚠️ |
| storage/pebble/state.go | 0.0% | 26 Missing ⚠️ |
| network/p2p/testutils.go | 72.9% | 9 Missing and 1 partial ⚠️ |
| storage/pebble/testutils.go | 50.0% | 3 Missing and 2 partials ⚠️ |


@nkryuchkov
Contributor Author

@greptileai please review it again

@nkryuchkov
Contributor Author

@greptileai review it again

@greptile-apps
Contributor

greptile-apps bot commented Mar 3, 2026

Additional Comments (2)

storage/pebble/pebble.go
Batch leaked on fn error

If fn(txn) returns an error, the IndexedBatch created on line 166 is never committed or closed, leaking its memory. Pebble batches must always be explicitly closed after use. Add a defer to ensure cleanup:

func (pdb *DB) Update(fn func(basedb.Txn) error) error {
	batch := pdb.NewIndexedBatch()
	// Release the batch on every path: after an fn error and after Commit.
	defer func() { _ = batch.Close() }()
	txn := newTxn(pdb.logger, batch)
	if err := fn(txn); err != nil {
		return err
	}
	return batch.Commit(pebble.Sync)
}

storage/pebble/pebble.go
Batch leaked on SetMany error

If txn.SetMany returns an error, the Batch created on line 175 is neither committed nor closed. This is the same resource leak as in Update. Fix:

func (pdb *DB) SetMany(prefix []byte, n int, next func(int) (basedb.Obj, error)) error {
	batch := pdb.NewBatch()
	// Release the batch on every path: after a SetMany error and after Commit.
	defer func() { _ = batch.Close() }()
	txn := newTxn(pdb.logger, batch)
	if err := txn.SetMany(prefix, n, next); err != nil {
		return err
	}
	return batch.Commit(pebble.Sync)
}

@nkryuchkov
Contributor Author

@greptileai review this again, considering all comments on this PR


@nkryuchkov
Contributor Author

@greptileai review this again, considering all comments on this PR. Don't limit the number of comments to 3 — post all the comments you have


Comment on lines +220 to +226
		hasBadgerData, badgerNonEmpty, err = badgerDirState(badgerPath)
		if err != nil {
			return false, 0, err
		}
	}

	if !pebbleEmpty {
Contributor


hasBadgerData and badgerNonEmpty are conflated into the same boolean when the state is known

When badgerStateKnown=true, both hasBadgerData (semantically "Badger directory has files") and badgerNonEmpty (semantically "Badger DB has at least one key") are assigned the same value knownBadgerNonEmpty. This conflates two distinct states: a Badger directory can exist and be empty (hasBadgerData=true, badgerNonEmpty=false), but with badgerStateKnown=true and knownBadgerNonEmpty=false, both become false.

As a result, when badgerStateKnown=true and the Badger DB is empty (but files exist), the function exits via if !hasBadgerData { return false, 0, nil } silently, instead of via the if !badgerNonEmpty { logger.Info("legacy badger database is empty...") } path. The "skipping import because empty" log line is never emitted in this case.

While this is currently harmless because ResolvePebbleDBPlan only sets BadgerImportPath when badgerNonEmpty=true, MigrateBadgerToPebbleIfNeeded is an exported function and callers providing badgerStateKnown=true, badgerNonEmpty=false when Badger files genuinely exist will silently skip migration without any diagnostic log.

Consider carrying the two states as distinct fields in PebbleDBPlan (for example, a dedicated PebbleDBPlan.BadgerExists bool alongside BadgerNonEmpty) to preserve the semantic distinction, or document that BadgerNonEmpty=false means "no Badger files, or an empty Badger DB".
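Keeping the two facts in separate fields makes the empty-but-present case reachable again. A minimal sketch, assuming a hypothetical badgerState type with Exists and NonEmpty fields (these are not the PR's names): decide returns which of the three exits the import would take, and the hypothetical conflated helper reproduces the behaviour described above, where one known value feeds both fields.

```go
package main

import "fmt"

// badgerState keeps the two facts separate (hypothetical type).
type badgerState struct {
	Exists   bool // the directory contains Badger files
	NonEmpty bool // the Badger DB holds at least one key
}

type outcome string

const (
	noBadger  outcome = "no badger data: return silently"
	emptyLog  outcome = "badger empty: log and skip import"
	runImport outcome = "import keys"
)

// decide mirrors the order of the two early exits described in the review.
func decide(s badgerState) outcome {
	switch {
	case !s.Exists:
		return noBadger
	case !s.NonEmpty:
		return emptyLog
	default:
		return runImport
	}
}

// conflated models assigning one known value to both fields.
func conflated(knownNonEmpty bool) badgerState {
	return badgerState{Exists: knownNonEmpty, NonEmpty: knownNonEmpty}
}

func main() {
	present := badgerState{Exists: true, NonEmpty: false}
	fmt.Println("separate fields:", decide(present))
	fmt.Println("conflated:", decide(conflated(false)))
}
```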

- db2, err := kv.New(logger, basedb.Options{Path: dbPath})
+ db2, err := pebble.New(logger, dbPath, nil)
  require.NoError(t, err)
  defer db2.Close()
Contributor


defer db2.Close() error unchecked in Phase 2

defer db2.Close() discards the close error. If the pebble DB fails to flush or sync on close, the test will pass silently despite a DB error. For consistency with how db.Close() is handled in Phase 1 (lines 945-946, with require.NoError), prefer using t.Cleanup:

Suggested change:
- defer db2.Close()
+ t.Cleanup(func() { require.NoError(t, db2.Close()) })
