feat(validator): add VC DB metrics and slashing protection timing to dashboard #9206

Open
TechFusionData wants to merge 5 commits into ChainSafe:unstable from TechFusionData:feat/validator-dashboard-db-metrics

@TechFusionData TechFusionData commented Apr 10, 2026

Motivation: Resolves remaining items in #5663. The validator client exposes DB metrics (vc_db_read_req_total, vc_db_write_req_total, vc_db_read_items_total, vc_db_write_items_total, vc_db_size_bytes_total, vc_db_approximate_size_time_seconds) but these weren't visualised in the VC dashboard — only in lodestar_vm_host.json. Slashing protection check latency also had no visibility.

Changes:

Dashboard (dashboards/lodestar_validator_client.json):

  • New "DB Metrics" row with 6 panels (IDs 52–57): read/write req rates by bucket (log scale), read/write item rates by bucket (log scale), DB size in bytes, approximate size duration
  • New "Slashing Protection Timing" row with 2 panels (IDs 59–60): average block check and attestation check duration
  • All rate panels use the existing $rate_interval template variable
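
The panel expressions themselves are not quoted in this description, so the following is only a sketch of what they plausibly look like, assuming the rate panels use a plain `rate()` over the counter and the average-duration panels use the usual `_sum` / `_count` ratio. The helper names `rateExpr` and `avgDurationExpr` are hypothetical and not part of the PR; any `by (bucket)` grouping used by the real panels is left out.

```typescript
// The dashboard's custom rate-window variable, as named in this PR.
const RATE_INTERVAL = "$rate_interval";

// Per-second rate of a counter such as vc_db_read_req_total.
// Hypothetical helper: the real panels may add label grouping (e.g. by bucket).
function rateExpr(counter: string): string {
  return `rate(${counter}[${RATE_INTERVAL}])`;
}

// Average observed duration of a Prometheus histogram over the window:
// rate of the _sum series divided by rate of the _count series.
function avgDurationExpr(histogram: string): string {
  const sum = rateExpr(`${histogram}_sum`);
  const count = rateExpr(`${histogram}_count`);
  return `${sum} / ${count}`;
}

// Example expressions for one DB panel and one slashing protection panel.
const readReqPanel = rateExpr("vc_db_read_req_total");
const blockCheckPanel = avgDurationExpr("vc_slashing_protection_block_check_seconds");
```

An average hides tail latency, so a slow minority of checks may not move these panels much; that is a property of the sum/count ratio rather than of this dashboard specifically.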

Source (packages/validator/src/):

  • metrics.ts: Two new histograms, vc_slashing_protection_block_check_seconds and vc_slashing_protection_attestation_check_seconds, with buckets [0.0001, 0.001, 0.01, 0.1] (in seconds)
  • services/validatorStore.ts: Wraps checkAndInsertBlockProposal and checkAndInsertAttestation with startTimer() / finally for the new histograms
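
For readers who want the instrumentation pattern in isolation, here is a minimal sketch of the startTimer() / finally wrapping described above. `SketchHistogram` and `timed` are hypothetical stand-ins written for this example: the real code uses the validator's metrics registry and wraps checkAndInsertBlockProposal and checkAndInsertAttestation directly, and its exact shape is not shown in this thread.

```typescript
// Hypothetical stand-in for a Prometheus histogram; only startTimer() is modelled.
class SketchHistogram {
  readonly observations: number[] = [];
  private readonly nowMs: () => number;

  // The clock is injectable so the sketch can be exercised deterministically.
  constructor(nowMs: () => number = () => Date.now()) {
    this.nowMs = nowMs;
  }

  // Returns a stop function that records the elapsed time in seconds.
  startTimer(): () => number {
    const start = this.nowMs();
    return () => {
      const seconds = (this.nowMs() - start) / 1000;
      this.observations.push(seconds);
      return seconds;
    };
  }
}

// Runs an async check and records its duration whether it resolves or throws.
// The histogram is optional, mirroring the `timer?.()` call quoted in this PR.
async function timed<T>(
  histogram: SketchHistogram | undefined,
  check: () => Promise<T>
): Promise<T> {
  const timer = histogram?.startTimer();
  try {
    return await check();
  } finally {
    timer?.();
  }
}
```

Because the stop call sits in `finally`, a check that rejects (for example on a slashable message) still contributes an observation, and the original error propagates unchanged.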

Why slashing protection timing matters: These checks run on every block proposal and attestation. If the slashing protection DB grows large or encounters lock contention, check latency can silently increase and eat into the validator's signing window. Having it on the dashboard lets operators spot degradation before it causes missed duties.

Checklist:

  • JSON is valid, panel IDs are unique (51–60)
  • Follows existing panel style conventions from lodestar_vm_host.json
  • New metrics use standard Prometheus histogram patterns
  • Timing instrumentation uses try/finally to record on error paths
  • AI tools were used to assist with this PR
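
The first checklist item can be checked mechanically. The sketch below is an illustration only, not the project's lint step: it assumes the dashboard JSON has a top-level `panels` array and that row panels may nest child panels under their own `panels` key. `duplicateIds` is a hypothetical helper name.

```typescript
// Minimal shape of a dashboard panel; rows may nest child panels.
interface Panel {
  id: number;
  panels?: Panel[];
}

// Collects every panel ID, descending into nested row panels.
function collectIds(panels: Panel[], out: number[] = []): number[] {
  for (const panel of panels) {
    out.push(panel.id);
    if (panel.panels) collectIds(panel.panels, out);
  }
  return out;
}

// Returns the IDs that occur more than once, in ascending order.
// JSON.parse throws on malformed input, which covers the "JSON is valid" half.
function duplicateIds(dashboardJson: string): number[] {
  const dashboard = JSON.parse(dashboardJson) as {panels?: Panel[]};
  const seen = new Set<number>();
  const dupes = new Set<number>();
  for (const id of collectIds(dashboard.panels ?? [])) {
    if (seen.has(id)) dupes.add(id);
    seen.add(id);
  }
  return Array.from(dupes).sort((a, b) => a - b);
}
```

An empty result means every panel ID in the file is unique.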

Add 6 new panels to the validator client Grafana dashboard for
visualizing DB metrics that were previously only available in the
VM host dashboard:

- VC DB read requests / sec (by bucket, log scale)
- VC DB write requests / sec (by bucket, log scale)
- VC DB read items / sec (by bucket, log scale)
- VC DB write items / sec (by bucket, log scale)
- VC DB size (bytes)
- VC DB approximate size duration (seconds, avg)

Metrics consumed: vc_db_read_req_total, vc_db_write_req_total,
vc_db_read_items_total, vc_db_write_items_total, vc_db_size_bytes_total,
vc_db_approximate_size_time_seconds

Partially resolves ChainSafe#5663
@TechFusionData TechFusionData requested a review from a team as a code owner April 10, 2026 22:06
@TechFusionData
Author

AI Disclosure: This PR was developed with AI assistance (Claude) for code exploration, panel structure, and JSON authoring, per Lodestar contribution policy.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request adds a new 'DB Metrics' row to the Lodestar Validator Client dashboard, featuring panels for tracking database read/write request rates, item processing rates, total database size, and approximate size calculation durations. The review feedback suggests replacing the built-in $__rate_interval variable with the dashboard's custom $rate_interval variable across several new panels to maintain consistency and ensure proper integration with dashboard controls.

5 comment threads on dashboards/lodestar_validator_client.json (all outdated)
… panels

Add histogram metrics to measure checkAndInsertBlockProposal and
checkAndInsertAttestation execution time:

- metrics.ts: add vc_slashing_protection_block_check_seconds and
  vc_slashing_protection_attestation_check_seconds histograms with
  buckets [0.0001, 0.001, 0.01, 0.1]s
- validatorStore.ts: instrument both slashing protection calls with
  startTimer() using try/finally so timing is recorded on error paths too
- lodestar_validator_client.json: add 'Slashing Protection Timing'
  dashboard section (row + 2 avg-duration panels, IDs 58-60)

Part of ChainSafe#5663 (validator dashboard improvements)

AI-assisted implementation
@TechFusionData
Author

Added a second commit with Part 2 — slashing protection timing metrics:

New Prometheus metrics (packages/validator/src/metrics.ts):

  • vc_slashing_protection_block_check_seconds — histogram of checkAndInsertBlockProposal execution time
  • vc_slashing_protection_attestation_check_seconds — histogram of checkAndInsertAttestation execution time
  • Buckets: [0.1ms, 1ms, 10ms, 100ms] — aligned with expected DB read latency
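
To make the bucket choice concrete, the sketch below shows how observations would fall into these boundaries under the standard Prometheus convention that buckets are cumulative (`le` means "less than or equal"). `cumulativeCounts` is a hypothetical helper and the sample durations are invented for illustration; they are not measurements from this PR.

```typescript
// Bucket upper bounds in seconds, as listed in this PR: 0.1ms, 1ms, 10ms, 100ms.
const BUCKETS = [0.0001, 0.001, 0.01, 0.1];

// Cumulative histogram counts: each bucket counts observations <= its bound,
// and the implicit +Inf bucket counts every observation.
function cumulativeCounts(
  observations: number[],
  buckets: number[] = BUCKETS
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const le of buckets) {
    counts.set(String(le), observations.filter((s) => s <= le).length);
  }
  counts.set("+Inf", observations.length);
  return counts;
}

// Invented sample durations: 0.05ms, 0.5ms, 4ms and 250ms.
const sample = cumulativeCounts([0.00005, 0.0005, 0.004, 0.25]);
```

Any check slower than 100ms is only visible in the +Inf bucket, so the top boundary limits how precisely very slow checks can be resolved from bucket data alone.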

Instrumentation (packages/validator/src/services/validatorStore.ts):

  • Both calls wrapped with startTimer() / finally { timer?.() } so timing is always recorded, including on the error path

Dashboard (dashboards/lodestar_validator_client.json):

  • New "Slashing Protection Timing" row section with two avg-duration panels (block check + attestation check, IDs 58–60)

AI-assisted implementation

…_interval

The VC dashboard defines a custom template variable $rate_interval for
controlling rate window via dropdown. Using Grafana's built-in
$__rate_interval bypasses this control. Replace all 7 occurrences across
the DB metrics and slashing protection panels we added (IDs 52-60).

Addresses bot review feedback on ChainSafe#9206.
@TechFusionData
Author

Addressed bot review feedback: replaced $__rate_interval with the dashboard's custom variable $rate_interval across all 7 new panel expressions (IDs 52–60). The built-in Grafana variable bypasses the dashboard dropdown, whereas $rate_interval is the existing convention throughout this dashboard.
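
A replacement like this can be scripted and counted. The following is a minimal sketch, not the method used in the commit: `useCustomRateInterval` is a hypothetical helper that operates on the raw dashboard JSON text and rewrites every occurrence in the file, so it would only match the "new panels only" scope if no other panel used the built-in variable.

```typescript
const BUILT_IN = "$__rate_interval";
const CUSTOM = "$rate_interval";

// Replaces every literal occurrence of the built-in variable and reports how many
// were changed. split/join is used so that "$" needs no regex escaping.
function useCustomRateInterval(dashboardJson: string): {json: string; replaced: number} {
  const parts = dashboardJson.split(BUILT_IN);
  return {json: parts.join(CUSTOM), replaced: parts.length - 1};
}
```

Existing uses of $rate_interval are left untouched, because the search string is the longer built-in name.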

Contributor

@nazarhussain nazarhussain left a comment

  • The PR title says DB metrics only, but the branch now also adds:

    • 2 new validator metrics
    • 2 new slashing-protection dashboard panels
    • runtime instrumentation in validatorStore
  • The checklist/body still contains “No source code changes (dashboard JSON only)”, which is no longer true

Please update the PR title and description, and in particular add details on why these metrics are useful.

@TechFusionData TechFusionData changed the title from "feat(validator-dashboard): add VC DB metrics panels" to "feat(validator): add VC DB metrics and slashing protection timing to dashboard" on Apr 13, 2026
@TechFusionData
Author

Good catch — updated the title, description, and checklist to cover the full scope (DB metrics panels + slashing protection timing metrics and instrumentation). Added rationale for the timing metrics.

nazarhussain previously approved these changes Apr 14, 2026
@nazarhussain
Contributor

@TechFusionData PR looks good. Please fix the linting issue with the generated dashboard json, so we can merge it.
