feat(validator): add VC DB metrics and slashing protection timing to dashboard #9206
TechFusionData wants to merge 5 commits into ChainSafe:unstable
Add 6 new panels to the validator client Grafana dashboard for visualizing DB metrics that were previously only available in the VM host dashboard:

- VC DB read requests / sec (by bucket, log scale)
- VC DB write requests / sec (by bucket, log scale)
- VC DB read items / sec (by bucket, log scale)
- VC DB write items / sec (by bucket, log scale)
- VC DB size (bytes)
- VC DB approximate size duration (seconds, avg)

Metrics consumed: vc_db_read_req_total, vc_db_write_req_total, vc_db_read_items_total, vc_db_write_items_total, vc_db_size_bytes_total, vc_db_approximate_size_time_seconds

Partially resolves ChainSafe#5663
AI Disclosure: This PR was developed with AI assistance (Claude) for code exploration, panel structure, and JSON authoring, per Lodestar contribution policy.
Code Review
This pull request adds a new 'DB Metrics' row to the Lodestar Validator Client dashboard, featuring panels for tracking database read/write request rates, item processing rates, total database size, and approximate size calculation durations. The review feedback suggests replacing the built-in $__rate_interval variable with the dashboard's custom $rate_interval variable across several new panels to maintain consistency and ensure proper integration with dashboard controls.
… panels

Add histogram metrics to measure checkAndInsertBlockProposal and checkAndInsertAttestation execution time:

- metrics.ts: add vc_slashing_protection_block_check_seconds and vc_slashing_protection_attestation_check_seconds histograms with buckets [0.0001, 0.001, 0.01, 0.1]s
- validatorStore.ts: instrument both slashing protection calls with startTimer() using try/finally so timing is recorded on error paths too
- lodestar_validator_client.json: add 'Slashing Protection Timing' dashboard section (row + 2 avg-duration panels, IDs 58-60)

Part of ChainSafe#5663 (validator dashboard improvements)

AI-assisted implementation
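The startTimer() with try/finally pattern described in this commit can be sketched as follows. This is a minimal illustration, not the code in validatorStore.ts: `SketchHistogram`, `timedCheck`, and the injectable `now` clock are hypothetical stand-ins for a Prometheus-style histogram, and `check` stands in for a slashing protection call such as checkAndInsertBlockProposal.

```typescript
// Hypothetical stand-in for a Prometheus-style histogram (not Lodestar's metrics class).
class SketchHistogram {
  buckets: number[];
  counts: number[];
  sum = 0;
  count = 0;
  private now: () => number;

  // `now` returns the current time in seconds; injectable so the sketch can be tested.
  constructor(buckets: number[], now: () => number = () => Date.now() / 1000) {
    this.buckets = buckets;
    this.counts = buckets.map(() => 0);
    this.now = now;
  }

  observe(seconds: number): void {
    this.sum += seconds;
    this.count += 1;
    // Buckets are cumulative: a value counts toward every upper bound it fits under.
    this.buckets.forEach((upperBound, i) => {
      if (seconds <= upperBound) this.counts[i] += 1;
    });
  }

  // Returns a function that records the elapsed seconds when called.
  startTimer(): () => number {
    const start = this.now();
    return () => {
      const seconds = this.now() - start;
      this.observe(seconds);
      return seconds;
    };
  }
}

// Hypothetical wrapper: times `check` and records the duration even if it throws.
async function timedCheck<T>(histogram: SketchHistogram, check: () => Promise<T>): Promise<T> {
  const stopTimer = histogram.startTimer();
  try {
    return await check();
  } finally {
    stopTimer();
  }
}
```

Because the timer is stopped in the `finally` block, a check that rejects (for example, on a slashable message) still contributes an observation, so the dashboard average is not biased toward successful calls only.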
Added a second commit with Part 2 — slashing protection timing metrics: New Prometheus metrics (
Instrumentation (
Dashboard (
AI-assisted implementation
…_interval

The VC dashboard defines a custom template variable, $rate_interval, that controls the rate window via a dropdown. Using Grafana's built-in $__rate_interval bypasses this control. Replace all 7 occurrences across the DB metrics and slashing protection panels we added (IDs 52-60).

Addresses bot review feedback on ChainSafe#9206.
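The substitution this commit describes can be sketched as a plain string rewrite over the dashboard JSON. This is an illustration only: `useDashboardRateInterval` is a hypothetical helper, not something in the PR, and the sample expression in the usage note is assumed for demonstration rather than copied from the dashboard.

```typescript
// Hypothetical helper: swap Grafana's built-in variable for the dashboard's custom one.
// split/join is used instead of String.prototype.replace because "$" sequences have
// special meaning in replace() replacement strings.
function useDashboardRateInterval(dashboardJson: string): string {
  return dashboardJson.split("$__rate_interval").join("$rate_interval");
}

// Counts remaining uses of the built-in variable, e.g. as a check before committing.
function countBuiltinRateInterval(dashboardJson: string): number {
  return dashboardJson.split("$__rate_interval").length - 1;
}
```

For example, an assumed target such as `{"expr": "rate(vc_db_read_req_total[$__rate_interval])"}` would come back with `[$rate_interval]` as the range, and `countBuiltinRateInterval` on the result would return 0.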
Addressed bot review feedback: replaced
nazarhussain left a comment
- The PR title says DB metrics only, but the branch now also adds:
  - 2 new validator metrics
  - 2 new slashing-protection dashboard panels
  - runtime instrumentation in validatorStore
- The checklist/body still contains “No source code changes (dashboard JSON only)”, which is no longer true.

Please update the PR title/description and, in particular, add details on why these metrics are useful.
Good catch: updated the title, description, and checklist to cover the full scope (DB metrics panels + slashing protection timing metrics and instrumentation). Added rationale for the timing metrics.

@TechFusionData PR looks good. Please fix the linting issue with the generated dashboard JSON, so we can merge it.
Motivation: Resolves remaining items in #5663. The validator client exposes DB metrics (vc_db_read_req_total, vc_db_write_req_total, vc_db_read_items_total, vc_db_write_items_total, vc_db_size_bytes_total, vc_db_approximate_size_time_seconds), but these weren't visualised in the VC dashboard, only in lodestar_vm_host.json. Slashing protection check latency also had no visibility.

Changes:

Dashboard (dashboards/lodestar_validator_client.json):

- … $rate_interval template variable

Source (packages/validator/src/):

- metrics.ts: Two new histograms, vc_slashing_protection_block_check_seconds and vc_slashing_protection_attestation_check_seconds, with buckets [0.0001, 0.001, 0.01, 0.1]
- services/validatorStore.ts: Wraps checkAndInsertBlockProposal and checkAndInsertAttestation with startTimer()/finally for the new histograms

Why slashing protection timing matters: These checks run on every block proposal and attestation. If the slashing protection DB grows large or encounters lock contention, check latency can silently increase and eat into the validator's signing window. Having it on the dashboard lets operators spot degradation before it causes missed duties.
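To make the "avg-duration" idea concrete: a Prometheus histogram exposes a running sum and count of observations, and an average latency over a window is the change in sum divided by the change in count. The sketch below assumes that approach; the actual panel expressions are not shown in this thread, and `HistogramSample` and `averageSeconds` are hypothetical names.

```typescript
// Two scrapes of a histogram's running totals (hypothetical shape for illustration).
interface HistogramSample {
  sum: number; // total observed seconds so far
  count: number; // total number of observations so far
}

// Average seconds per check between two scrapes, or null when there were no new
// observations (or the counters were reset, e.g. after a restart).
function averageSeconds(prev: HistogramSample, curr: HistogramSample): number | null {
  const deltaCount = curr.count - prev.count;
  if (deltaCount <= 0) return null;
  return (curr.sum - prev.sum) / deltaCount;
}
```

With assumed samples of `{sum: 1, count: 10}` and `{sum: 1.5, count: 20}`, ten checks took 0.5 s in total, so the average is 0.05 s per check. An average that climbs over time is the kind of degradation the rationale above is concerned with.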
Checklist:
lodestar_vm_host.json