fix: improve performance, upgraded otel fastify#902
📝 Walkthrough
Summary by CodeRabbit: Reworks tracing and metrics integration; replaces @opentelemetry/instrumentation-fastify with @fastify/otel.
Force-pushed 93c10aa to e49ff7a
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/internal/monitoring/metrics.ts`:
- Around line 222-248: registerObservableGauge currently creates observable
instruments but never checks the entry.enabled flag when emitting values, so
dbActivePool, dbActiveConnection, and dbInUseConnection are always observed;
modify registerObservableGauge to register the observable with a callback that
first checks the created entry.enabled (or disabledMetrics) and only records
values when enabled, leaving metricsRegistry.set(name, entry) intact and
ensuring the same entry object (used by setMetricsEnabled) controls observation;
reference registerObservableGauge, the returned entry,
db_active_local_pools/db_connections/db_connections_in_use,
dbActivePool/dbActiveConnection/dbInUseConnection, metricsRegistry, and
disabledMetrics when implementing the check.
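A minimal, self-contained sketch of what the suggested fix could look like. The names `registerObservableGauge`, `metricsRegistry`, `setMetricsEnabled`, and `entry.enabled` come from the review comment; the `ObservableResult` shape below is a simplified stand-in for the OTel API so the gating logic can be shown on its own, not the actual implementation.

```typescript
type Attributes = Record<string, string>
interface ObservableResult {
  observe(value: number, attributes?: Attributes): void
}
type MetricEntry = { enabled: boolean }

const metricsRegistry = new Map<string, MetricEntry>()
const callbacks: Array<(result: ObservableResult) => void> = []

function registerObservableGauge(
  name: string,
  observe: () => Array<{ value: number; attributes?: Attributes }>
): MetricEntry {
  const entry: MetricEntry = { enabled: true }
  metricsRegistry.set(name, entry)
  // The callback closes over the same entry object that setMetricsEnabled
  // mutates, so disabling a metric suppresses all later observations.
  callbacks.push((result) => {
    if (!entry.enabled) return
    for (const { value, attributes } of observe()) {
      result.observe(value, attributes)
    }
  })
  return entry
}

function setMetricsEnabled(name: string, enabled: boolean): void {
  const entry = metricsRegistry.get(name)
  if (entry) entry.enabled = enabled
}

// Simulate one collection cycle (in the real code the SDK drives this).
function collect(): Array<{ value: number }> {
  const out: Array<{ value: number }> = []
  const result: ObservableResult = { observe: (value) => out.push({ value }) }
  callbacks.forEach((cb) => cb(result))
  return out
}
```

The key point is that the enabled check lives inside the callback, not at registration time, so toggling the flag at runtime takes effect on the next collection.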
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: e58a638f-ce06-4c65-846e-9c13c11016f0
⛔ Files ignored due to path filters (1)
package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (7)
package.json, src/http/plugins/tracing.ts, src/internal/database/pool.ts, src/internal/monitoring/logger.ts, src/internal/monitoring/metrics.ts, src/internal/monitoring/otel-metrics.ts, src/internal/monitoring/otel-tracing.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/http/plugins/tracing.ts
Force-pushed e49ff7a to 3bbd95c
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
monitoring/grafana/dashboards/storage-otel.json (1)
1007-1014: ⚠️ Potential issue | 🟠 Major — This dashboard will import with blank data.
Line 5743 and Line 5744 pin the default time range to a 55-second window on March 10, 2026, so a fresh import opens on stale/no data. Several new PromQL selectors also no longer match the current metrics surface: `status_code=~"2xx"`/`"4xx"`/`"5xx"` conflicts with the numeric status selectors already used elsewhere in this file, and `storage_api_otel_db_pool` no longer matches the renamed `db_active_local_pools` instrument in src/internal/monitoring/metrics.ts. Those panels will stay empty until the queries and default time range are updated. Also applies to: 1194-1229, 1436-1445, 1827-1958, 2929-2937, 5742-5744
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@monitoring/grafana/dashboards/storage-otel.json` around lines 1007 - 1014, The dashboard panels import empty because a fixed dashboard time range is pinned to a 55s window (remove or replace the dashboard "time" object so Grafana uses a relative or default range like "now-6h"/"now" or $__from/$__to) and several PromQL queries reference a renamed metric and incompatible status label selectors; update any panel "expr" that uses storage_api_otel_db_pool to use the renamed metric db_active_local_pools, and replace status_code=~"2xx"/"4xx"/"5xx" with the numeric-style selectors used elsewhere (e.g., a regex like status_code=~"2[0-9][0-9]" / "4[0-9][0-9]" / "5[0-9][0-9]" or the label name used across this file), and apply these changes to all affected panels (those using the "expr" field shown and the other ranges noted) so panels show current data after import.
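As a concrete illustration of the suggested dashboard changes, a fragment along these lines could replace the pinned range and stale expressions (field names follow the Grafana dashboard JSON model; the surrounding keys in storage-otel.json are not reproduced here, so treat this as a sketch rather than a drop-in patch):

```json
{
  "time": { "from": "now-6h", "to": "now" },
  "panels": [
    {
      "targets": [
        { "expr": "db_active_local_pools" },
        { "expr": "sum(rate(http_requests_total{status_code=~\"5[0-9][0-9]\"}[5m]))" }
      ]
    }
  ]
}
```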
♻️ Duplicate comments (2)
src/http/plugins/tracing.ts (1)
31-38: ⚠️ Potential issue | 🟠 Major — Set these on the request span, not the active hook span.
`@fastify/otel` exposes the per-request span on `req.opentelemetry().span` and documents `requestHook` as the callback that runs immediately after creating that root request span; lifecycle hooks get separate hook spans. Using `trace.getActiveSpan()` inside `onRequest` is therefore the wrong handle for request-level attrs here, so `tenant.ref`/`trace.mode` can land on the hook span instead of the request span. Use the request span handle here, or move the synchronous part into `requestHook`. (github.com)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/http/plugins/tracing.ts` around lines 31 - 38: The attributes are being set on the active hook span via trace.getActiveSpan() instead of the request root span; change the code in the onRequest handler to retrieve the per-request span from request.opentelemetry().span (or move this logic into the requestHook) and call setAttribute('tenant.ref', request.tenantId) and setAttribute('trace.mode', request.tracingMode) on that request span (referencing the existing span variable usage and trace.getActiveSpan() in the current diff) so tenant.ref / trace.mode land on the request span rather than the hook span.
src/internal/monitoring/metrics.ts (1)
55-75: ⚠️ Potential issue | 🟠 Major — Metric disable controls are now effectively dead.
`registerMetric()` records `enabled`, but nothing consults it before emitting. After this refactor, `METRICS_DISABLED` and `setMetricsEnabled()` stop suppressing the synchronous instruments here, and the observable pool gauges still have no path to honor the flag.
Verify each finding against the current code and only fix it if needed. In `@src/internal/monitoring/metrics.ts` around lines 55 - 75, registerMetric currently records enabled in metricsRegistry but never uses it; update registerMetric to consult the enabled flag (derived from disabledMetrics / setMetricsEnabled / METRICS_DISABLED) and make synchronous instruments respect it by returning a no-op instrument or wrapping inst.record and inst.add to early-return when !enabled; keep existing tenant-attr stripping (prometheusMetricsIncludeTenantId & stripTenantAttrs) but ensure the wrapper checks metricsRegistry.get(name).enabled before calling original, and likewise ensure any observable pool gauges/observable callbacks check the same enabled flag before emitting/observing so disabled metrics are actually suppressed.
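One way the no-op wrapping suggested above could look, sketched with a stand-in instrument type so the gating logic is self-contained. `registerMetric`, `metricsRegistry`, and `setMetricsEnabled` are the names used in the review; the real code would wrap OTel counters and histograms rather than this simplified interface.

```typescript
type Attributes = Record<string, string>
interface SyncInstrument {
  add(value: number, attributes?: Attributes): void
}
type MetricEntry = { enabled: boolean }

const metricsRegistry = new Map<string, MetricEntry>()

function registerMetric(name: string, inst: SyncInstrument): SyncInstrument {
  metricsRegistry.set(name, { enabled: true })
  // Return a wrapper that consults the live enabled flag on every call,
  // so setMetricsEnabled(name, false) suppresses later emissions without
  // touching call sites.
  return {
    add(value, attributes) {
      if (!metricsRegistry.get(name)?.enabled) return
      inst.add(value, attributes)
    },
  }
}

function setMetricsEnabled(name: string, enabled: boolean): void {
  const entry = metricsRegistry.get(name)
  if (entry) entry.enabled = enabled
}
```

Checking the registry entry on each call (rather than capturing the flag at creation time) is what keeps `setMetricsEnabled()` effective at runtime.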
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@monitoring/grafana/dashboards/storage-otel.json`:
- Around line 1007-1014: The dashboard panels import empty because a fixed
dashboard time range is pinned to a 55s window (remove or replace the dashboard
"time" object so Grafana uses a relative or default range like "now-6h"/"now" or
$__from/$__to) and several PromQL queries reference a renamed metric and
incompatible status label selectors; update any panel "expr" that uses
storage_api_otel_db_pool to use the renamed metric db_active_local_pools, and
replace status_code=~"2xx"/"4xx"/"5xx" with the numeric-style selectors used
elsewhere (e.g., a regex like status_code=~"2[0-9][0-9]" / "4[0-9][0-9]" /
"5[0-9][0-9]" or the label name used across this file), and apply these changes
to all affected panels (those using the "expr" field shown and the other ranges
noted) so panels show current data after import.
---
Duplicate comments:
In `@src/http/plugins/tracing.ts`:
- Around line 31-38: The attributes are being set on the active hook span via
trace.getActiveSpan() instead of the request root span; change the code in the
onRequest handler to retrieve the per-request span from
request.opentelemetry().span (or move this logic into the requestHook) and call
setAttribute('tenant.ref', request.tenantId) and setAttribute('trace.mode',
request.tracingMode) on that request span (referencing the existing span
variable usage and trace.getActiveSpan() in the current diff) so tenant.ref /
trace.mode land on the request span rather than the hook span.
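A minimal sketch of the hook change described above. The `request.opentelemetry().span` accessor is the one the review cites from `@fastify/otel`; the `TracedRequest` interface below is a simplified stand-in for the Fastify request type (with the `tenantId`/`tracingMode` fields this codebase adds), not the actual plugin code.

```typescript
interface Span {
  setAttribute(key: string, value: string): void
}
interface TracedRequest {
  tenantId: string
  tracingMode: string
  opentelemetry(): { span: Span } // decorator added by @fastify/otel
}

// onRequest hook body per the review: take the request root span from the
// @fastify/otel decorator instead of trace.getActiveSpan(), which may
// return a lifecycle hook span.
function onRequest(request: TracedRequest): void {
  const { span } = request.opentelemetry()
  span.setAttribute('tenant.ref', request.tenantId)
  span.setAttribute('trace.mode', request.tracingMode)
}
```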
In `@src/internal/monitoring/metrics.ts`:
- Around line 55-75: registerMetric currently records enabled in metricsRegistry
but never uses it; update registerMetric to consult the enabled flag (derived
from disabledMetrics / setMetricsEnabled / METRICS_DISABLED) and make
synchronous instruments respect it by returning a no-op instrument or wrapping
inst.record and inst.add to early-return when !enabled; keep existing
tenant-attr stripping (prometheusMetricsIncludeTenantId & stripTenantAttrs) but
ensure the wrapper checks metricsRegistry.get(name).enabled before calling
original, and likewise ensure any observable pool gauges/observable callbacks
check the same enabled flag before emitting/observing so disabled metrics are
actually suppressed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 7ce40c1f-dd40-4cd6-8e26-77071e818aa5
⛔ Files ignored due to path filters (1)
package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (10)
monitoring/grafana/dashboards/storage-otel.json, package.json, src/http/plugins/metrics.ts, src/http/plugins/tracing.ts, src/http/routes/tus/lifecycle.ts, src/internal/database/pool.ts, src/internal/monitoring/logger.ts, src/internal/monitoring/metrics.ts, src/internal/monitoring/otel-metrics.ts, src/internal/monitoring/otel-tracing.ts
💤 Files with no reviewable changes (2)
- src/http/plugins/metrics.ts
- src/http/routes/tus/lifecycle.ts
Force-pushed ad725b7 to 1d97551
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@monitoring/grafana/dashboards/storage-otel.json`:
- Around line 5593-5594: The dashboard JSON currently hard-codes a fixed time
window using the "from" and "to" properties (e.g., "from":
"2026-03-10T12:33:29.823Z", "to": "2026-03-10T12:34:24.902Z"); change these to
relative time strings (for example "from": "now-5m", "to": "now") so the
dashboard opens with a live range. Locate and update the "from" and "to" keys in
the dashboard root/timeRange object (and any other occurrences of fixed ISO
timestamps) to use relative expressions like now-5m/now to ensure fresh imports
open with current data.
- Around line 1165-1201: The panels are querying the wrong series name and using
literal "2xx"/"4xx"/"5xx" label values; replace occurrences of
storage_api_otel_http_request_duration_seconds_count with
storage_api_http_request_duration_seconds_count (the true series produced by
http_request_duration_seconds) and change status_code=~"2xx"/"4xx"/"5xx" to a
proper regex that matches numeric HTTP codes such as
status_code=~"2.."/"4.."/"5.." (or more strictly status_code=~"^2\\d{2}$", etc.)
wherever the expressions reference
storage_api_otel_http_request_duration_seconds_count and status_code label
filters so the panels (refs: the expressions with refId A/B/C and other similar
blocks) will return correct results.
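A quick self-contained illustration of why the literal selectors return nothing: the instrumentation records numeric status codes such as "200", so `status_code=~"2xx"` can never match, while a numeric regex like `2..` (or the stricter `^2\d{2}$`) does. The sample values below are illustrative, not taken from the dashboard.

```typescript
const statusCodes = ['200', '204', '404', '503']

// A PromQL =~ matcher is anchored, so "2xx" only matches the literal
// string "2xx" — never a real numeric code.
const literalMatches = statusCodes.filter((c) => /^2xx$/.test(c))

// A numeric regex matches the codes the instrumentation actually emits.
const numericMatches = statusCodes.filter((c) => /^2\d{2}$/.test(c))
```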
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 705c6ec1-e3ec-4d79-8833-823270043bbd
⛔ Files ignored due to path filters (1)
package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (13)
monitoring/grafana/dashboards/storage-otel.json, package.json, src/http/plugins/log-request.ts, src/http/plugins/metrics.ts, src/http/plugins/tracing.ts, src/http/routes/s3/index.ts, src/http/routes/tus/lifecycle.ts, src/internal/database/pool.ts, src/internal/monitoring/logger.ts, src/internal/monitoring/metrics.ts, src/internal/monitoring/otel-metrics.ts, src/internal/monitoring/otel-tracing.ts, src/start/server.ts
💤 Files with no reviewable changes (2)
- src/http/routes/tus/lifecycle.ts
- src/http/plugins/metrics.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- package.json
src/internal/database/pool.ts (Outdated)
```ts
collectInProgress = true

try {
  const entries = [...tenantPools.entries()]
```
There are some conditions below where a pool wasn't added into the cache. Is missing them intentional/insignificant because they are short-lived?
src/internal/monitoring/metrics.ts (Outdated)
```diff
-export const httpRequestsTotal = withMetricLabels(
-  'http_requests_total',
-  'counter',
+export const httpRequestsTotal = registerMetric('http_requests_total', 'counter', () =>
```
good to drop at this point?
```ts
function stripTenantAttrs(attrs: Attributes): Attributes {
  const { tenantId, tenant_id, ...rest } = attrs as Record<string, unknown>
  return rest as Attributes
}
```
is it worth checking the config and not adding it in the first place?
yes, but it would be too many ifs in too many places
yes, it will be a bit messier since it will clutter usage. We can do it as a follow-up (if needed)
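One shape that follow-up could take, sketched here as an assumption rather than existing code: centralize the config check in a single helper so call sites stay clean. The flag name mirrors `prometheusMetricsIncludeTenantId` mentioned earlier in the review; `withTenant` is hypothetical.

```typescript
type Attributes = Record<string, string | number>

const prometheusMetricsIncludeTenantId = false // assumed config value

// Hypothetical helper: adds the tenant label only when configured to,
// and avoids allocating a new attributes object otherwise.
function withTenant(attrs: Attributes, tenantId: string): Attributes {
  return prometheusMetricsIncludeTenantId ? { ...attrs, tenant_id: tenantId } : attrs
}
```

This keeps the "don't add it in the first place" behavior without sprinkling ifs across every recording site.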
```ts
const attributes = {
  method,
  route,
```
what is the reason? cardinality + covered by operation?
sounds good, but there are still references to it in the dashboard though
Yes, cardinality; we use the operation only. I don't think we reference the route.
Force-pushed d696c92 to 092b090
ferhatelmas
left a comment
Before merge:
- ensure the dashboard doesn't refer to dropped metrics or labels
Later:
- handle non-cached pool metrics
- bring tenant label addition to metric recording site to prevent allocation (if needed)
What kind of change does this PR introduce?
Improvement
What is the new behavior?
Replaces @opentelemetry/instrumentation-fastify with @fastify/otel (https://www.npmjs.com/package/@fastify/otel)