Skip to content

fix(promql): union (gauge, sum) scan so OTel-emitter cumulative sums under bare names resolve#710

Merged
tsouza merged 1 commit into
mainfrom
fix/promql-sum-metric-no-total-suffix
May 22, 2026
Merged

fix(promql): union (gauge, sum) scan so OTel-emitter cumulative sums under bare names resolve#710
tsouza merged 1 commit into
mainfrom
fix/promql-sum-metric-no-total-suffix

Conversation

@tsouza
Copy link
Copy Markdown
Owner

@tsouza tsouza commented May 22, 2026

Summary

cerberus's PromQL matcher path used a Prom-naming heuristic
(internal/schema/Metrics.TableFor) to pick the metrics table:
_total/_count/_sum/_bucket suffix → Sum table, everything else
→ Gauge table. The heuristic mirrors the Prom-on-OTel remote-write
convention, but the OTel-Collector emitters PR #701 wires into the
quickstart (hostmetrics, sqlquery/clickhouse, prometheus/self)
ship cumulative sums under bare names that violate the convention
(system_cpu_time, clickhouse_event, otelcol_process_uptime,
…), so the matcher routed them to the Gauge table and returned zero
rows even though the row data lived in Sum.

The catalog endpoints (/api/v1/series, /api/v1/label/...) already
union all metric tables on the read side, so dashboards surfaced these
metrics in their pickers — but the matcher path silently diverged.
PR #701's otelcol-observability + clickhouse-observability
dashboards painted empty panels against fresh compose data; the
panel-kiosk + iterate-metrics-explorer sweeps caught the regression
as "Unable to fetch labels" + 10–11 console-error 400s per panel.

Root cause

internal/schema/otel.go::TableFor decides which CH table to scan for
a PromQL metric matcher based purely on the metric-name suffix. Cerberus's
self-telemetry happens to ship _total suffixes (so it works), but
upstream OTel emitters don't follow that convention — hostmetrics
emits system_cpu_time (Sum, no suffix), sqlquery emits
clickhouse_event (Sum, no suffix), prometheus/self emits
otelcol_process_uptime (mixed Gauge/Sum under bare names). The
matcher path committed to a single table; the read returned empty.

Prior PRs (#699, #706, #707, #708) each band-aided one symptom of
this single bug (labels endpoint, or-arm union, TimeUnix, allowlist
otelcol_* empties). All of them left the matcher-resolve heuristic
untouched.

Fix

  • schema.Metrics.TablesFor returns the candidate table set: single
    table for suffix-routed names (preserves byte-stable SQL on the
    existing fixtures), [Gauge, Sum] for unsuffixed names.
  • chplan.Scan.UnionTables is a new opt-in field — when set, the
    chsql emitter renders the scan as a CH merge(currentDatabase(), '<regex>') table function call. CH's merge() fans the scan across
    the matching tables in the named database; the PREWHERE on
    MetricName translates per-arm at planning time so granule pruning
    still fires. Verified via EXPLAIN.
  • lowerVectorSelector picks the Scan shape via a scanFromTables
    helper: 1-element tables → legacy Scan{Table: …} shape (byte-stable
    for the suffix-routed fixtures); multi-element → Scan{UnionTables: …}. The histogram-companion + bucket-selector overrides keep
    single-table semantics.

Test plan

Fixture churn

158 existing TXTAR fixtures absorb the FROM otel_metrics_gauge
FROM merge(currentDatabase(), '^(otel_metrics_gauge|otel_metrics_sum)$')
change. The chplan IR pretty-printer (test/spec/chplan_print.go) now
renders Scan(merge(t1|t2)) for union scans.

Follow-up

Suffix-routed Sum metrics that happen to end in _count/_sum but
aren't histogram companions (system_cpu_logical_count,
system_processes_count etc.) still route to the histogram table via
HistogramCompanionColumn and return empty. The fix here is the same
shape (union the histogram-companion path with a fallback Sum scan
under the original name) — separate PR.

🤖 Generated with Claude Code

…tter sums resolve

PromQL's `__name__` matcher lowering uses a Prom-naming heuristic
(`internal/schema/Metrics.TableFor`) to pick the metrics table: names
ending in `_total` / `_count` / `_sum` / `_bucket` route to the Sum
table, everything else to the Gauge table. The heuristic mirrors the
Prom-on-OTel remote-write convention, but the OTel-Collector emitters
PR #701 wires into the quickstart (`hostmetrics`, `sqlquery/clickhouse`,
`prometheus/self`) ship cumulative sums under bare names that violate
the convention — `system_cpu_time`, `clickhouse_event`,
`otelcol_process_uptime` — so the matcher routed those to the Gauge
table and returned zero rows even though the row data lived in Sum.

The catalog endpoints (`/api/v1/series`, `/api/v1/label/...`) already
union all metric tables on the read side, so dashboards surfaced these
metrics in their metric pickers — but the matcher path silently
diverged. PR #701's `otelcol-observability` + `clickhouse-observability`
dashboards painted empty panels against fresh compose data; the
panel-kiosk + iterate-metrics-explorer sweeps caught the regression as
"Unable to fetch labels" + 10-11 console-error 400s per panel.

The fix introduces `schema.Metrics.TablesFor` returning the candidate
table set (Gauge + Sum for unsuffixed names, single-table for suffixed
ones) and an opt-in `chplan.Scan.UnionTables` field the chsql emitter
renders as a CH `merge(currentDatabase(), '<regex>')` table function
call. CH's `merge()` fans the scan across the matching tables in the
named database, projecting the columns common to every member; the
Sum-only columns (`AggregationTemporality`, `IsMonotonic`) drop out of
the merged view but no metric-row consumer references them. The
PREWHERE on `MetricName` translates per-arm at CH's planning stage so
granule pruning still fires.

`lowerVectorSelector` now constructs the Scan via a `scanFromTables`
helper: single-element table list lowers to the legacy
`Scan{Table: ...}` shape (byte-stable for the suffix-routed fixtures);
multi-element lowers to `Scan{UnionTables: ...}`. The histogram-companion
+ bucket-selector overrides keep single-table semantics — they rewrite
the `__name__` matcher to a bare base name that only the histogram
table stores, so a fan-out across Gauge/Sum would just contribute zero
rows.

The mv_substitution rule's `c.BaseTable != scan.Table` guard naturally
skips Scans with empty Table (the UnionTables case) — rollups can't
re-route across heterogeneous physical layouts. The late-mat optimizer
likewise skips via `lateMatShapeFor(scan.Table)` returning `!ok`. Both
exclusions are correct: rollups and wide-column late mat both bake in
single-table assumptions the union scan doesn't satisfy.

158 existing TXTAR fixtures absorb the `FROM otel_metrics_gauge` →
`FROM merge(currentDatabase(), '^(otel_metrics_gauge|otel_metrics_sum)$')`
change. A new pin fixture (`scan_unions_gauge_sum_for_unsuffixed_metric.txtar`)
seeds an empty Gauge table alongside a populated Sum table so the chDB
roundtrip exercises the actual multi-table union — a regression that
dropped the Sum-table arm of merge() would return zero rows.

Verified against the live compose stack: every previously-failing query
listed on PR #701's compose-smoke iteration (run 26308908297) now
returns data — `otelcol_process_uptime`, `system_cpu_time`,
`clickhouse_event`, the MergeTree-I/O `rate(clickhouse_event{name=~...}[5m])`
panel, the three-arm `or` shapes, the histogram_quantile over
`otelcol_processor_batch_batch_send_size_bucket`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@tsouza tsouza merged commit e1007b5 into main May 22, 2026
22 checks passed
@tsouza tsouza deleted the fix/promql-sum-metric-no-total-suffix branch May 22, 2026 21:14
tsouza added a commit that referenced this pull request May 22, 2026
…rics (#716)

PR #710's classic-histogram companion-suffix routing forced the scan to
the histogram table only, returning empty for OTel-hostmetrics counters
that ship under suffixed names in the sum table
(system_cpu_logical_count, system_processes_count,
system_filesystem_inodes_count, system_processes_created_count, ...).

schema.Metrics.TablesFor now returns [Histogram, Sum] for _count/_sum
suffixes. The lowering emits a chplan.UnionAll of two per-arm Projects:
the histogram arm filters on the BARE name and projects toFloat64(Count
or Sum) AS Value with MetricName synthesized as the suffixed literal;
the sum arm filters on the SUFFIXED name and passes Value through. The
existing merge() table-function path can't fan disjoint-schema tables
(histogram has Count/Sum but no Value; sum has Value but no Count), so
the per-arm UnionAll is the only way to address both physical layouts
in one query.

Verified pre-fix empty and post-fix non-empty on the live compose
stack against system_cpu_logical_count.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
tsouza added a commit that referenced this pull request May 22, 2026
PR #710 added validateScanShape, mergeTableFrag, regexQuoteMeta and the
UnionTables shape-lookup at emit_node.go:341 — gremlins phase2 dropped
from green to 94.59% with 11 LIVED mutants on the new code. Most of
them are equivalent (boundary mutants on len(...) > 0 sites whose
called builder methods are no-ops on empty input), but the
validateScanShape mutual-exclusion and the emitFilterScan shape-key
selection are reachable.

Adds targeted tests:
  - TestValidateScanShape_BothEmpty / _BothSet (+ FilterScan variants)
    kill L86/L89 CONDITIONALS_NEGATION by asserting Emit(Scan{}) and
    Emit(Scan{Table+UnionTables}) error out synchronously.
  - TestEmitFilterScan_UnionTablesShapeLookup kills L341 col-14 and
    col-45 negations by asserting PREWHERE promotion fires against the
    UnionTables[0] metric-shape lookup.
  - TestRegexQuoteMeta_EscapesEachMetacharacter pins the RE2 metachar
    escape set used in the merge() table-function regex argument.
  - TestMergeTableFrag_DefaultDatabase / _ExplicitDatabase pin the
    merge() rendering for both Database = "" and Database = "obs".

Local mutation: each kill verified by hand-mutating emit_node.go and
re-running the new test. Expected efficacy after CI re-run: ≥ 95%.
tsouza added a commit that referenced this pull request May 22, 2026
… (#720)

PR #706 / #707 added per-arm canonical-shape projection +
instant-mode TimeUnix synthesis to fix SQL emission errors on
nested PromQL or-chains. The 2026-05-22 dirty-fixes audit asked
whether these helpers became unreachable after PR #710
(TableFor -> TablesFor for unsuffixed OTel sums) and PR #716
(extended to _count/_sum suffixes).

Reachability test on the post-#710/#716 tree lowered the
PR #706 failing-shape query and recorded
vectorSetOpCanonicalArmFrag invocations on both arms with
derived=true and the synthesised anchor branch taken. The
workarounds remain on the live emit path because #710/#716
fix which Scan tables get resolved (orthogonal to UNION ALL
column-type unification + missing-identifier-on-derived-inner).

Adds a NOTE block to vectorSetOpCanonicalArmFrag's docstring
documenting the audit so a future reviewer does not delete
the helpers on the assumption they were per-shape patches
for the gauge/sum routing bug.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant