Skip to content

fix(promql): union histogram + sum scans for _count/_sum suffixed metrics#716

Merged
tsouza merged 1 commit into
mainfrom
fix/promql-suffix-count-sum-multi-table
May 22, 2026
Merged

fix(promql): union histogram + sum scans for _count/_sum suffixed metrics#716
tsouza merged 1 commit into
mainfrom
fix/promql-suffix-count-sum-multi-table

Conversation

@tsouza
Copy link
Copy Markdown
Owner

@tsouza tsouza commented May 22, 2026

Summary

Direct follow-up to PR #710. The classic-histogram companion-suffix
routing introduced there forced the scan to the histogram table only,
returning empty for OTel-hostmetrics counters that ship under suffixed
names in the sum table — system_cpu_logical_count,
system_processes_count, system_filesystem_inodes_count,
system_processes_created_count, etc.

  • schema.Metrics.TablesFor now returns [Histogram, Sum] for
    _count / _sum suffixed names (unchanged for _total / _bucket).
  • New chplan.UnionAll node (N-way arms, emits as
    (SELECT ...) UNION ALL (SELECT ...)) wired through the chsql
    emitter + test/spec/chplan_print.go IR snapshot helper.
  • lowerVectorSelector builds the union when both physical layouts
    apply: the histogram arm filters on the BARE name and projects
    toFloat64(Count|Sum) AS Value with MetricName synthesised as the
    suffixed literal; the sum arm filters on the SUFFIXED name and
    passes Value through. Single-arm fallback (no Sum, or Sum equals
    Histogram by config) preserves PR fix(promql): union (gauge, sum) scan so OTel-emitter cumulative sums under bare names resolve #710's byte-stable shape.

The existing merge() table-function path can't fan disjoint-schema
tables (histogram has Count/Sum but no Value; sum has Value but
no Count), so the per-arm UnionAll is the only way to address both
physical layouts in one query.

Live-stack verification

Compose stack at this branch's HEAD, with system_cpu_logical_count
data present in otel_metrics_sum:

  • Before fix: curl ... 'query=system_cpu_logical_count'
    {"status":"success","data":{"resultType":"vector","result":[]}}
  • After fix: returns [{"metric":{"__name__":"system_cpu_logical_count"},"value":[1779485700,"8"]}]

Pre-existing histogram-companion queries (cerberus_queries_duration_seconds_count,
http_server_request_duration_count) still resolve correctly. Pre-existing
unsuffixed cases (otelcol_process_uptime) still resolve correctly.

Test plan

  • go test ./... — 4105 passed across 38 packages
  • go test ./test/spec/promql/ -tags=chdb — 269 passed (chDB round-trip exercises the new fixture's union shape)
  • golangci-lint run ./... — no issues
  • New TXTAR fixture test/spec/promql/scan_unions_histogram_sum_for_suffixed_metric.txtar pins the multi-table union shape end-to-end (plan IR + SQL + chDB roundtrip with an empty histogram table + populated sum table — a regression that dropped the sum arm returns zero rows).
  • Updated TestLower_HistogramCompanion_RoutesToHistogramAndSumUnion covers all four _count / _sum shapes plus the OTel-hostmetrics-style system_cpu_logical_count case.
  • Updated TestDefaultOTelMetricsTablesFor covers the new [Histogram, Sum] candidate set with explicit OTel-hostmetrics names.
  • Existing fixtures histogram_count_basic / histogram_sum_basic / histogram_mean_latency / rate_http_count regenerated to reflect the new two-arm union plan.
  • chplan.UnionAll Equal-invariant coverage added under internal/chplan/equal_invariants_test.go (positive + 4 negatives: length, content, order, type).
  • Live compose-stack verification: system_cpu_logical_count resolves end-to-end through the cerberus Prom API at the new image.

…rics

PR #710's classic-histogram companion-suffix routing forced the scan to
the histogram table only, returning empty for OTel-hostmetrics counters
that ship under suffixed names in the sum table
(system_cpu_logical_count, system_processes_count,
system_filesystem_inodes_count, system_processes_created_count, ...).

schema.Metrics.TablesFor now returns [Histogram, Sum] for _count/_sum
suffixes. The lowering emits a chplan.UnionAll of two per-arm Projects:
the histogram arm filters on the BARE name and projects toFloat64(Count
or Sum) AS Value with MetricName synthesized as the suffixed literal;
the sum arm filters on the SUFFIXED name and passes Value through. The
existing merge() table-function path can't fan disjoint-schema tables
(histogram has Count/Sum but no Value; sum has Value but no Count), so
the per-arm UnionAll is the only way to address both physical layouts
in one query.

Verified pre-fix empty and post-fix non-empty on the live compose
stack against system_cpu_logical_count.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@tsouza tsouza enabled auto-merge (squash) May 22, 2026 22:00
@tsouza tsouza merged commit fac4d3f into main May 22, 2026
24 checks passed
@tsouza tsouza deleted the fix/promql-suffix-count-sum-multi-table branch May 22, 2026 22:08
tsouza added a commit that referenced this pull request May 22, 2026
… (#720)

PR #706 / #707 added per-arm canonical-shape projection +
instant-mode TimeUnix synthesis to fix SQL emission errors on
nested PromQL or-chains. The 2026-05-22 dirty-fixes audit asked
whether these helpers became unreachable after PR #710
(TableFor -> TablesFor for unsuffixed OTel sums) and PR #716
(extended to _count/_sum suffixes).

Reachability test on the post-#710/#716 tree lowered the
PR #706 failing-shape query and recorded
vectorSetOpCanonicalArmFrag invocations on both arms with
derived=true and the synthesised anchor branch taken. The
workarounds remain on the live emit path because #710/#716
fix which Scan tables get resolved (orthogonal to UNION ALL
column-type unification + missing-identifier-on-derived-inner).

Adds a NOTE block to vectorSetOpCanonicalArmFrag's docstring
documenting the audit so a future reviewer does not delete
the helpers on the assumption they were per-shape patches
for the gauge/sum routing bug.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant