Skip to content

fix(chsql/vector_set_op): canonicalise UNION arms so 'or' over matrix RangeWindow doesn't fail at CH#706

Merged
tsouza merged 1 commit into
mainfrom
fix/promql-or-union-attribute-supertype
May 22, 2026
Merged

fix(chsql/vector_set_op): canonicalise UNION arms so 'or' over matrix RangeWindow doesn't fail at CH#706
tsouza merged 1 commit into
mainfrom
fix/promql-or-union-attribute-supertype

Conversation

@tsouza
Copy link
Copy Markdown
Owner

@tsouza tsouza commented May 22, 2026

Summary

  • Failing job 77421909731 (PR feat(quickstart): rich ClickHouse / host / otelcol observability via OTel collector #701 compose-smoke) surfaced three PromQL or queries on the new otelcol-observability dashboard returning 502s with code: 386, message: There is no supertype for types String, Map(LowCardinality(String), String) because some of them are Maps and some of them are not.
  • Root cause: A or B or C parses as (A or B) or C. The inner (A or B) VectorSetOr arm projected the canonical 4-column shape (MetricName, Attributes, TimeUnix, Value — String, Map, DateTime64, Float64); the outer right arm C was a matrix-shape RangeWindow whose SELECT * exposed Attributes, anchor_ts, TimeUnix, Value (Map first, no MetricName because increase drops __name__). The UNION ALL then asked CH to unify String + Map at column position 0.
  • Fix in internal/chsql/vector_set_op.go: every VectorSetOp arm now explicitly projects the canonical 4-column shape via a new vectorSetOpCanonicalArmFrag helper, synthesising '' AS MetricName for derived-shape arms (RangeWindow / Aggregate / MetricsAggregate / MetricsHistogramOverTime / a Project on top of one of those) — mirroring wrapWithSampleProjection's derived-shape branch in internal/api/prom/handler.go. Positional column unification across the UNION arms now always sees matching column types.
  • Regression fixture test/spec/promql/binary_or_increase_range_canonicalises_arms.txtar pins the failing 3-arm sum(increase(...)[5m] or increase(...)[5m] or increase(...)[5m]) shape with range_step: 1m; the chplan shows the nested VectorSetOp over matrix RangeWindow arms that the bug surfaced.
  • Existing binary_{and,or,unless}{,_ignoring,_on}.txtar SQL goldens regenerated for the canonical-arm wrap; behaviour is unchanged (the canonical-shape arms already exposed MetricName, the added projection is a passthrough CH's optimizer fuses).

Test plan

  • go test ./internal/promql/ ./internal/chsql/ ./internal/api/prom/ ./internal/api/loki/ ./internal/api/tempo/ ./internal/optimizer/ — 2325 tests pass.
  • New binary_or_increase_range_canonicalises_arms.txtar pins the regressing shape and its post-fix SQL.
  • Compose-smoke otelcol-observability dashboard sweep clears (PR-side CI rerun).

🤖 Generated with Claude Code

…type error doesn't fire

PromQL `or` chains like `sum(increase(A[5m]) or increase(B[5m]) or
increase(C[5m]))` (PR #701's otelcol-observability dashboard) failed at
CH with `code: 386 — There is no supertype for types String,
Map(LowCardinality(String), String)`. `A or B or C` parses as `(A or
B) or C`; the inner `(A or B)` arm projected the canonical 4 columns
(`MetricName, Attributes, TimeUnix, Value` — String, Map, …), while
the matrix-shape `RangeWindow` for `C` exposed `Attributes, anchor_ts,
TimeUnix, Value` (Map first, no MetricName because `increase` drops
`__name__`). The UNION ALL then asked CH to unify String + Map at
column position 0.

Every VectorSetOp arm now projects the canonical 4-column shape
explicitly, synthesising `'' AS MetricName` for derived-shape arms
(RangeWindow / Aggregate / MetricsAggregate / MetricsHistogramOverTime
/ a Project on top of one of those) — mirroring
`wrapWithSampleProjection`'s derived-shape branch. Positional column
unification across the UNION arms now always sees matching types.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@tsouza tsouza enabled auto-merge (squash) May 22, 2026 17:14
@tsouza tsouza merged commit a5194cb into main May 22, 2026
22 checks passed
@tsouza tsouza deleted the fix/promql-or-union-attribute-supertype branch May 22, 2026 17:23
tsouza added a commit that referenced this pull request May 22, 2026
…rms (#707)

PR #706's vectorSetOpCanonicalArmFrag projects every VectorSetOp arm as
SELECT MetricName, Attributes, TimeUnix, Value but the inner SELECT for
an instant-mode RangeWindow / Aggregate / MetricsAggregate /
MetricsHistogramOverTime only exposes (group-keys..., Value). The bare
TimeUnix column reference then fails at CH 24.x with
"Unknown expression identifier 'TimeUnix'" / "Resolve identifier
'TimeUnix' from parent scope only supported for constants and CTE".

PR #701's new otelcol-observability dashboard surfaced the residue
on the "Send failures (5m)" + "Processor refusals (5m)" stat panels,
which Grafana renders via instant /api/v1/query (no step). Both fire
as
  sum(increase(otelcol_..._log_records[5m])
      or increase(otelcol_..._metric_points[5m])
      or increase(otelcol_..._spans[5m]))
and consistently 502 the cerberus engine (browser shows 400).

Mirror the wrapWithSampleProjection instant branch: synthesize
TimeUnix as (now64(9) - toIntervalNanosecond(5_000_000_000)) for
derived-shape arms in instant mode. Matrix-mode arms (OuterRange > 0)
still reference TimeUnix by name because emitWindowedArrayPairsMatrix
already aliases anchor_ts AS TimeUnix on the outer SELECT — covered by
the existing binary_or_increase_range_canonicalises_arms fixture; the
new binary_or_increase_instant_canonicalises_arms fixture pins the
instant-mode path.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
tsouza added a commit that referenced this pull request May 22, 2026
… (#720)

PR #706 / #707 added per-arm canonical-shape projection +
instant-mode TimeUnix synthesis to fix SQL emission errors on
nested PromQL or-chains. The 2026-05-22 dirty-fixes audit asked
whether these helpers became unreachable after PR #710
(TableFor -> TablesFor for unsuffixed OTel sums) and PR #716
(extended to _count/_sum suffixes).

Reachability test on the post-#710/#716 tree lowered the
PR #706 failing-shape query and recorded
vectorSetOpCanonicalArmFrag invocations on both arms with
derived=true and the synthesised anchor branch taken. The
workarounds remain on the live emit path because #710/#716
fix which Scan tables get resolved (orthogonal to UNION ALL
column-type unification + missing-identifier-on-derived-inner).

Adds a NOTE block to vectorSetOpCanonicalArmFrag's docstring
documenting the audit so a future reviewer does not delete
the helpers on the assumption they were per-shape patches
for the gauge/sum routing bug.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant