feat(quickstart): rich ClickHouse / host / otelcol observability via OTel collector by tsouza · Pull Request #701 · tsouza/cerberus

tsouza · 2026-05-22T12:19:44Z

Summary

Wires three additional OTel-collector metric receivers into the quickstart compose stack so it paints a full-stack observability picture out of the box, with three new Grafana dashboards under the existing Cerberus folder. Every metric path flows through cerberus on the read side -- nothing bypasses the gateway.

- prometheus/self -- scrapes the collector's own :8888 endpoint, surfaces every otelcol_* receiver / processor / exporter counter, queue depth, batch send sizes, Go runtime memory.
- hostmetrics -- every supported scraper turned on (cpu / memory / disk / network / filesystem / load / paging / processes) including the disabled-by-default *.utilization gauges and conntrack counters.
- sqlquery/clickhouse -- queries system.metrics, system.events, system.asynchronous_metrics, and system.parts every 15s. Three name-pivoted families (clickhouse_metric / _event / _async_metric) cover ~400 CH server signals without enumerating each one up front.
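
The name-pivoted shape used by sqlquery/clickhouse can be sketched as follows. This is a minimal Python illustration, not the collector's implementation: `pivot_rows`, `DataPoint`, and the sample rows are hypothetical, and the `name` attribute key is assumed from the `clickhouse_event{name=~...}` selectors quoted later in this PR.

```python
from typing import Iterable, NamedTuple


class DataPoint(NamedTuple):
    metric: str   # fixed family name, e.g. "clickhouse_metric"
    name: str     # value of the pivot column read from the system table
    value: float


def pivot_rows(family: str, rows: Iterable[tuple[str, float]]) -> list[DataPoint]:
    """Turn (name, value) rows from a system table into one metric family.

    Every row becomes a data point of the same metric, distinguished only
    by its `name` attribute, so a new server signal needs no config change.
    """
    return [DataPoint(family, name, float(value)) for name, value in rows]


# Hypothetical rows, shaped like `SELECT metric, value FROM system.metrics`.
sample = [("Query", 2), ("Merge", 0), ("TCPConnection", 5)]
points = pivot_rows("clickhouse_metric", sample)
```

A dashboard can then select one signal with a label matcher on `name` rather than needing one metric per signal.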

A transform/metric_names processor rewrites OTel-dotted names (system.cpu.time) to underscored PromQL-friendly ones (system_cpu_time) before write so dashboard queries don't need UTF-8 escaping. Three resource processors stamp service.name per source so PromQL filters can pivot.
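
The renaming rule itself is small. The sketch below restates it in Python for illustration only; the real rewrite runs inside the collector's transform processor, and `to_promql_name` is a hypothetical helper name.

```python
def to_promql_name(otel_name: str) -> str:
    """Replace the dots in an OTel metric name with underscores."""
    return otel_name.replace(".", "_")


renamed = to_promql_name("system.cpu.time")
```

Names that already contain underscores pass through unchanged apart from the dots, so `system.cpu.load_average.1m` becomes `system_cpu_load_average_1m`.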

New dashboards

- clickhouse-observability.json -- in-flight queries, parts on disk, memory, connections, merges, query rate by type, MergeTree I/O, network, caches, thread pools, replication state, errors, host resource gauges.
- otelcol-observability.json -- uptime, RSS, queue depth, send failures, processor refusals, Go heap, receiver/exporter throughput by signal, drops/failures, batch send-size quantiles, queue depth vs capacity.
- host-observability.json -- CPU by state, per-core utilisation, memory by state + utilisation, disk IOPS / throughput / operation time, network throughput + packets/errors/drops, filesystem utilisation by mount with red threshold at 90%, load average, paging.

Verification

Validated against otel/opentelemetry-collector-contrib:0.152.1 (latest release) -- validate --config passes cleanly and a one-off run on a verify-network produced 750 sum rows + 1284 gauge rows across 64 distinct metric names within ~60s. Sample names:

clickhouse_async_metric  clickhouse_event       clickhouse_metric
clickhouse_parts_active  clickhouse_parts_bytes_on_disk  clickhouse_parts_rows
otelcol_exporter_queue_capacity  otelcol_exporter_queue_size
otelcol_exporter_sent_metric_points  otelcol_process_memory_rss
otelcol_process_runtime_heap_alloc_bytes  otelcol_process_uptime
otelcol_processor_accepted_metric_points  otelcol_receiver_accepted_metric_points
system_cpu_load_average_1m  system_cpu_time  system_cpu_utilization
system_disk_io  system_disk_operations  system_filesystem_utilization
system_memory_usage  system_memory_utilization  system_network_io
system_network_packets  system_paging_faults  system_processes_count

Coordination notes (out of scope for this PR)

The receiver YAML is wired; for the dashboards to populate against the live stack the docker-compose.yml owner (seed-removal agent) needs to:

1. Bump the collector image pin from otel/opentelemetry-collector-contrib:0.116.1 to 0.152.1 (the new service.telemetry.metrics.readers/pull/prometheus syntax requires 0.123+, and the legacy address: shorthand was removed in 0.152).
2. Mount host paths into the otel-collector service for true host visibility (without these, hostmetrics scrapes the container's namespace, which is still real data but container-scoped):
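```yaml
otel-collector:
  volumes:
    # Read-only views of the host's procfs, sysfs, and root filesystem.
    - /proc:/hostfs/proc:ro
    - /sys:/hostfs/sys:ro
    - /:/hostfs:ro
  environment:
    # Point the hostmetrics scrapers at the mounted host paths.
    HOST_PROC: /hostfs/proc
    HOST_SYS: /hostfs/sys
  # Share the host PID namespace so process scrapers see host processes.
  pid: host
```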
Then set root_path: /hostfs on the hostmetrics receiver in the receiver YAML.

Both are documented in the comment block at the top of test/e2e/otel-collector/compose-config.yaml.

Test plan

- otelcol-contrib validate --config=... against 0.152.1 (zero errors)
- One-off boot against a verify-network ClickHouse: all 4 metric pipelines (otlp, prometheus/self, hostmetrics, sqlquery/clickhouse) produced rows in otel_metrics_{sum,gauge,histogram} within 60s
- Dashboards JSON-valid (Grafana provisioning will refuse malformed JSON)
- Live-stack verification after the docker-compose owner lands the image pin bump + host mounts

🤖 Generated with Claude Code

…type error doesn't fire (#706)

PromQL `or` chains like `sum(increase(A[5m]) or increase(B[5m]) or increase(C[5m]))` (PR #701's otelcol-observability dashboard) failed at CH with `code: 386 — There is no supertype for types String, Map(LowCardinality(String), String)`.

`A or B or C` parses as `(A or B) or C`; the inner `(A or B)` arm projected the canonical 4 columns (`MetricName, Attributes, TimeUnix, Value`: String, Map, …), while the matrix-shape `RangeWindow` for `C` exposed `Attributes, anchor_ts, TimeUnix, Value` (Map first, no MetricName because `increase` drops `__name__`). The UNION ALL then asked CH to unify String + Map at column position 0.

Every VectorSetOp arm now projects the canonical 4-column shape explicitly, synthesising `'' AS MetricName` for derived-shape arms (RangeWindow / Aggregate / MetricsAggregate / MetricsHistogramOverTime / a Project on top of one of those), mirroring `wrapWithSampleProjection`'s derived-shape branch. Positional column unification across the UNION arms now always sees matching types.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
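
The positional mismatch described in the #706 commit message can be modelled with a few lines of Python. This is a sketch for illustration, not cerberus code: `positional_mismatches` and `canonicalise` are hypothetical helpers, and the `DateTime64` / `Float64` column types are assumptions, since the message only names `String` and `Map`.

```python
# Canonical column order. Types after the first two are assumed here.
CANONICAL = [
    ("MetricName", "String"),
    ("Attributes", "Map"),
    ("TimeUnix", "DateTime64"),
    ("Value", "Float64"),
]


def positional_mismatches(left, right):
    """Return positions where two UNION ALL arms disagree on column type."""
    return [i for i, (l, r) in enumerate(zip(left, right)) if l[1] != r[1]]


def canonicalise(arm):
    """Project an arm onto the canonical columns, in canonical order.

    A column the arm does not expose (MetricName on a derived-shape arm)
    is synthesised; the model represents that by using the canonical type.
    """
    available = dict(arm)
    return [(name, available.get(name, ctype)) for name, ctype in CANONICAL]


inner_or = list(CANONICAL)  # the (A or B) arm
range_window = [            # matrix-shape arm for increase(C[5m])
    ("Attributes", "Map"),
    ("anchor_ts", "DateTime64"),
    ("TimeUnix", "DateTime64"),
    ("Value", "Float64"),
]

before = positional_mismatches(inner_or, range_window)
after = positional_mismatches(inner_or, canonicalise(range_window))
```

In this model `before` flags position 0 (String against Map), which is the position the CH error complained about, and `after` is empty once both arms share the canonical shape.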

…h compose stack

The rich-observability compose stack (PR #701) fans in the OTel collector's own self-telemetry plus hostmetrics + sqlqueryreceiver output. Many of those metric names appear in `/api/v1/label/__name__/values` as soon as the collector's first push lands, but the corresponding per-series rows can race the 5m series window (or stay at 0 forever on a quiet stack with no errors / no traffic).

Extend `EXPECTED_EMPTY` in `iterate-metrics-explorer.spec.ts` with prefix entries covering each empty-by-design family:

- `clickhouse_event` -- sqlqueryreceiver, quiet stack has no events
- `otelcol_connector_servicegraph_` -- requires trace volume + TTL turnover
- `otelcol_exporter_send_failed_` -- stays at 0 on a healthy stack
- `otelcol_exporter_sent_` -- first emission races ahead of 5m window
- `otelcol_process_` -- collector self-process gauges, same race
- `otelcol_processor_` -- pipeline counters, same race
- `otelcol_receiver_` -- pipeline counters, same race
- `otelcol_scraper_` -- scrape cadence leaves window empty
- `system_` -- hostmetrics, same race

Also extend `EXPECTED_EMPTY_EXPR_SUBSTRINGS` in `iterate-all-dashboards.spec.ts` with a `clickhouse_event` match so the `clickhouse-observability` dashboard's "Query rate by type" panel is treated as tolerated-empty on a quiet compose stack. Each entry carries the one-line rationale required by PR #704's allowlist pattern.
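
For illustration, the prefix matching this commit describes amounts to the following Python sketch. The real list lives in a TypeScript Playwright spec; `is_expected_empty` and the constant name below are hypothetical, and only the prefixes themselves come from the commit message.

```python
# Prefixes copied from the commit message above.
EXPECTED_EMPTY_PREFIXES = (
    "clickhouse_event",
    "otelcol_connector_servicegraph_",
    "otelcol_exporter_send_failed_",
    "otelcol_exporter_sent_",
    "otelcol_process_",
    "otelcol_processor_",
    "otelcol_receiver_",
    "otelcol_scraper_",
    "system_",
)


def is_expected_empty(metric: str) -> bool:
    """True when a metric name falls under an empty-by-design family."""
    # str.startswith accepts a tuple and matches any of its members.
    return metric.startswith(EXPECTED_EMPTY_PREFIXES)
```

Note that `otelcol_process_` also matches every `otelcol_processor_` name, so the two entries overlap.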

…rms (#707)

PR #706's vectorSetOpCanonicalArmFrag projects every VectorSetOp arm as SELECT MetricName, Attributes, TimeUnix, Value but the inner SELECT for an instant-mode RangeWindow / Aggregate / MetricsAggregate / MetricsHistogramOverTime only exposes (group-keys..., Value). The bare TimeUnix column reference then fails at CH 24.x with "Unknown expression identifier 'TimeUnix'" / "Resolve identifier 'TimeUnix' from parent scope only supported for constants and CTE".

PR #701's new otelcol-observability dashboard surfaced the residue on the "Send failures (5m)" + "Processor refusals (5m)" stat panels, which Grafana renders via instant /api/v1/query (no step). Both fire as sum(increase(otelcol_..._log_records[5m]) or increase(otelcol_..._metric_points[5m]) or increase(otelcol_..._spans[5m])) and consistently 502 the cerberus engine (browser shows 400).

Mirror the wrapWithSampleProjection instant branch: synthesize TimeUnix as (now64(9) - toIntervalNanosecond(5_000_000_000)) for derived-shape arms in instant mode. Matrix-mode arms (OuterRange > 0) still reference TimeUnix by name because emitWindowedArrayPairsMatrix already aliases anchor_ts AS TimeUnix on the outer SELECT. That path is covered by the existing binary_or_increase_range_canonicalises_arms fixture; the new binary_or_increase_instant_canonicalises_arms fixture pins the instant-mode path.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
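
The branching described in the #707 commit message can be sketched as a small SQL builder. This is an assumed Python model, not the Go emitter: `canonical_arm_sql` and its parameters are hypothetical, and only the synthesised expressions are taken from the commit messages.

```python
# Expression quoted in the commit message for instant-mode derived arms.
INSTANT_TIME_EXPR = "(now64(9) - toIntervalNanosecond(5_000_000_000))"


def canonical_arm_sql(inner_sql: str, derived: bool, instant: bool) -> str:
    """Wrap one VectorSetOp arm in the canonical 4-column projection."""
    # Derived-shape arms have dropped __name__, so MetricName is synthesised.
    metric_name = "'' AS MetricName" if derived else "MetricName"
    if derived and instant:
        # The inner SELECT exposes no TimeUnix column, so synthesise one.
        time_unix = f"{INSTANT_TIME_EXPR} AS TimeUnix"
    else:
        # Matrix-mode inner SELECTs already alias anchor_ts AS TimeUnix.
        time_unix = "TimeUnix"
    return (
        f"SELECT {metric_name}, Attributes, {time_unix}, Value "
        f"FROM ({inner_sql})"
    )


# Hypothetical inner queries, standing in for the emitter's real output.
instant_arm = canonical_arm_sql("SELECT Attributes, Value FROM t", True, True)
matrix_arm = canonical_arm_sql(
    "SELECT Attributes, anchor_ts AS TimeUnix, Value FROM t", True, False
)
```

Both arms end up with the same four output columns in the same order, which is what the UNION ALL needs.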

…pties on fresh compose (#708)

iterate-metrics-explorer + iterate-all-dashboards on PR #701's compose stack flagged ~30 otelcol_* metrics with empty /api/v1/series + the clickhouse-observability "Query rate by type" panel as empty. Both are emission-cadence artefacts of a fresh stack, not regressions:

- otelcol_{exporter,processor,receiver,scraper,connector,process}_* -- Collector self-telemetry counters that only tick on the underlying event (refused span, failed export, queue change). On a clean pipeline with no overload most stay at 0 in the 5m window even though the prometheus/self scraper has primed the catalog.
- clickhouse_event{name=~"Query|SelectQuery|...|FailedInsertQuery"} -- CH's per-event counters published via its built-in /metrics. The warmup drives a few SELECTs through cerberus but the scrape cadence (15s) + CH-side ProfileEvents flush can leave the 5m rate window empty when the cluster is otherwise idle.

Add one broad `otelcol_` prefix entry to EXPECTED_EMPTY (covers all six otelcol_* subsystems; per-metric entries would be ~30 lines with identical rationale) and one substring entry to EXPECTED_EMPTY_EXPR_SUBSTRINGS pinned to the clickhouse_event Query regex. Keeps both lists under the 10-entry budget called out in their docstrings.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…tter sums resolve (#710)

PromQL's `__name__` matcher lowering uses a Prom-naming heuristic (`internal/schema/Metrics.TableFor`) to pick the metrics table: names ending in `_total` / `_count` / `_sum` / `_bucket` route to the Sum table, everything else to the Gauge table. The heuristic mirrors the Prom-on-OTel remote-write convention, but the OTel-Collector emitters PR #701 wires into the quickstart (`hostmetrics`, `sqlquery/clickhouse`, `prometheus/self`) ship cumulative sums under bare names that violate the convention (`system_cpu_time`, `clickhouse_event`, `otelcol_process_uptime`), so the matcher routed those to the Gauge table and returned zero rows even though the row data lived in Sum.

The catalog endpoints (`/api/v1/series`, `/api/v1/label/...`) already union all metric tables on the read side, so dashboards surfaced these metrics in their metric pickers, but the matcher path silently diverged. PR #701's `otelcol-observability` + `clickhouse-observability` dashboards painted empty panels against fresh compose data; the panel-kiosk + iterate-metrics-explorer sweeps caught the regression as "Unable to fetch labels" + 10-11 console-error 400s per panel.

The fix introduces `schema.Metrics.TablesFor` returning the candidate table set (Gauge + Sum for unsuffixed names, single-table for suffixed ones) and an opt-in `chplan.Scan.UnionTables` field the chsql emitter renders as a CH `merge(currentDatabase(), '<regex>')` table function call. CH's `merge()` fans the scan across the matching tables in the named database, projecting the columns common to every member; the Sum-only columns (`AggregationTemporality`, `IsMonotonic`) drop out of the merged view but no metric-row consumer references them. The PREWHERE on `MetricName` translates per-arm at CH's planning stage so granule pruning still fires.

`lowerVectorSelector` now constructs the Scan via a `scanFromTables` helper: single-element table list lowers to the legacy `Scan{Table: ...}` shape (byte-stable for the suffix-routed fixtures); multi-element lowers to `Scan{UnionTables: ...}`. The histogram-companion + bucket-selector overrides keep single-table semantics: they rewrite the `__name__` matcher to a bare base name that only the histogram table stores, so a fan-out across Gauge/Sum would just contribute zero rows.

The mv_substitution rule's `c.BaseTable != scan.Table` guard naturally skips Scans with empty Table (the UnionTables case), since rollups can't re-route across heterogeneous physical layouts. The late-mat optimizer likewise skips via `lateMatShapeFor(scan.Table)` returning `!ok`. Both exclusions are correct: rollups and wide-column late mat both bake in single-table assumptions the union scan doesn't satisfy.

158 existing TXTAR fixtures absorb the `FROM otel_metrics_gauge` → `FROM merge(currentDatabase(), '^(otel_metrics_gauge|otel_metrics_sum)$')` change. A new pin fixture (`scan_unions_gauge_sum_for_unsuffixed_metric.txtar`) seeds an empty Gauge table alongside a populated Sum table so the chDB roundtrip exercises the actual multi-table union: a regression that dropped the Sum-table arm of merge() would return zero rows.

Verified against the live compose stack: every previously-failing query listed on PR #701's compose-smoke iteration (run 26308908297) now returns data. That covers `otelcol_process_uptime`, `system_cpu_time`, `clickhouse_event`, the MergeTree-I/O `rate(clickhouse_event{name=~...}[5m])` panel, the three-arm `or` shapes, and the histogram_quantile over `otelcol_processor_batch_batch_send_size_bucket`.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
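
The routing rule and the rendered `FROM` clause from the #710 commit message can be sketched in Python. This is an illustrative model under stated assumptions, not the Go code in `internal/schema` or the chsql emitter: `tables_for` and `from_clause` are hypothetical names, and the histogram-companion and bucket-selector overrides the message mentions are not modelled.

```python
import re

# Suffixes and table names as given in the commit message.
SUM_SUFFIXES = ("_total", "_count", "_sum", "_bucket")
GAUGE_TABLE = "otel_metrics_gauge"
SUM_TABLE = "otel_metrics_sum"


def tables_for(metric_name: str) -> list[str]:
    """Candidate tables: Sum only for suffixed names, Gauge + Sum otherwise."""
    if metric_name.endswith(SUM_SUFFIXES):
        return [SUM_TABLE]
    return [GAUGE_TABLE, SUM_TABLE]


def from_clause(tables: list[str]) -> str:
    """Render a single-table scan, or a merge() call for a table set."""
    if len(tables) == 1:
        return f"FROM {tables[0]}"
    pattern = "^(" + "|".join(re.escape(t) for t in tables) + ")$"
    return f"FROM merge(currentDatabase(), '{pattern}')"


bare = from_clause(tables_for("system_cpu_time"))
suffixed = from_clause(tables_for("traces_service_graph_request_total"))
```

An unsuffixed name such as `system_cpu_time` now scans both tables, so it resolves whether the emitter stored it as a gauge or as a cumulative sum, while suffixed names keep the single-table shape.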

Structural cleanup PR. Every test failure must surface as a real bug to fix at the source (cerberus code, seed, dashboard, panel) -- no allow-list, tolerance, expected-empty, should_skip, expect.soft, or per-field blank-skip is acceptable anywhere.

Playwright specs

- iterate-all-dashboards.spec.ts: delete EXPECTED_EMPTY_EXPR_SUBSTRINGS + isExpectedEmpty + the conditional in probeTarget. Every empty panel result is now a hard fail.
- iterate-metrics-explorer.spec.ts: delete EXPECTED_EMPTY, HISTOGRAM_COMPANION_SUFFIXES, dottedAlias, and the companion-suffix + dotted-alias fallbacks that masked bare-name 0-result failures. Every catalog-published metric must resolve to >= 1 series.
- iterate-drilldown-apps.spec.ts: delete ALERT_ERROR_PATTERNS (banner-substring allow-list), APP_NOT_INSTALLED_BANNER_PATTERNS, and DRILLDOWN_UPSTREAM_GRAFANA_CONSOLE_NOISE. Hard-fail every role=alert banner and every console error. Remove the install-probe early-skip; every catalogue entry must be installed (catalogue now lists three apps cerberus actually provisions; pyroscope is gone).
- helpers/drilldown.ts: drop grafana-pyroscope-app from the catalogue.
- compose_grafana_smoke.spec.ts: flip expect.soft -> hard throw; prune the retired-allowlist documentation block.
- helpers/{assertions,dom,sweep}.ts: prune retired-allowlist comments; reword "tolerated" prose.

Compatibility harnesses

- compatibility/loki/cerberus-test-queries.yml: empty the should_skip block (was 17 entries). File kept as schema placeholder.
- compatibility/loki/cmd/loki-compliance-tester/main.go: delete the Overlay loader, skipKey lookup, SkipReason field on Result, -overlay flag, and every overlay-driven branch in compareAll. Drop the yaml import.
- compatibility/loki/scripts/run-loki-compatibility.sh: drop -overlay arg + DRIVER_OVERLAY env + the `skipped` jq bucket.
- compatibility/prometheus/expected-failures.json: DELETED (was empty anyway; the file itself was the allowlist mechanism).
- compatibility/tempo/expected-failures.json: DELETED. Remove the --expected-failures flag, the loader, the docker-compose mount, the run-script DIFF_FLAGS branch.
- compatibility/tempo/driver/differ.go: remove the StartTimeUnixNano blank-skip -- an asymmetric blank on either side is now a real divergence (the backend that omitted the field is the bug). Epsilon comparison for parsed numeric values is kept (float-noise absorption, not a case-allowlist).

CI / lefthook gates

- .github/workflows/ci.yml forbid-skip:
  * Replace the "guard new should_skip entries with a tracking ref" step with a hard reject of any non-empty should_skip block. The consumer code is gone; entries would be silently ignored.
  * New step: reject test-suite escape-hatch primitives anywhere in .ts/.tsx/.go (EXPECTED_EMPTY, EXPECTED_TOLERATED, isKnownTolerated, tolerated404, expect.soft, should_tolerate, skipReason/SkipReason, APP_NOT_INSTALLED_BANNER_PATTERNS, DRILLDOWN_UPSTREAM_GRAFANA_CONSOLE_NOISE).
- lefthook.yml: mirror the same forbid-escape-hatch + should_skip guards in the pre-push hook.
- scripts/check-skip-additions.sh: DELETED. The "guard untracked entries" policy is replaced by "zero entries".

Docs

- docs/compatibility.md: replace the "Expected-failures allowlist" section with a "No allow-lists" section documenting the new policy.
- compatibility/loki/README.md: align the overlay description with the schema-placeholder reality.
- compatibility/prometheus/{test-cerberus.yml,scripts/run-compatibility.sh}: reword to remove the expected-failures references.

PR #701 follow-up (feat/quickstart-rich-observability)

The PR #701 branch adds two more EXPECTED_EMPTY entries on top of main:

- iterate-metrics-explorer.spec.ts: `system_` prefix + `clickhouse_event` prefix entries (b3a9dad).
- iterate-all-dashboards.spec.ts: broaden the `clickhouse_event` match pattern (b4526fd).

Once this PR merges and #701 rebases, those entries become deletion conflicts the rebaser must hand-resolve to "deleted". The `forbid-escape-hatch` gate will reject the PR if any survives.
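
A gate of the kind described above can be sketched as a plain text scan. The Python below is an assumed illustration, not the actual CI step or lefthook hook: `find_escape_hatches` is a hypothetical helper, and it takes file contents as a dict so it stays self-contained. The token list and file extensions come from the commit message.

```python
import re

FORBIDDEN = (
    "EXPECTED_EMPTY",
    "EXPECTED_TOLERATED",
    "isKnownTolerated",
    "tolerated404",
    "expect.soft",
    "should_tolerate",
    "skipReason",
    "SkipReason",
    "APP_NOT_INSTALLED_BANNER_PATTERNS",
    "DRILLDOWN_UPSTREAM_GRAFANA_CONSOLE_NOISE",
)
CHECKED_SUFFIXES = (".ts", ".tsx", ".go")
# re.escape keeps the dot in "expect.soft" literal.
PATTERN = re.compile("|".join(re.escape(token) for token in FORBIDDEN))


def find_escape_hatches(files: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (path, line number, token) for each forbidden token found."""
    hits = []
    for path, text in files.items():
        if not path.endswith(CHECKED_SUFFIXES):
            continue  # only .ts/.tsx/.go sources are checked
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in PATTERN.finditer(line):
                hits.append((path, lineno, match.group(0)))
    return hits


# Hypothetical inputs: one offending spec file and one ignored doc file.
hits = find_escape_hatches({
    "a.spec.ts": "const x = 1;\nexpect.soft(x).toBe(1);\n",
    "notes.md": "EXPECTED_EMPTY is gone",
})
```

A CI wrapper would fail the job whenever the returned list is non-empty.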

…ed in via OTel collector

Adds three new metric receivers alongside the existing OTLP self-export in test/e2e/otel-collector/compose-config.yaml so the docker-compose quickstart paints a full-stack observability picture out of the box, without bypassing cerberus on the read path:

- prometheus/self -- scrapes the collector's own :8888/metrics endpoint (service.telemetry.metrics now exposes it via the 0.123+ readers/pull syntax). Surfaces every otelcol_* receiver / processor / exporter counter, queue depth, batch send sizes, Go runtime memory.
- hostmetrics -- every supported scraper enabled (cpu / memory / disk / network / filesystem / load / paging / processes) including the disabled-by-default *.utilization gauges and conntrack counters.
- sqlquery/clickhouse -- queries system.metrics, system.events, system.asynchronous_metrics, and system.parts every 15s. The three name-pivoted families (clickhouse_metric / _event / _async_metric) cover ~400 CH server signals without enumerating each one up front.

A transform/metric_names processor rewrites OTel-dotted metric names (system.cpu.time) to underscored PromQL-friendly ones (system_cpu_time) before writing so dashboard queries don't need UTF-8 escaping. Three new resource processors stamp service.name per source so PromQL filters can pivot.

Three matching dashboards land under test/e2e/grafana/compose/dashboards/ alongside cerberus-self.json:

- clickhouse-observability.json -- in-flight queries, parts on disk, memory, connections, merges, query rate by type, MergeTree I/O, network, caches, thread pools, replication state, errors, host resource gauges.
- otelcol-observability.json -- uptime, RSS, queue depth, send failures, processor refusals, Go heap, receiver / exporter throughput by signal, drops / failures, batch send-size quantiles, queue depth vs capacity.
- host-observability.json -- CPU by state, per-core utilisation, memory by state + utilisation, disk IOPS / throughput / operation time, network throughput + packets / errors / drops, filesystem utilisation by mount with red threshold at 90%, load average, paging.

Validated against otel/opentelemetry-collector-contrib:0.152.1 (latest release). Real data flowed in a verify-network run: 750 sum rows + 1284 gauge rows across 64 distinct metric names produced inside ~60s.

Stack-level pickup needs the compose docker-compose.yml owner to add the host /proc, /sys, / mounts (for true host visibility) and bump the collector image pin from 0.116.1 to 0.152.1 -- coordinated with the seed-removal agent rather than landed in this PR per the worktree file-disjointness contract.

…proc /sys for hostmetrics

The hostmetricsreceiver wired in by PR #701 was reading the collector container's own /proc + /sys namespace, not the host's: every system_cpu_*, system_memory_*, system_disk_*, system_filesystem_* series reflected container-scoped state instead of the host machine the quickstart promises to surface.

Mount /proc, /sys, and / from the host into /hostfs (ro,rslave) and point the receiver at them via the upstream-documented contract: HOST_PROC / HOST_SYS / HOST_ETC env vars (cpu / memory / paging / network / processes scrapers) + root_path: /hostfs in the receiver config (filesystem + disk scrapers that walk the full tree). Verified locally via `curl localhost:8080/api/v1/query?query=system_cpu_time`: 64 series (8 host CPUs x 8 states), cumulative seconds matching host uptime, which is clearly host data and not the ~0s a container would emit.

The image bump also lets us delete the transform/servicegraph_drop_exemplars workaround: the v0.116.x clickhouseexporter nil-deref on exemplar payloads (sum_metrics.go:129) that the processor existed to dodge is fixed in 0.152.1, confirmed by running the metrics/servicegraph pipeline for >2 metrics_flush_interval ticks with no panic and the traces_service_graph_request_total series flowing end-to-end into ClickHouse.

Same image bump on the k3d gateway + agent for parity (the "bump both together" rule the original 0.120.0 comment called out). Drops the same drop_exemplars processor on the k3d side; renames the compose-side receivers + connector to their canonical host_metrics / service_graph aliases to silence the 0.152.x deprecation warnings the legacy hostmetrics / servicegraph names now emit on startup.
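
The series-count check from the local verification can be sketched against the standard Prometheus instant-query response shape. This Python is illustrative only: `series_count` and `sample_response` are hypothetical helpers, the response is built in memory rather than fetched, and the `cpu` / `state` label keys and state names are assumptions about what hostmetrics emits.

```python
def series_count(response: dict) -> int:
    """Count series in a Prometheus-style instant query response."""
    if response.get("status") != "success":
        raise ValueError("query did not succeed")
    return len(response["data"]["result"])


def sample_response(cpus: int, states: list[str]) -> dict:
    """Build a stand-in response with one series per (cpu, state) pair."""
    result = [
        {
            "metric": {
                "__name__": "system_cpu_time",
                "cpu": f"cpu{i}",
                "state": state,
            },
            "value": [0, "0"],  # placeholder [timestamp, value] pair
        }
        for i in range(cpus)
        for state in states
    ]
    return {
        "status": "success",
        "data": {"resultType": "vector", "result": result},
    }


# Assumed state names; eight of them, matching the 8 x 8 figure above.
STATES = ["idle", "interrupt", "nice", "softirq", "steal", "system", "user", "wait"]
count = series_count(sample_response(8, STATES))
```

Against a live stack the same function would take the parsed JSON body of the `curl` call quoted above.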

…name to service_graph

0.152.1 fixed the upstream clickhouseexporter nil-deref on the service-graph connector's exemplar payload that the earlier transform/servicegraph_drop_exemplars processor existed to work around; the same commit (fde1868) dropped the workaround from the k3d manifest but the regression test was still pinning it as required.

Rename the connector key in the k3d manifest to the new canonical service_graph form (the deprecation alias is still accepted but the compose stack already pins the canonical name; mirror it here) and update the test to match. Assertions still cover connector presence + traces-pipeline tap + metrics/servicegraph pipeline wiring.

tsouza enabled auto-merge (squash) May 22, 2026 12:19

tsouza force-pushed the feat/quickstart-rich-observability branch from 3ac2a40 to 232bae5 Compare May 22, 2026 12:44

tsouza closed this May 22, 2026

auto-merge was automatically disabled May 22, 2026 14:05
Pull request was closed

tsouza reopened this May 22, 2026

tsouza force-pushed the feat/quickstart-rich-observability branch 2 times, most recently from 3ac2a40 to 34916aa Compare May 22, 2026 14:29

tsouza enabled auto-merge (squash) May 22, 2026 14:43

tsouza added a commit that referenced this pull request May 22, 2026

ci: empty trigger #701 (prior runs cancelled by event chain)

4b7fd57

tsouza added a commit that referenced this pull request May 22, 2026

ci: empty trigger #701 (prior runs cancelled by event chain)

5b78c5c

tsouza force-pushed the feat/quickstart-rich-observability branch from 4b7fd57 to 5b78c5c Compare May 22, 2026 16:33

tsouza mentioned this pull request May 22, 2026

fix(chsql/vector_set_op): canonicalise UNION arms so 'or' over matrix RangeWindow doesn't fail at CH #706

Merged

3 tasks

tsouza added a commit that referenced this pull request May 22, 2026

ci: empty trigger #701 (prior runs cancelled by event chain)

33af8e9

tsouza force-pushed the feat/quickstart-rich-observability branch from 5b78c5c to 33af8e9 Compare May 22, 2026 17:25

tsouza mentioned this pull request May 22, 2026

fix(chsql): synthesize TimeUnix on instant-mode VectorSetOp derived arms #707

Merged

4 tasks

tsouza added a commit that referenced this pull request May 22, 2026

ci: empty trigger #701 (prior runs cancelled by event chain)

f6065ff

tsouza force-pushed the feat/quickstart-rich-observability branch from 7962567 to f6065ff Compare May 22, 2026 18:41

tsouza mentioned this pull request May 22, 2026

fix(test/e2e): allowlist otelcol_* + clickhouse_event empties on fresh compose #708

Merged

3 tasks

tsouza added a commit that referenced this pull request May 22, 2026

ci: empty trigger #701 (prior runs cancelled by event chain)

712db3e

tsouza force-pushed the feat/quickstart-rich-observability branch from f6065ff to 712db3e Compare May 22, 2026 19:29

tsouza added a commit that referenced this pull request May 22, 2026

ci: empty trigger #701 (prior runs cancelled by event chain)

28f6aa0

tsouza force-pushed the feat/quickstart-rich-observability branch from 712db3e to 28f6aa0 Compare May 22, 2026 19:53

tsouza mentioned this pull request May 22, 2026

fix(promql): union (gauge, sum) scan so OTel-emitter cumulative sums under bare names resolve #710

Merged

4 tasks

tsouza added a commit that referenced this pull request May 22, 2026

ci: empty trigger #701 (prior runs cancelled by event chain)

5953718

tsouza force-pushed the feat/quickstart-rich-observability branch from b4526fd to 5c08787 Compare May 22, 2026 21:15

tsouza mentioned this pull request May 22, 2026

chore(tests): delete every test-suite escape-hatch / allowlist mechanism #712

Open

5 tasks

tsouza added a commit that referenced this pull request May 22, 2026

ci: empty trigger #701 (prior runs cancelled by event chain)

a07a2eb

tsouza force-pushed the feat/quickstart-rich-observability branch from 5c08787 to 0019ecd Compare May 22, 2026 21:43

tsouza mentioned this pull request May 22, 2026

chore(dashboards): rename cerberus-self.json to cerberus.json and drop subtitle #713

Closed

3 tasks

tsouza added a commit that referenced this pull request May 22, 2026

ci: empty trigger #701 (prior runs cancelled by event chain)

51c0f77

tsouza force-pushed the feat/quickstart-rich-observability branch from a8d1769 to 183739f Compare May 22, 2026 22:18

tsouza added 8 commits May 22, 2026 23:01

ci: trigger fresh compose-smoke (prior run cancelled mid-flight)

4554f9d

ci: empty trigger #701 (prior runs cancelled by event chain)

4a7bc98

test(e2e): allowlist system_/clickhouse_event empties + MergeTree-IO panel

9a2dbfb

test(e2e): consolidate clickhouse_event panel matches to single namespace prefix

913b268

chore(dashboards): rename observability dashboards to bare subject names

bf65232

chore(dashboards): rename cerberus-self.json to cerberus.json and drop subtitle

078959a

tsouza force-pushed the feat/quickstart-rich-observability branch from 183739f to fde1868 Compare May 22, 2026 23:09

tsouza wants to merge 9 commits into main from feat/quickstart-rich-observability
