chore(tests): delete every test-suite escape-hatch / allowlist mechanism by tsouza · Pull Request #712 · tsouza/cerberus

tsouza · 2026-05-22T21:30:37Z

Summary

Structural cleanup. Every test failure must surface as a real bug to fix at the source (cerberus code, seed, dashboard, panel) — no allow-list, tolerance, expected-empty, should_skip, expect.soft, or per-field blank-skip is acceptable anywhere.

Playwright specs

iterate-all-dashboards.spec.ts: delete EXPECTED_EMPTY_EXPR_SUBSTRINGS (5 entries) + isExpectedEmpty + the conditional in probeTarget. Every empty panel result is now a hard fail.
iterate-metrics-explorer.spec.ts: delete EXPECTED_EMPTY (6 entries), HISTOGRAM_COMPANION_SUFFIXES, dottedAlias, and the companion-suffix + dotted-alias fallbacks that masked bare-name 0-result failures. Every catalog-published metric must resolve to >= 1 series.
iterate-drilldown-apps.spec.ts: delete ALERT_ERROR_PATTERNS (banner-substring allow-list), APP_NOT_INSTALLED_BANNER_PATTERNS, and DRILLDOWN_UPSTREAM_GRAFANA_CONSOLE_NOISE. Hard-fail every role=alert banner and every console error. Remove the install-probe early-skip; every catalogue entry must be installed.
helpers/drilldown.ts: drop grafana-pyroscope-app from the catalogue (cerberus doesn't ship profiling).
compose_grafana_smoke.spec.ts: flip expect.soft -> hard throw; prune the retired-allowlist documentation blocks.
helpers/{assertions,dom,sweep}.ts: prune retired-allowlist comments; reword "tolerated" prose.

Compatibility harnesses

compatibility/loki/cerberus-test-queries.yml: empty the should_skip block (was 17 entries — see commit body for the per-entry rationale capture). File kept as schema placeholder.
compatibility/loki/cmd/loki-compliance-tester/main.go: delete the Overlay loader, skipKey lookup, SkipReason field on Result, -overlay flag, and every overlay-driven branch in compareAll. Drop the yaml import.
compatibility/loki/scripts/run-loki-compatibility.sh: drop -overlay arg + DRIVER_OVERLAY env + the skipped jq bucket.
compatibility/prometheus/expected-failures.json: deleted (was empty; the file itself was the allowlist mechanism).
compatibility/tempo/expected-failures.json: deleted. Remove the --expected-failures flag, the loader, the docker-compose mount, the run-script DIFF_FLAGS branch.
compatibility/tempo/driver/differ.go: remove the StartTimeUnixNano blank-skip — an asymmetric blank on either side is now a real divergence (the backend that omitted the field is the bug). Epsilon comparison for parsed numeric values is kept (float-noise absorption, not a case-allowlist).

CI / lefthook gates

.github/workflows/ci.yml forbid-skip:
- Replace the "guard new should_skip entries with a tracking ref" step with a hard reject of any non-empty should_skip block. The consumer code is gone; entries would be silently ignored.
- New step: reject test-suite escape-hatch primitives anywhere in .ts/.tsx/.go (EXPECTED_EMPTY, EXPECTED_TOLERATED, isKnownTolerated, tolerated404, expect.soft, should_tolerate, skipReason/SkipReason, APP_NOT_INSTALLED_BANNER_PATTERNS, DRILLDOWN_UPSTREAM_GRAFANA_CONSOLE_NOISE).
lefthook.yml: mirror the same forbid-escape-hatch + should_skip guards in the pre-push hook.
scripts/check-skip-additions.sh: deleted. The "guard untracked entries" policy is replaced by "zero entries".

Docs

docs/compatibility.md: replace the "Expected-failures allowlist" section with a "No allow-lists" section documenting the new policy.
compatibility/loki/README.md: align the overlay description with the schema-placeholder reality.
compatibility/prometheus/{test-cerberus.yml,scripts/run-compatibility.sh}: reword to remove the expected-failures references.

Inventory of compat-harness `should_skip` entries removed

The 17 entries in compatibility/loki/cerberus-test-queries.yml re-exposed by this PR fall into three buckets — every one is now a real bug to fix at the source. Follow-up tasks should be opened for each:

Upstream-pinned skip: true rows (10 entries) — fast/structured-metadata.yaml#JSON parsing with structured metadata filter first, regression/metric-queries.yaml#HTTP status code distribution, regression/structured-metadata.yaml#JSON parsing with duration filter and structured metadata, regression/structured-metadata.yaml#Drilldown pattern with case-insensitive error search, plus all six exhaustive/unwrap-aggregations.yaml#Standard variance/deviation/Quantile-over-time/p50/p90/p95/p99/latency/duration entries. The upstream YAML marks these skip: true because Loki's v2 engine doesn't support those shapes — meaning there is no reference baseline. Either the v2-engine compose-config baseline needs to flip, or these cases need to be requested upstream.
Structured-metadata harness seed-gap cluster (3 entries) — regression/structured-metadata.yaml#Combined line and metadata filter, regression/structured-metadata.yaml#Drilldown with failed search and error level, regression/structured-metadata.yaml#Drilldown with error/warn level and refused search, exhaustive/aggregations.yaml#Count aggregated by detected_level. Reference Loki indexes detected_level from structured-metadata block input; the harness seeder flows OTel records through the upstream ClickHouse exporter shape (SeverityText present, structured-metadata block absent), so the reference returns empty. The fix is in the harness seeder, not in cerberus.
Plain seed-gap (4 entries) — exhaustive/aggregations.yaml#Count aggregated by {pod,namespace,service_name,cluster and namespace,service_name and container} and exhaustive/unwrap-aggregations.yaml#Without multiple labels. The seeder doesn't write pod / namespace / service_name / cluster / container labels. Fix the seeder.

Inventory of PR #701 entries flagged for rebase

PR #701 (feat/quickstart-rich-observability) adds two more EXPECTED_EMPTY entries on top of main that are NOT in this PR's diff:

iterate-metrics-explorer.spec.ts — adds system_ prefix and clickhouse_event prefix entries (commit b3a9dad).
iterate-all-dashboards.spec.ts — broadens the clickhouse_event{name=~"Query match to clickhouse_event{name=~" (commit b4526fd).

Once this PR merges and #701 rebases, those entries become deletion conflicts the rebaser must hand-resolve to "deleted". The forbid-escape-hatch gate will reject #701 outright if any survives the rebase.

Test plan

CI forbid-skip green on this PR (the new gate doesn't catch any pattern in this PR's own diff).
CI check green — go vet already clean locally; go test ./compatibility/loki/cmd/loki-compliance-tester/ ./compatibility/tempo/driver/ passes locally (113 tests).
CI compatibility/{prometheus,loki,tempo} — every previously-tolerated diff now surfaces as a real failure. The compat harnesses are report-only (per task docs(compliance): refresh for v1.0.0-RC1 + per-QL corpora (M5.4) #68), so the CI lane stays green; the compat-score.json shows the new ground truth.
CI compose-smoke — expect.soft is now a hard throw; any previously-soft failure becomes a hard fail.
dashboard (push-to-main + nightly) — every previously-tolerated empty panel / 0-series metric / install-skip is now a hard fail.

🤖 Generated with Claude Code

…714) The 4 `detected_level` entries in cerberus-test-queries.yml carried a "harness seed-gap" rationale claiming the seeder doesn't push structured-metadata input for reference Loki to index. That rationale was stale: PR #412 wired structured-metadata into the seeder via `pushLoki`'s `logproto.LabelAdapter` block (compatibility/loki/cmd/seed/main.go), and loki-config.yaml has `allow_structured_metadata: true`. PR #712's compat run on this branch confirms parity: all four cases report `diff: ""` against the reference Loki: - regression/structured-metadata.yaml#Combined line and metadata filter - regression/structured-metadata.yaml#Drilldown with failed search and error level - regression/structured-metadata.yaml#Drilldown with error/warn level and refused search - exhaustive/aggregations.yaml#Count aggregated by detected_level Dropping these 4 entries promotes them out of the overlay back into the compat denominator with verified reference baselines. The remaining 20 entries (upstream `skip: true` rows + the `pod` / `namespace` / `service_name` / `cluster` / `container` seed-gap entries) stay as-is — those are separate from the structured-metadata cluster. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…l from cerberus-self The panel queries `{service_name="cerberus"} | SeverityText="ERROR"` which is empty on a healthy stack: cerberus only emits ERROR-level logs on infrastructure failures (CH query failed, internal wiring broken), not on user-error 4xx requests. The iterate-all-dashboards sweep hard-asserts every panel target is non-empty (no escape hatch in PR #712), so an "empty when healthy" panel is a dead UI that fails CI by definition. Remove the panel from all three dashboard surfaces (compose, legacy grafana, k3s ConfigMap) and the dashboards-provider description. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Structural cleanup PR. Every test failure must surface as a real bug to fix at the source (cerberus code, seed, dashboard, panel) — no allow- list, tolerance, expected-empty, should_skip, expect.soft, or per-field blank-skip is acceptable anywhere. Playwright specs - iterate-all-dashboards.spec.ts: delete EXPECTED_EMPTY_EXPR_SUBSTRINGS + isExpectedEmpty + the conditional in probeTarget. Every empty panel result is now a hard fail. - iterate-metrics-explorer.spec.ts: delete EXPECTED_EMPTY, HISTOGRAM_COMPANION_SUFFIXES, dottedAlias, and the companion- suffix + dotted-alias fallbacks that masked bare-name 0-result failures. Every catalog-published metric must resolve to >= 1 series. - iterate-drilldown-apps.spec.ts: delete ALERT_ERROR_PATTERNS (banner- substring allow-list), APP_NOT_INSTALLED_BANNER_PATTERNS, and DRILLDOWN_UPSTREAM_GRAFANA_CONSOLE_NOISE. Hard-fail every role=alert banner and every console error. Remove the install-probe early-skip; every catalogue entry must be installed (catalogue now lists three apps cerberus actually provisions; pyroscope is gone). - helpers/drilldown.ts: drop grafana-pyroscope-app from the catalogue. - compose_grafana_smoke.spec.ts: flip expect.soft -> hard throw; prune the retired-allowlist documentation block. - helpers/{assertions,dom,sweep}.ts: prune retired-allowlist comments; reword "tolerated" prose. Compatibility harnesses - compatibility/loki/cerberus-test-queries.yml: empty the should_skip block (was 17 entries). File kept as schema placeholder. - compatibility/loki/cmd/loki-compliance-tester/main.go: delete the Overlay loader, skipKey lookup, SkipReason field on Result, -overlay flag, and every overlay-driven branch in compareAll. Drop the yaml import. - compatibility/loki/scripts/run-loki-compatibility.sh: drop -overlay arg + DRIVER_OVERLAY env + the `skipped` jq bucket. - compatibility/prometheus/expected-failures.json: DELETED (was empty anyway; the file itself was the allowlist mechanism). - compatibility/tempo/expected-failures.json: DELETED. Remove the --expected-failures flag, the loader, the docker-compose mount, the run-script DIFF_FLAGS branch. - compatibility/tempo/driver/differ.go: remove the StartTimeUnixNano blank-skip — an asymmetric blank on either side is now a real divergence (the backend that omitted the field is the bug). Epsilon comparison for parsed numeric values is kept (float-noise absorption, not a case-allowlist). CI / lefthook gates - .github/workflows/ci.yml forbid-skip: * Replace the "guard new should_skip entries with a tracking ref" step with a hard reject of any non-empty should_skip block. The consumer code is gone; entries would be silently ignored. * New step: reject test-suite escape-hatch primitives anywhere in .ts/.tsx/.go (EXPECTED_EMPTY, EXPECTED_TOLERATED, isKnownTolerated, tolerated404, expect.soft, should_tolerate, skipReason/SkipReason, APP_NOT_INSTALLED_BANNER_PATTERNS, DRILLDOWN_UPSTREAM_GRAFANA_CONSOLE_NOISE). - lefthook.yml: mirror the same forbid-escape-hatch + should_skip guards in the pre-push hook. - scripts/check-skip-additions.sh: DELETED. The "guard untracked entries" policy is replaced by "zero entries". Docs - docs/compatibility.md: replace the "Expected-failures allowlist" section with a "No allow-lists" section documenting the new policy. - compatibility/loki/README.md: align the overlay description with the schema-placeholder reality. - compatibility/prometheus/{test-cerberus.yml,scripts/run-compatibility.sh}: reword to remove the expected-failures references. PR #701 follow-up (feat/quickstart-rich-observability) The PR #701 branch adds two more EXPECTED_EMPTY entries on top of main: - iterate-metrics-explorer.spec.ts: `system_` prefix + `clickhouse_event` prefix entries (b3a9dad). - iterate-all-dashboards.spec.ts: broaden the `clickhouse_event` match pattern (b4526fd). Once this PR merges and #701 rebases, those entries become deletion conflicts the rebaser must hand-resolve to "deleted". The `forbid-escape-hatch` gate will reject the PR if any survives.

…o panel populates The cerberus-self "Error rate by language" panel evaluates sum by (cerberus_ql) (rate(cerberus_queries_total{result="error"}[5m])) / clamp_min(sum by (cerberus_ql) (rate(cerberus_queries_total[5m])), 1e-9). On a fresh compose stack the numerator was always 0 because QueryMiddleware bucketed only 5xx as ResultError ("4xx = gateway behaved correctly") and the warmup harness only drove well-formed queries. The numerator collapsed empty, the panel returned empty, and the now-hard-asserting iterate-all-dashboards spec caught the empty result. Two changes, both load-bearing for the panel to render: * internal/telemetry/middleware.go: classify status >= 400 as ResultError. The cerberus_queries_total{result} series is a query-outcome metric, not an HTTP-SLO metric — a 400 parse rejection IS a failed query from the caller's point of view, and the panel is named "Error rate by language" precisely to track that ratio. The HTTP-layer SLO (where 4xx legitimately means gateway-healthy) rides the separate http.server.* instrument set, unchanged. * test/e2e/playwright/helpers/sweep.ts: have generateSelfTraffic also fire one deliberately-malformed query per QL head per iteration (unclosed PromQL aggregation, unterminated LogQL stream selector, truncated TraceQL braces). The error-path probes guarantee result="error" populates every by(cerberus_ql) bucket during the 30s warmup so the panel has data when the dashboard sweep loads it. Test update: TestQueryMiddleware_ResultError becomes a table-driven subtest covering 400 / 422 / 500 / 502 — all four bucket as error under the new classification.

…ns panel populates The cerberus-self "Admission rejections" panel groups by (cerberus_ql, reason) over cerberus_admit_rejected_total. Healthy compose warmup never approaches the per-head admit caps (prom 64 / loki 64 / tempo 32 — internal/api/admit) so the counter stays at zero and the panel returns an empty result, tripping the iterate-all- dashboards empty-result guard now that the allowlist mechanism is gone. Add an upfront admission-burst phase to generateSelfTraffic (test/e2e/playwright/helpers/sweep.ts) that fires ADMIT_BURST_CONCURRENCY=128 parallel requests per head against the cheapest endpoint per head (PromQL `up`, Loki `/labels`, TraceQL `{}`). 128 > every cap by ~2x even on tempo, so the per-head semaphores saturate and cerberus emits cerberus_admit_rejected_total with reason=cap_exceeded on all three (cerberus_ql, reason) buckets. Reason-bucket coverage: cap_exceeded is the only rejection reason the limiter currently emits (admit.ReasonCapExceeded; the limiter has exactly one rejection path — TryAcquire failing on a saturated semaphore). When a future rejection path lands in internal/api/admit the burst block should grow a sibling phase that exercises it.

…l from cerberus-self The panel queries `{service_name="cerberus"} | SeverityText="ERROR"` which is empty on a healthy stack: cerberus only emits ERROR-level logs on infrastructure failures (CH query failed, internal wiring broken), not on user-error 4xx requests. The iterate-all-dashboards sweep hard-asserts every panel target is non-empty (no escape hatch in PR #712), so an "empty when healthy" panel is a dead UI that fails CI by definition. Remove the panel from all three dashboard surfaces (compose, legacy grafana, k3s ConfigMap) and the dashboards-provider description. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…uld_skip rule

…-grammar __name__ on wire The catalog endpoint runs every stored MetricName through `OTelToPromMetric` so dotted-stored OTel names (`http.server.request.duration`) surface as the Prom-grammar form (`http_server_request_duration`). When the user (or Drilldown-Metrics' label chip) then queries `match[]={__name__= "<underscored>"}` the matcher lowering compared the underscored value against the raw dotted MetricName and returned zero rows — the "Unable to fetch labels" failure shape iterate-metrics-explorer pinned on PR #712 against `http_server_request_body_size`, `http_server_request_duration`, `http_server_response_body_size`. Two-part fix, mirroring the symmetric attribute-key fan-out PR #663 wired into `internal/promql.attributeLookup`: 1. New `expandUnderscoredMetricNameMatcher` helper in `internal/api/prom/metric_name_candidates.go` fans an unsuffixed `__name__=<underscored>` matcher out into every OTel-dotted candidate from `format.PromLabelToOTelCandidates`. Wired into `/api/v1/series`, `/api/v1/labels?match[]=`, and `/api/v1/label/<name>/values?match[]=` alongside the existing `expandBareHistogramMatcher`. Histogram-companion-suffixed names short-circuit so byte-stable tests under `handler_labels_histogram_test.go` keep their single-arm SQL. 2. `format.WithMetricName` normalises the projected `__name__` through `OTelToPromMetric` so the wire response matches the catalog's underscored alias. Without this the spec's `__name__`-mismatch sanity check would flag the dotted form even when the fan-out resolved the row.

tsouza enabled auto-merge (squash) May 22, 2026 21:30

This was referenced May 22, 2026

chore(compat/loki): drop 4 stale detected_level should_skip entries #714

Merged

test(compat/loki): pin k8s-family stream labels (pod/namespace/service_name/cluster/container) in seeder #715

Closed

tsouza force-pushed the chore/remove-all-allowlists branch 3 times, most recently from 3da2003 to cbc722c Compare May 22, 2026 22:17

tsouza force-pushed the chore/remove-all-allowlists branch from cbc722c to 87f0e1c Compare May 22, 2026 22:42

tsouza mentioned this pull request May 22, 2026

chore: combined audit follow-ups (lefthook mirror, stale-ID strip, fixture comments, gremlins policy) #726

Merged

3 tasks

tsouza and others added 5 commits May 22, 2026 22:53

fix(ci): drop forbid-skip self-test case 7 — superseded by strict sho…

21acba6

…uld_skip rule

tsouza force-pushed the chore/remove-all-allowlists branch from 87f0e1c to 21acba6 Compare May 22, 2026 22:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore(tests): delete every test-suite escape-hatch / allowlist mechanism#712

chore(tests): delete every test-suite escape-hatch / allowlist mechanism#712
tsouza wants to merge 6 commits into
mainfrom
chore/remove-all-allowlists

tsouza commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tsouza commented May 22, 2026

Summary

Playwright specs

Compatibility harnesses

CI / lefthook gates

Docs

Inventory of compat-harness should_skip entries removed

Inventory of PR #701 entries flagged for rebase

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Inventory of compat-harness `should_skip` entries removed