chore(tests): delete every test-suite escape-hatch / allowlist mechanism#712
Open
tsouza wants to merge 6 commits into
Open
chore(tests): delete every test-suite escape-hatch / allowlist mechanism#712tsouza wants to merge 6 commits into
tsouza wants to merge 6 commits into
Conversation
This was referenced May 22, 2026
tsouza
added a commit
that referenced
this pull request
May 22, 2026
…714) The 4 `detected_level` entries in cerberus-test-queries.yml carried a "harness seed-gap" rationale claiming the seeder doesn't push structured-metadata input for reference Loki to index. That rationale was stale: PR #412 wired structured-metadata into the seeder via `pushLoki`'s `logproto.LabelAdapter` block (compatibility/loki/cmd/seed/main.go), and loki-config.yaml has `allow_structured_metadata: true`. PR #712's compat run on this branch confirms parity: all four cases report `diff: ""` against the reference Loki: - regression/structured-metadata.yaml#Combined line and metadata filter - regression/structured-metadata.yaml#Drilldown with failed search and error level - regression/structured-metadata.yaml#Drilldown with error/warn level and refused search - exhaustive/aggregations.yaml#Count aggregated by detected_level Dropping these 4 entries promotes them out of the overlay back into the compat denominator with verified reference baselines. The remaining 20 entries (upstream `skip: true` rows + the `pod` / `namespace` / `service_name` / `cluster` / `container` seed-gap entries) stay as-is — those are separate from the structured-metadata cluster. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
3da2003 to
cbc722c
Compare
tsouza
added a commit
that referenced
this pull request
May 22, 2026
…l from cerberus-self
The panel queries `{service_name="cerberus"} | SeverityText="ERROR"`
which is empty on a healthy stack: cerberus only emits ERROR-level
logs on infrastructure failures (CH query failed, internal wiring
broken), not on user-error 4xx requests. The iterate-all-dashboards
sweep hard-asserts every panel target is non-empty (no escape hatch
in PR #712), so an "empty when healthy" panel is a dead UI that
fails CI by definition.
Remove the panel from all three dashboard surfaces (compose, legacy
grafana, k3s ConfigMap) and the dashboards-provider description.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
cbc722c to
87f0e1c
Compare
3 tasks
Structural cleanup PR. Every test failure must surface as a real bug to
fix at the source (cerberus code, seed, dashboard, panel) — no allow-
list, tolerance, expected-empty, should_skip, expect.soft, or per-field
blank-skip is acceptable anywhere.
Playwright specs
- iterate-all-dashboards.spec.ts: delete EXPECTED_EMPTY_EXPR_SUBSTRINGS
+ isExpectedEmpty + the conditional in probeTarget. Every empty
panel result is now a hard fail.
- iterate-metrics-explorer.spec.ts: delete EXPECTED_EMPTY,
HISTOGRAM_COMPANION_SUFFIXES, dottedAlias, and the companion-
suffix + dotted-alias fallbacks that masked bare-name 0-result
failures. Every catalog-published metric must resolve to >= 1 series.
- iterate-drilldown-apps.spec.ts: delete ALERT_ERROR_PATTERNS (banner-
substring allow-list), APP_NOT_INSTALLED_BANNER_PATTERNS, and
DRILLDOWN_UPSTREAM_GRAFANA_CONSOLE_NOISE. Hard-fail every role=alert
banner and every console error. Remove the install-probe early-skip;
every catalogue entry must be installed (catalogue now lists three
apps cerberus actually provisions; pyroscope is gone).
- helpers/drilldown.ts: drop grafana-pyroscope-app from the catalogue.
- compose_grafana_smoke.spec.ts: flip expect.soft -> hard throw; prune
the retired-allowlist documentation block.
- helpers/{assertions,dom,sweep}.ts: prune retired-allowlist comments;
reword "tolerated" prose.
Compatibility harnesses
- compatibility/loki/cerberus-test-queries.yml: empty the should_skip
block (was 17 entries). File kept as schema placeholder.
- compatibility/loki/cmd/loki-compliance-tester/main.go: delete the
Overlay loader, skipKey lookup, SkipReason field on Result, -overlay
flag, and every overlay-driven branch in compareAll. Drop the yaml
import.
- compatibility/loki/scripts/run-loki-compatibility.sh: drop -overlay
arg + DRIVER_OVERLAY env + the `skipped` jq bucket.
- compatibility/prometheus/expected-failures.json: DELETED (was empty
anyway; the file itself was the allowlist mechanism).
- compatibility/tempo/expected-failures.json: DELETED. Remove the
--expected-failures flag, the loader, the docker-compose mount, the
run-script DIFF_FLAGS branch.
- compatibility/tempo/driver/differ.go: remove the StartTimeUnixNano
blank-skip — an asymmetric blank on either side is now a real
divergence (the backend that omitted the field is the bug). Epsilon
comparison for parsed numeric values is kept (float-noise absorption,
not a case-allowlist).
CI / lefthook gates
- .github/workflows/ci.yml forbid-skip:
* Replace the "guard new should_skip entries with a tracking ref"
step with a hard reject of any non-empty should_skip block. The
consumer code is gone; entries would be silently ignored.
* New step: reject test-suite escape-hatch primitives anywhere in
.ts/.tsx/.go (EXPECTED_EMPTY, EXPECTED_TOLERATED, isKnownTolerated,
tolerated404, expect.soft, should_tolerate, skipReason/SkipReason,
APP_NOT_INSTALLED_BANNER_PATTERNS, DRILLDOWN_UPSTREAM_GRAFANA_CONSOLE_NOISE).
- lefthook.yml: mirror the same forbid-escape-hatch + should_skip
guards in the pre-push hook.
- scripts/check-skip-additions.sh: DELETED. The "guard untracked
entries" policy is replaced by "zero entries".
Docs
- docs/compatibility.md: replace the "Expected-failures allowlist"
section with a "No allow-lists" section documenting the new policy.
- compatibility/loki/README.md: align the overlay description with
the schema-placeholder reality.
- compatibility/prometheus/{test-cerberus.yml,scripts/run-compatibility.sh}:
reword to remove the expected-failures references.
PR #701 follow-up (feat/quickstart-rich-observability)
The PR #701 branch adds two more EXPECTED_EMPTY entries on top of main:
- iterate-metrics-explorer.spec.ts: `system_` prefix + `clickhouse_event`
prefix entries (b3a9dad).
- iterate-all-dashboards.spec.ts: broaden the `clickhouse_event` match
pattern (b4526fd).
Once this PR merges and #701 rebases, those entries become deletion
conflicts the rebaser must hand-resolve to "deleted". The
`forbid-escape-hatch` gate will reject the PR if any survives.
…o panel populates
The cerberus-self "Error rate by language" panel evaluates
sum by (cerberus_ql) (rate(cerberus_queries_total{result="error"}[5m]))
/ clamp_min(sum by (cerberus_ql) (rate(cerberus_queries_total[5m])),
1e-9). On a fresh compose stack the numerator was always 0 because
QueryMiddleware bucketed only 5xx as ResultError ("4xx = gateway
behaved correctly") and the warmup harness only drove well-formed
queries. The numerator collapsed empty, the panel returned empty, and
the now-hard-asserting iterate-all-dashboards spec caught the empty
result.
Two changes, both load-bearing for the panel to render:
* internal/telemetry/middleware.go: classify status >= 400 as
ResultError. The cerberus_queries_total{result} series is a
query-outcome metric, not an HTTP-SLO metric — a 400 parse
rejection IS a failed query from the caller's point of view, and
the panel is named "Error rate by language" precisely to track
that ratio. The HTTP-layer SLO (where 4xx legitimately means
gateway-healthy) rides the separate http.server.* instrument set,
unchanged.
* test/e2e/playwright/helpers/sweep.ts: have generateSelfTraffic
also fire one deliberately-malformed query per QL head per
iteration (unclosed PromQL aggregation, unterminated LogQL stream
selector, truncated TraceQL braces). The error-path probes
guarantee result="error" populates every by(cerberus_ql) bucket
during the 30s warmup so the panel has data when the dashboard
sweep loads it.
Test update: TestQueryMiddleware_ResultError becomes a table-driven
subtest covering 400 / 422 / 500 / 502 — all four bucket as error
under the new classification.
…ns panel populates
The cerberus-self "Admission rejections" panel groups by
(cerberus_ql, reason) over cerberus_admit_rejected_total. Healthy
compose warmup never approaches the per-head admit caps (prom 64 /
loki 64 / tempo 32 — internal/api/admit) so the counter stays at zero
and the panel returns an empty result, tripping the iterate-all-
dashboards empty-result guard now that the allowlist mechanism is
gone.
Add an upfront admission-burst phase to generateSelfTraffic
(test/e2e/playwright/helpers/sweep.ts) that fires
ADMIT_BURST_CONCURRENCY=128 parallel requests per head against the
cheapest endpoint per head (PromQL `up`, Loki `/labels`, TraceQL
`{}`). 128 > every cap by ~2x even on tempo, so the per-head
semaphores saturate and cerberus emits cerberus_admit_rejected_total
with reason=cap_exceeded on all three (cerberus_ql, reason) buckets.
Reason-bucket coverage: cap_exceeded is the only rejection reason the
limiter currently emits (admit.ReasonCapExceeded; the limiter has
exactly one rejection path — TryAcquire failing on a saturated
semaphore). When a future rejection path lands in internal/api/admit
the burst block should grow a sibling phase that exercises it.
…l from cerberus-self
The panel queries `{service_name="cerberus"} | SeverityText="ERROR"`
which is empty on a healthy stack: cerberus only emits ERROR-level
logs on infrastructure failures (CH query failed, internal wiring
broken), not on user-error 4xx requests. The iterate-all-dashboards
sweep hard-asserts every panel target is non-empty (no escape hatch
in PR #712), so an "empty when healthy" panel is a dead UI that
fails CI by definition.
Remove the panel from all three dashboard surfaces (compose, legacy
grafana, k3s ConfigMap) and the dashboards-provider description.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
87f0e1c to
21acba6
Compare
…-grammar __name__ on wire
The catalog endpoint runs every stored MetricName through `OTelToPromMetric`
so dotted-stored OTel names (`http.server.request.duration`) surface as the
Prom-grammar form (`http_server_request_duration`). When the user (or
Drilldown-Metrics' label chip) then queries `match[]={__name__=
"<underscored>"}` the matcher lowering compared the underscored value
against the raw dotted MetricName and returned zero rows — the
"Unable to fetch labels" failure shape iterate-metrics-explorer pinned
on PR #712 against `http_server_request_body_size`, `http_server_request_duration`,
`http_server_response_body_size`.
Two-part fix, mirroring the symmetric attribute-key fan-out PR #663
wired into `internal/promql.attributeLookup`:
1. New `expandUnderscoredMetricNameMatcher` helper in
`internal/api/prom/metric_name_candidates.go` fans an
unsuffixed `__name__=<underscored>` matcher out into every
OTel-dotted candidate from `format.PromLabelToOTelCandidates`.
Wired into `/api/v1/series`, `/api/v1/labels?match[]=`, and
`/api/v1/label/<name>/values?match[]=` alongside the existing
`expandBareHistogramMatcher`. Histogram-companion-suffixed
names short-circuit so byte-stable tests under
`handler_labels_histogram_test.go` keep their single-arm SQL.
2. `format.WithMetricName` normalises the projected `__name__`
through `OTelToPromMetric` so the wire response matches the
catalog's underscored alias. Without this the spec's
`__name__`-mismatch sanity check would flag the dotted form
even when the fan-out resolved the row.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Structural cleanup. Every test failure must surface as a real bug to fix at the source (cerberus code, seed, dashboard, panel) — no allow-list, tolerance, expected-empty, should_skip, expect.soft, or per-field blank-skip is acceptable anywhere.
Playwright specs
iterate-all-dashboards.spec.ts: deleteEXPECTED_EMPTY_EXPR_SUBSTRINGS(5 entries) +isExpectedEmpty+ the conditional inprobeTarget. Every empty panel result is now a hard fail.iterate-metrics-explorer.spec.ts: deleteEXPECTED_EMPTY(6 entries),HISTOGRAM_COMPANION_SUFFIXES,dottedAlias, and the companion-suffix + dotted-alias fallbacks that masked bare-name 0-result failures. Every catalog-published metric must resolve to >= 1 series.iterate-drilldown-apps.spec.ts: deleteALERT_ERROR_PATTERNS(banner-substring allow-list),APP_NOT_INSTALLED_BANNER_PATTERNS, andDRILLDOWN_UPSTREAM_GRAFANA_CONSOLE_NOISE. Hard-fail everyrole=alertbanner and every console error. Remove the install-probe early-skip; every catalogue entry must be installed.helpers/drilldown.ts: dropgrafana-pyroscope-appfrom the catalogue (cerberus doesn't ship profiling).compose_grafana_smoke.spec.ts: flipexpect.soft-> hard throw; prune the retired-allowlist documentation blocks.helpers/{assertions,dom,sweep}.ts: prune retired-allowlist comments; reword "tolerated" prose.Compatibility harnesses
compatibility/loki/cerberus-test-queries.yml: empty theshould_skipblock (was 17 entries — see commit body for the per-entry rationale capture). File kept as schema placeholder.compatibility/loki/cmd/loki-compliance-tester/main.go: delete theOverlayloader,skipKeylookup,SkipReasonfield onResult,-overlayflag, and every overlay-driven branch incompareAll. Drop theyamlimport.compatibility/loki/scripts/run-loki-compatibility.sh: drop-overlayarg +DRIVER_OVERLAYenv + theskippedjq bucket.compatibility/prometheus/expected-failures.json: deleted (was empty; the file itself was the allowlist mechanism).compatibility/tempo/expected-failures.json: deleted. Remove the--expected-failuresflag, the loader, the docker-compose mount, the run-scriptDIFF_FLAGSbranch.compatibility/tempo/driver/differ.go: remove theStartTimeUnixNanoblank-skip — an asymmetric blank on either side is now a real divergence (the backend that omitted the field is the bug). Epsilon comparison for parsed numeric values is kept (float-noise absorption, not a case-allowlist).CI / lefthook gates
.github/workflows/ci.ymlforbid-skip:should_skipblock. The consumer code is gone; entries would be silently ignored..ts/.tsx/.go(EXPECTED_EMPTY,EXPECTED_TOLERATED,isKnownTolerated,tolerated404,expect.soft,should_tolerate,skipReason/SkipReason,APP_NOT_INSTALLED_BANNER_PATTERNS,DRILLDOWN_UPSTREAM_GRAFANA_CONSOLE_NOISE).lefthook.yml: mirror the sameforbid-escape-hatch+should_skipguards in the pre-push hook.scripts/check-skip-additions.sh: deleted. The "guard untracked entries" policy is replaced by "zero entries".Docs
docs/compatibility.md: replace the "Expected-failures allowlist" section with a "No allow-lists" section documenting the new policy.compatibility/loki/README.md: align the overlay description with the schema-placeholder reality.compatibility/prometheus/{test-cerberus.yml,scripts/run-compatibility.sh}: reword to remove the expected-failures references.Inventory of compat-harness
should_skipentries removedThe 17 entries in
compatibility/loki/cerberus-test-queries.ymlre-exposed by this PR fall into three buckets — every one is now a real bug to fix at the source. Follow-up tasks should be opened for each:skip: truerows (10 entries) —fast/structured-metadata.yaml#JSON parsing with structured metadata filter first,regression/metric-queries.yaml#HTTP status code distribution,regression/structured-metadata.yaml#JSON parsing with duration filter and structured metadata,regression/structured-metadata.yaml#Drilldown pattern with case-insensitive error search, plus all sixexhaustive/unwrap-aggregations.yaml#Standard variance/deviation/Quantile-over-time/p50/p90/p95/p99/latency/durationentries. The upstream YAML marks theseskip: truebecause Loki's v2 engine doesn't support those shapes — meaning there is no reference baseline. Either the v2-engine compose-config baseline needs to flip, or these cases need to be requested upstream.regression/structured-metadata.yaml#Combined line and metadata filter,regression/structured-metadata.yaml#Drilldown with failed search and error level,regression/structured-metadata.yaml#Drilldown with error/warn level and refused search,exhaustive/aggregations.yaml#Count aggregated by detected_level. Reference Loki indexesdetected_levelfrom structured-metadata block input; the harness seeder flows OTel records through the upstream ClickHouse exporter shape (SeverityTextpresent, structured-metadata block absent), so the reference returns empty. The fix is in the harness seeder, not in cerberus.exhaustive/aggregations.yaml#Count aggregated by {pod,namespace,service_name,cluster and namespace,service_name and container}andexhaustive/unwrap-aggregations.yaml#Without multiple labels. The seeder doesn't writepod/namespace/service_name/cluster/containerlabels. Fix the seeder.Inventory of PR #701 entries flagged for rebase
PR #701 (
feat/quickstart-rich-observability) adds two moreEXPECTED_EMPTYentries on top of main that are NOT in this PR's diff:iterate-metrics-explorer.spec.ts— addssystem_prefix andclickhouse_eventprefix entries (commit b3a9dad).iterate-all-dashboards.spec.ts— broadens theclickhouse_event{name=~"Querymatch toclickhouse_event{name=~"(commit b4526fd).Once this PR merges and #701 rebases, those entries become deletion conflicts the rebaser must hand-resolve to "deleted". The
forbid-escape-hatchgate will reject #701 outright if any survives the rebase.Test plan
forbid-skipgreen on this PR (the new gate doesn't catch any pattern in this PR's own diff).checkgreen —go vetalready clean locally;go test ./compatibility/loki/cmd/loki-compliance-tester/ ./compatibility/tempo/driver/passes locally (113 tests).compatibility/{prometheus,loki,tempo}— every previously-tolerated diff now surfaces as a real failure. The compat harnesses are report-only (per task docs(compliance): refresh for v1.0.0-RC1 + per-QL corpora (M5.4) #68), so the CI lane stays green; thecompat-score.jsonshows the new ground truth.compose-smoke—expect.softis now a hard throw; any previously-soft failure becomes a hard fail.dashboard(push-to-main + nightly) — every previously-tolerated empty panel / 0-series metric / install-skip is now a hard fail.🤖 Generated with Claude Code