Skip to content

chore(tests): delete every test-suite escape-hatch / allowlist mechanism#712

Open
tsouza wants to merge 6 commits into
mainfrom
chore/remove-all-allowlists
Open

chore(tests): delete every test-suite escape-hatch / allowlist mechanism#712
tsouza wants to merge 6 commits into
mainfrom
chore/remove-all-allowlists

Conversation

@tsouza
Copy link
Copy Markdown
Owner

@tsouza tsouza commented May 22, 2026

Summary

Structural cleanup. Every test failure must surface as a real bug to fix at the source (cerberus code, seed, dashboard, panel) — no allow-list, tolerance, expected-empty, should_skip, expect.soft, or per-field blank-skip is acceptable anywhere.

Playwright specs

  • iterate-all-dashboards.spec.ts: delete EXPECTED_EMPTY_EXPR_SUBSTRINGS (5 entries) + isExpectedEmpty + the conditional in probeTarget. Every empty panel result is now a hard fail.
  • iterate-metrics-explorer.spec.ts: delete EXPECTED_EMPTY (6 entries), HISTOGRAM_COMPANION_SUFFIXES, dottedAlias, and the companion-suffix + dotted-alias fallbacks that masked bare-name 0-result failures. Every catalog-published metric must resolve to >= 1 series.
  • iterate-drilldown-apps.spec.ts: delete ALERT_ERROR_PATTERNS (banner-substring allow-list), APP_NOT_INSTALLED_BANNER_PATTERNS, and DRILLDOWN_UPSTREAM_GRAFANA_CONSOLE_NOISE. Hard-fail every role=alert banner and every console error. Remove the install-probe early-skip; every catalogue entry must be installed.
  • helpers/drilldown.ts: drop grafana-pyroscope-app from the catalogue (cerberus doesn't ship profiling).
  • compose_grafana_smoke.spec.ts: flip expect.soft -> hard throw; prune the retired-allowlist documentation blocks.
  • helpers/{assertions,dom,sweep}.ts: prune retired-allowlist comments; reword "tolerated" prose.

Compatibility harnesses

  • compatibility/loki/cerberus-test-queries.yml: empty the should_skip block (was 17 entries — see commit body for the per-entry rationale capture). File kept as schema placeholder.
  • compatibility/loki/cmd/loki-compliance-tester/main.go: delete the Overlay loader, skipKey lookup, SkipReason field on Result, -overlay flag, and every overlay-driven branch in compareAll. Drop the yaml import.
  • compatibility/loki/scripts/run-loki-compatibility.sh: drop -overlay arg + DRIVER_OVERLAY env + the skipped jq bucket.
  • compatibility/prometheus/expected-failures.json: deleted (was empty; the file itself was the allowlist mechanism).
  • compatibility/tempo/expected-failures.json: deleted. Remove the --expected-failures flag, the loader, the docker-compose mount, the run-script DIFF_FLAGS branch.
  • compatibility/tempo/driver/differ.go: remove the StartTimeUnixNano blank-skip — an asymmetric blank on either side is now a real divergence (the backend that omitted the field is the bug). Epsilon comparison for parsed numeric values is kept (float-noise absorption, not a case-allowlist).

CI / lefthook gates

  • .github/workflows/ci.yml forbid-skip:
    • Replace the "guard new should_skip entries with a tracking ref" step with a hard reject of any non-empty should_skip block. The consumer code is gone; entries would be silently ignored.
    • New step: reject test-suite escape-hatch primitives anywhere in .ts/.tsx/.go (EXPECTED_EMPTY, EXPECTED_TOLERATED, isKnownTolerated, tolerated404, expect.soft, should_tolerate, skipReason/SkipReason, APP_NOT_INSTALLED_BANNER_PATTERNS, DRILLDOWN_UPSTREAM_GRAFANA_CONSOLE_NOISE).
  • lefthook.yml: mirror the same forbid-escape-hatch + should_skip guards in the pre-push hook.
  • scripts/check-skip-additions.sh: deleted. The "guard untracked entries" policy is replaced by "zero entries".

Docs

  • docs/compatibility.md: replace the "Expected-failures allowlist" section with a "No allow-lists" section documenting the new policy.
  • compatibility/loki/README.md: align the overlay description with the schema-placeholder reality.
  • compatibility/prometheus/{test-cerberus.yml,scripts/run-compatibility.sh}: reword to remove the expected-failures references.

Inventory of compat-harness should_skip entries removed

The 17 entries in compatibility/loki/cerberus-test-queries.yml re-exposed by this PR fall into three buckets — every one is now a real bug to fix at the source. Follow-up tasks should be opened for each:

  • Upstream-pinned skip: true rows (10 entries)fast/structured-metadata.yaml#JSON parsing with structured metadata filter first, regression/metric-queries.yaml#HTTP status code distribution, regression/structured-metadata.yaml#JSON parsing with duration filter and structured metadata, regression/structured-metadata.yaml#Drilldown pattern with case-insensitive error search, plus all six exhaustive/unwrap-aggregations.yaml#Standard variance/deviation/Quantile-over-time/p50/p90/p95/p99/latency/duration entries. The upstream YAML marks these skip: true because Loki's v2 engine doesn't support those shapes — meaning there is no reference baseline. Either the v2-engine compose-config baseline needs to flip, or these cases need to be requested upstream.
  • Structured-metadata harness seed-gap cluster (3 entries)regression/structured-metadata.yaml#Combined line and metadata filter, regression/structured-metadata.yaml#Drilldown with failed search and error level, regression/structured-metadata.yaml#Drilldown with error/warn level and refused search, exhaustive/aggregations.yaml#Count aggregated by detected_level. Reference Loki indexes detected_level from structured-metadata block input; the harness seeder flows OTel records through the upstream ClickHouse exporter shape (SeverityText present, structured-metadata block absent), so the reference returns empty. The fix is in the harness seeder, not in cerberus.
  • Plain seed-gap (4 entries)exhaustive/aggregations.yaml#Count aggregated by {pod,namespace,service_name,cluster and namespace,service_name and container} and exhaustive/unwrap-aggregations.yaml#Without multiple labels. The seeder doesn't write pod / namespace / service_name / cluster / container labels. Fix the seeder.

Inventory of PR #701 entries flagged for rebase

PR #701 (feat/quickstart-rich-observability) adds two more EXPECTED_EMPTY entries on top of main that are NOT in this PR's diff:

  • iterate-metrics-explorer.spec.ts — adds system_ prefix and clickhouse_event prefix entries (commit b3a9dad).
  • iterate-all-dashboards.spec.ts — broadens the clickhouse_event{name=~"Query match to clickhouse_event{name=~" (commit b4526fd).

Once this PR merges and #701 rebases, those entries become deletion conflicts the rebaser must hand-resolve to "deleted". The forbid-escape-hatch gate will reject #701 outright if any survives the rebase.

Test plan

  • CI forbid-skip green on this PR (the new gate doesn't catch any pattern in this PR's own diff).
  • CI check green — go vet already clean locally; go test ./compatibility/loki/cmd/loki-compliance-tester/ ./compatibility/tempo/driver/ passes locally (113 tests).
  • CI compatibility/{prometheus,loki,tempo} — every previously-tolerated diff now surfaces as a real failure. The compat harnesses are report-only (per task docs(compliance): refresh for v1.0.0-RC1 + per-QL corpora (M5.4) #68), so the CI lane stays green; the compat-score.json shows the new ground truth.
  • CI compose-smokeexpect.soft is now a hard throw; any previously-soft failure becomes a hard fail.
  • dashboard (push-to-main + nightly) — every previously-tolerated empty panel / 0-series metric / install-skip is now a hard fail.

🤖 Generated with Claude Code

@tsouza tsouza enabled auto-merge (squash) May 22, 2026 21:30
tsouza added a commit that referenced this pull request May 22, 2026
…714)

The 4 `detected_level` entries in cerberus-test-queries.yml carried a
"harness seed-gap" rationale claiming the seeder doesn't push
structured-metadata input for reference Loki to index. That rationale
was stale: PR #412 wired structured-metadata into the seeder via
`pushLoki`'s `logproto.LabelAdapter` block (compatibility/loki/cmd/seed/main.go),
and loki-config.yaml has `allow_structured_metadata: true`.

PR #712's compat run on this branch confirms parity: all four cases
report `diff: ""` against the reference Loki:

  - regression/structured-metadata.yaml#Combined line and metadata filter
  - regression/structured-metadata.yaml#Drilldown with failed search and error level
  - regression/structured-metadata.yaml#Drilldown with error/warn level and refused search
  - exhaustive/aggregations.yaml#Count aggregated by detected_level

Dropping these 4 entries promotes them out of the overlay back into
the compat denominator with verified reference baselines. The
remaining 20 entries (upstream `skip: true` rows + the `pod` /
`namespace` / `service_name` / `cluster` / `container` seed-gap
entries) stay as-is — those are separate from the structured-metadata
cluster.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
@tsouza tsouza force-pushed the chore/remove-all-allowlists branch 3 times, most recently from 3da2003 to cbc722c Compare May 22, 2026 22:17
tsouza added a commit that referenced this pull request May 22, 2026
…l from cerberus-self

The panel queries `{service_name="cerberus"} | SeverityText="ERROR"`
which is empty on a healthy stack: cerberus only emits ERROR-level
logs on infrastructure failures (CH query failed, internal wiring
broken), not on user-error 4xx requests. The iterate-all-dashboards
sweep hard-asserts every panel target is non-empty (no escape hatch
in PR #712), so an "empty when healthy" panel is a dead UI that
fails CI by definition.

Remove the panel from all three dashboard surfaces (compose, legacy
grafana, k3s ConfigMap) and the dashboards-provider description.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@tsouza tsouza force-pushed the chore/remove-all-allowlists branch from cbc722c to 87f0e1c Compare May 22, 2026 22:42
tsouza and others added 5 commits May 22, 2026 22:53
Structural cleanup PR. Every test failure must surface as a real bug to
fix at the source (cerberus code, seed, dashboard, panel) — no allow-
list, tolerance, expected-empty, should_skip, expect.soft, or per-field
blank-skip is acceptable anywhere.

Playwright specs
- iterate-all-dashboards.spec.ts: delete EXPECTED_EMPTY_EXPR_SUBSTRINGS
  + isExpectedEmpty + the conditional in probeTarget. Every empty
  panel result is now a hard fail.
- iterate-metrics-explorer.spec.ts: delete EXPECTED_EMPTY,
  HISTOGRAM_COMPANION_SUFFIXES, dottedAlias, and the companion-
  suffix + dotted-alias fallbacks that masked bare-name 0-result
  failures. Every catalog-published metric must resolve to >= 1 series.
- iterate-drilldown-apps.spec.ts: delete ALERT_ERROR_PATTERNS (banner-
  substring allow-list), APP_NOT_INSTALLED_BANNER_PATTERNS, and
  DRILLDOWN_UPSTREAM_GRAFANA_CONSOLE_NOISE. Hard-fail every role=alert
  banner and every console error. Remove the install-probe early-skip;
  every catalogue entry must be installed (catalogue now lists three
  apps cerberus actually provisions; pyroscope is gone).
- helpers/drilldown.ts: drop grafana-pyroscope-app from the catalogue.
- compose_grafana_smoke.spec.ts: flip expect.soft -> hard throw; prune
  the retired-allowlist documentation block.
- helpers/{assertions,dom,sweep}.ts: prune retired-allowlist comments;
  reword "tolerated" prose.

Compatibility harnesses
- compatibility/loki/cerberus-test-queries.yml: empty the should_skip
  block (was 17 entries). File kept as schema placeholder.
- compatibility/loki/cmd/loki-compliance-tester/main.go: delete the
  Overlay loader, skipKey lookup, SkipReason field on Result, -overlay
  flag, and every overlay-driven branch in compareAll. Drop the yaml
  import.
- compatibility/loki/scripts/run-loki-compatibility.sh: drop -overlay
  arg + DRIVER_OVERLAY env + the `skipped` jq bucket.
- compatibility/prometheus/expected-failures.json: DELETED (was empty
  anyway; the file itself was the allowlist mechanism).
- compatibility/tempo/expected-failures.json: DELETED. Remove the
  --expected-failures flag, the loader, the docker-compose mount, the
  run-script DIFF_FLAGS branch.
- compatibility/tempo/driver/differ.go: remove the StartTimeUnixNano
  blank-skip — an asymmetric blank on either side is now a real
  divergence (the backend that omitted the field is the bug). Epsilon
  comparison for parsed numeric values is kept (float-noise absorption,
  not a case-allowlist).

CI / lefthook gates
- .github/workflows/ci.yml forbid-skip:
  * Replace the "guard new should_skip entries with a tracking ref"
    step with a hard reject of any non-empty should_skip block. The
    consumer code is gone; entries would be silently ignored.
  * New step: reject test-suite escape-hatch primitives anywhere in
    .ts/.tsx/.go (EXPECTED_EMPTY, EXPECTED_TOLERATED, isKnownTolerated,
    tolerated404, expect.soft, should_tolerate, skipReason/SkipReason,
    APP_NOT_INSTALLED_BANNER_PATTERNS, DRILLDOWN_UPSTREAM_GRAFANA_CONSOLE_NOISE).
- lefthook.yml: mirror the same forbid-escape-hatch + should_skip
  guards in the pre-push hook.
- scripts/check-skip-additions.sh: DELETED. The "guard untracked
  entries" policy is replaced by "zero entries".

Docs
- docs/compatibility.md: replace the "Expected-failures allowlist"
  section with a "No allow-lists" section documenting the new policy.
- compatibility/loki/README.md: align the overlay description with
  the schema-placeholder reality.
- compatibility/prometheus/{test-cerberus.yml,scripts/run-compatibility.sh}:
  reword to remove the expected-failures references.

PR #701 follow-up (feat/quickstart-rich-observability)
The PR #701 branch adds two more EXPECTED_EMPTY entries on top of main:
- iterate-metrics-explorer.spec.ts: `system_` prefix + `clickhouse_event`
  prefix entries (b3a9dad).
- iterate-all-dashboards.spec.ts: broaden the `clickhouse_event` match
  pattern (b4526fd).
Once this PR merges and #701 rebases, those entries become deletion
conflicts the rebaser must hand-resolve to "deleted". The
`forbid-escape-hatch` gate will reject the PR if any survives.
…o panel populates

The cerberus-self "Error rate by language" panel evaluates
sum by (cerberus_ql) (rate(cerberus_queries_total{result="error"}[5m]))
/ clamp_min(sum by (cerberus_ql) (rate(cerberus_queries_total[5m])),
1e-9). On a fresh compose stack the numerator was always 0 because
QueryMiddleware bucketed only 5xx as ResultError ("4xx = gateway
behaved correctly") and the warmup harness only drove well-formed
queries. The numerator collapsed empty, the panel returned empty, and
the now-hard-asserting iterate-all-dashboards spec caught the empty
result.

Two changes, both load-bearing for the panel to render:

* internal/telemetry/middleware.go: classify status >= 400 as
  ResultError. The cerberus_queries_total{result} series is a
  query-outcome metric, not an HTTP-SLO metric — a 400 parse
  rejection IS a failed query from the caller's point of view, and
  the panel is named "Error rate by language" precisely to track
  that ratio. The HTTP-layer SLO (where 4xx legitimately means
  gateway-healthy) rides the separate http.server.* instrument set,
  unchanged.
* test/e2e/playwright/helpers/sweep.ts: have generateSelfTraffic
  also fire one deliberately-malformed query per QL head per
  iteration (unclosed PromQL aggregation, unterminated LogQL stream
  selector, truncated TraceQL braces). The error-path probes
  guarantee result="error" populates every by(cerberus_ql) bucket
  during the 30s warmup so the panel has data when the dashboard
  sweep loads it.

Test update: TestQueryMiddleware_ResultError becomes a table-driven
subtest covering 400 / 422 / 500 / 502 — all four bucket as error
under the new classification.
…ns panel populates

The cerberus-self "Admission rejections" panel groups by
(cerberus_ql, reason) over cerberus_admit_rejected_total. Healthy
compose warmup never approaches the per-head admit caps (prom 64 /
loki 64 / tempo 32 — internal/api/admit) so the counter stays at zero
and the panel returns an empty result, tripping the iterate-all-
dashboards empty-result guard now that the allowlist mechanism is
gone.

Add an upfront admission-burst phase to generateSelfTraffic
(test/e2e/playwright/helpers/sweep.ts) that fires
ADMIT_BURST_CONCURRENCY=128 parallel requests per head against the
cheapest endpoint per head (PromQL `up`, Loki `/labels`, TraceQL
`{}`). 128 > every cap by ~2x even on tempo, so the per-head
semaphores saturate and cerberus emits cerberus_admit_rejected_total
with reason=cap_exceeded on all three (cerberus_ql, reason) buckets.

Reason-bucket coverage: cap_exceeded is the only rejection reason the
limiter currently emits (admit.ReasonCapExceeded; the limiter has
exactly one rejection path — TryAcquire failing on a saturated
semaphore). When a future rejection path lands in internal/api/admit
the burst block should grow a sibling phase that exercises it.
…l from cerberus-self

The panel queries `{service_name="cerberus"} | SeverityText="ERROR"`
which is empty on a healthy stack: cerberus only emits ERROR-level
logs on infrastructure failures (CH query failed, internal wiring
broken), not on user-error 4xx requests. The iterate-all-dashboards
sweep hard-asserts every panel target is non-empty (no escape hatch
in PR #712), so an "empty when healthy" panel is a dead UI that
fails CI by definition.

Remove the panel from all three dashboard surfaces (compose, legacy
grafana, k3s ConfigMap) and the dashboards-provider description.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@tsouza tsouza force-pushed the chore/remove-all-allowlists branch from 87f0e1c to 21acba6 Compare May 22, 2026 22:56
…-grammar __name__ on wire

The catalog endpoint runs every stored MetricName through `OTelToPromMetric`
so dotted-stored OTel names (`http.server.request.duration`) surface as the
Prom-grammar form (`http_server_request_duration`). When the user (or
Drilldown-Metrics' label chip) then queries `match[]={__name__=
"<underscored>"}` the matcher lowering compared the underscored value
against the raw dotted MetricName and returned zero rows — the
"Unable to fetch labels" failure shape iterate-metrics-explorer pinned
on PR #712 against `http_server_request_body_size`, `http_server_request_duration`,
`http_server_response_body_size`.

Two-part fix, mirroring the symmetric attribute-key fan-out PR #663
wired into `internal/promql.attributeLookup`:

  1. New `expandUnderscoredMetricNameMatcher` helper in
     `internal/api/prom/metric_name_candidates.go` fans an
     unsuffixed `__name__=<underscored>` matcher out into every
     OTel-dotted candidate from `format.PromLabelToOTelCandidates`.
     Wired into `/api/v1/series`, `/api/v1/labels?match[]=`, and
     `/api/v1/label/<name>/values?match[]=` alongside the existing
     `expandBareHistogramMatcher`. Histogram-companion-suffixed
     names short-circuit so byte-stable tests under
     `handler_labels_histogram_test.go` keep their single-arm SQL.

  2. `format.WithMetricName` normalises the projected `__name__`
     through `OTelToPromMetric` so the wire response matches the
     catalog's underscored alias. Without this the spec's
     `__name__`-mismatch sanity check would flag the dotted form
     even when the fan-out resolved the row.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant