docs(conformance): publish reproducible evidence + policy conformance count (#120) by sergeyenin · Pull Request #163 · dativo-io/talon

sergeyenin · 2026-06-03T09:55:50Z

Summary

Talon's honest answer to "how many conformance tests do you have?" — a single, reproducible number for the two paths that carry its core guarantees: the evidence path (build/sign/export/verify) and the policy path (classify/route/allow-deny).

- `make conformance` runs `go test -count=1 -run . -v` over `./internal/policy/...` and `./internal/evidence/...`, counts passing tests and subtests, and prints `Conformance: N passing ...`. The count is currently 317. The target exits non-zero if any test fails.
- CI runs the target on every PR and writes the count to the job step summary.
- README surfaces the number under Evidence & compliance and links the new doc.
- `docs/reference/conformance.md` defines what counts as a conformance test, the suite composition, the counting method, how to reproduce from a clean checkout, and what the number does and does not mean (no compliance-outcome overclaim).
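The capture-then-count flow in the first item can be sketched in plain shell, without the `$$` escaping a Makefile recipe needs. This is an illustrative sketch only: `fake_go_test` is a hypothetical stand-in for the real `go test` invocation, and the test names are invented.

```bash
# Hypothetical stand-in for `go test -count=1 -run . -v <packages>`:
# prints one passing and one failing test, then reports failure.
fake_go_test() {
  printf '%s\n' '--- PASS: TestA (0.00s)' '--- FAIL: TestB (0.00s)' 'FAIL'
  return 1
}

# Capture the full output and the exit status before doing anything else,
# so that the count can still be reported when the run fails.
rc=0
out=$(fake_go_test 2>&1) || rc=$?

# Count the PASS lines in the captured output.
count=$(printf '%s\n' "$out" | grep -c -- '--- PASS:')

# Choose the summary line from the saved status, not from the count.
if [ "$rc" -ne 0 ]; then
  summary="conformance: FAILED ($count passing before failure)"
else
  summary="Conformance: $count passing tests"
fi
echo "$summary"
```

With the stand-in above this prints `conformance: FAILED (1 passing before failure)`. The real recipe additionally prints the last 20 lines of output and exits 1 on that branch.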

The published number is a floor that grows automatically as tests are added — the suite is simply "every test in those two packages", so no list needs hand-maintaining. The authoritative value is whatever make conformance prints for a given commit.

Adjacent suites (OPA/Rego via make opa-test, integration, e2e) are intentionally tracked separately and called out in the doc.

Test plan

- `make conformance` → `Conformance: 317 passing tests across evidence + policy paths`
- Target exits non-zero on test failure (prints the last 20 lines of output)
- `scripts/check-claim-discipline.sh` passes (doc avoids banned outcome phrasing)
- Relative links resolve (`evidence-integrity-spec.md`, `../../LIMITATIONS.md`)

Closes #120

Note

Low Risk
Documentation and CI/Makefile tooling only; no runtime, auth, or production code paths change.

Overview
Adds a reproducible conformance count for Talon’s core evidence and policy Go test packages, so the published number is whatever make conformance prints for a given commit—not a hand-maintained figure.

make conformance runs go test -count=1 -run . -v over ./internal/policy/... and ./internal/evidence/..., counts --- PASS: lines (including table subtests), prints a one-line total (currently 317 in docs), and fails CI if any test fails. CI adds a step that runs this target and appends output to the GitHub job step summary.
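To illustrate why table subtests are included in the total, here is a minimal sketch that applies the same `grep -c -- '--- PASS:'` step to a hand-written sample of `go test -v` output. The test names in the sample are invented for illustration and do not come from the Talon suite.

```bash
# Hypothetical `go test -v` output: one plain test, one test with two
# subtests, and one failing test.
sample_output='=== RUN   TestSign
--- PASS: TestSign (0.00s)
=== RUN   TestRoute
=== RUN   TestRoute/tier_0
=== RUN   TestRoute/tier_1
--- PASS: TestRoute (0.00s)
    --- PASS: TestRoute/tier_0 (0.00s)
    --- PASS: TestRoute/tier_1 (0.00s)
=== RUN   TestExport
--- FAIL: TestExport (0.00s)
FAIL'

# grep -c counts matching lines. The pattern is not anchored, so the
# indented subtest lines match too. `--` ends option parsing so the
# leading dashes in the pattern are taken literally.
count=$(printf '%s\n' "$sample_output" | grep -c -- '--- PASS:')
echo "passing: $count"
```

This prints `passing: 4`: the parent `TestRoute` and each of its two subtests contribute one line each, alongside `TestSign`, while the `--- FAIL:` line is not counted.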

docs/reference/conformance.md defines scope, counting method, in-scope test files, adjacent suites (make opa-test, integration/e2e) excluded from the count, and explicit limits on what the number does not claim. README and docs index link the doc and surface the count under Evidence & compliance.

Reviewed by Cursor Bugbot for commit dd9b197.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: CI pipe masks conformance failure
- Added set -o pipefail before the pipeline so the step exits with make conformance's exit code instead of tee's.

Preview (0550a77729)

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -63,6 +63,11 @@
             exit 1
           fi
 
+      - name: Conformance count (evidence + policy paths)
+        run: |
+          set -o pipefail
+          make conformance | tee -a "$GITHUB_STEP_SUMMARY"
+
       - name: OPA/Rego policy tests
         run: |
           if ! command -v opa &> /dev/null; then

diff --git a/Makefile b/Makefile
--- a/Makefile
+++ b/Makefile
@@ -16,8 +16,13 @@
   GO_ENV := env -u CC CC=/usr/bin/clang CGO_ENABLED=1
 endif
 
-.PHONY: help build install test test-integration test-e2e test-smoke test-all test-ssot-gate lint fmt clean vet mod-tidy check docker-build demo-gateway demo-full demo-clean verify-flow0 nosec-count
+.PHONY: help build install test test-integration test-e2e test-smoke test-all test-ssot-gate conformance lint fmt clean vet mod-tidy check docker-build demo-gateway demo-full demo-clean verify-flow0 nosec-count
 
+# Conformance suite: the evidence + policy paths whose passing test/subtest
+# count is published as Talon's honest conformance number. See
+# docs/reference/conformance.md.
+CONFORMANCE_PKGS := ./internal/policy/... ./internal/evidence/...
+
 help: ## Show this help
 	@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | sort | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-20s\033[0m %s\n", $$1, $$2}'
 
@@ -52,6 +57,12 @@
 test-ssot-gate: ## Run consolidated SSOT parity/resilience gate tests
 	@$(GO_ENV) go test -count=1 ./internal/server -run SSOTGate
 
+conformance: ## Run the evidence + policy conformance suite and print the passing count
+	@out=$$($(GO_ENV) go test -count=1 -run . -v $(CONFORMANCE_PKGS) 2>&1); rc=$$?; \
+	count=$$(printf '%s\n' "$$out" | grep -c -- '--- PASS:'); \
+	if [ $$rc -ne 0 ]; then printf '%s\n' "$$out" | tail -20; echo "conformance: FAILED ($$count passing before failure)"; exit 1; fi; \
+	echo "Conformance: $$count passing tests across evidence + policy paths ($(CONFORMANCE_PKGS))"
+
 lint: ## Run linter
 	@golangci-lint run ./...
 

diff --git a/README.md b/README.md
--- a/README.md
+++ b/README.md
@@ -138,6 +138,7 @@
 - HMAC-SHA256 signed evidence record per request; verify with `talon audit verify`.
 - Export to CSV, JSON, or signed JSON/NDJSON for auditors and offline verification.
 - Supporting controls mapped to GDPR Article 30, NIS2, DORA, and EU AI Act traceability.
+- Conformance: **317 passing tests** across the evidence + policy paths — reproduce with `make conformance`. See [Conformance suite & count](docs/reference/conformance.md).
 
 See [Evidence store](docs/explanation/evidence-store.md).
 
@@ -297,6 +298,7 @@
 - [Policy cookbook](docs/guides/policy-cookbook.md)
 - [Provider registry](docs/reference/provider-registry.md)
 - [Evidence store](docs/explanation/evidence-store.md)
+- [Conformance suite & count](docs/reference/conformance.md)
 - [Gateway dashboard](docs/reference/gateway-dashboard.md)
 - [OpenClaw integration](docs/guides/openclaw-integration.md)
 - [Slack bot integration](docs/guides/slack-bot-integration.md)

diff --git a/docs/README.md b/docs/README.md
--- a/docs/README.md
+++ b/docs/README.md
@@ -68,6 +68,7 @@
 |-----|-------------|
 | [Configuration and environment](reference/configuration.md) | Environment variables, crypto keys, and config reference. |
 | [Evidence integrity specification](reference/evidence-integrity-spec.md) | Normative signed-record spec: fields, canonical serialization, HMAC-SHA256 signing, and the independent verification procedure. |
+| [Conformance suite & count](reference/conformance.md) | What counts as a conformance test for the evidence + policy paths, and how to reproduce the published count with `make conformance`. |
 | [Authentication and key scopes](reference/authentication-and-key-scopes.md) | Which keys authenticate which endpoint families (gateway vs control plane vs dashboard). |
 | [Gateway dashboard](reference/gateway-dashboard.md) | Dashboard endpoints, metrics API schema, snapshot fields, and authentication. |
 | [Operational control plane](reference/operational-control-plane.md) | Run management (list/kill/pause/resume), tenant lockdown, runtime overrides, tool approval gates. |
@@ -95,6 +96,7 @@
 | [Why not just a PII proxy?](explanation/why-not-a-pii-proxy.md) | Control-plane vs scrubber differentiation with proof commands. |
 | [Evidence store](explanation/evidence-store.md) | HMAC integrity model and verification flow. |
 | [Evidence integrity specification](reference/evidence-integrity-spec.md) | Byte-exact spec so a third party can independently verify a record. |
+| [Conformance suite & count](reference/conformance.md) | Reproducible passing-test count for the evidence + policy paths (`make conformance`). |
 | [Evidence integrity 5-minute proof](tutorials/evidence-integrity-demo.md) | Fast proof moment for auditors/operators, including offline signed-export verification. |
 | [Security policy](../SECURITY.md) | Vulnerability reporting process and security scope. |
 | [Docker Compose demo](../examples/docker-compose/README.md) | Fastest no-key proof loop. |

diff --git a/docs/reference/conformance.md b/docs/reference/conformance.md
new file mode 100644
--- /dev/null
+++ b/docs/reference/conformance.md
@@ -1,0 +1,83 @@
+# Conformance Suite & Published Count
+
+**Status:** stable · **Scope:** the evidence and policy execution paths.
+
+Talon publishes a single, reproducible number: the count of passing tests across
+the two paths that carry its core guarantees — the **evidence** path (how records
+are built, signed, exported, and verified) and the **policy** path (how requests
+are classified, routed, and allowed or denied).
+
+The number is meant to be *checkable*, not impressive. Anyone can reproduce it
+from a clean checkout, and CI prints it on every run. The authoritative value is
+whatever `make conformance` reports for the commit you are looking at.
+
+## Reproduce it
+
+```bash
+make conformance
+```
+
+Example output:
+
+```
+Conformance: 317 passing tests across evidence + policy paths (./internal/policy/... ./internal/evidence/...)
+```
+
+The target runs `go test -count=1 -run . -v` over `./internal/policy/...` and
+`./internal/evidence/...`, then counts the `--- PASS:` lines emitted by the Go test
+runner. That count includes both top-level test functions and table-driven
+subtests, so each named case is counted once. `-count=1` disables the test cache,
+so the number is computed fresh every time. If any test fails, the target exits
+non-zero and prints the failure tail instead of a count.
+
+## What is in scope
+
+The count aggregates the test files in the two packages below. The list is
+descriptive — the suite is simply "every test in these two packages", so new tests
+raise the number automatically without touching this document.
+
+**Policy path — `internal/policy`**
+
+| File | Covers |
+|------|--------|
+| `engine_test.go` | Policy engine evaluate/decision logic |
+| `gateway_engine_test.go` | Gateway-mode policy evaluation |
+| `golden_test.go` | Golden policy decisions against `testdata/` fixtures |
+| `loader_test.go` | `.talon.yaml` policy loading and validation |
+| `routing_policy_test.go` | Tier-based model routing decisions |
+| `classifier_convert_test.go` | Classifier → policy-input conversion |
+| `proxy_test.go` | Proxy-mode policy enforcement |
+| `openclaw_gaps_test.go` | Regression cases for known governance gaps |
+| `metrics_test.go` | Policy decision metrics |
+
+**Evidence path — `internal/evidence`**
+
+| File | Covers |
+|------|--------|
+| `store_test.go` | Evidence record build, persist, query, tenant scoping |
+| `signed_export_test.go` | Signed JSON/NDJSON export and offline verification |
+| `integrity_spec_test.go` | Round-trip of the [evidence integrity spec](evidence-integrity-spec.md) |
+| `schema_compat_test.go` | Backward compatibility of the record schema |
+| `export_test.go` | CSV/JSON export shape |
+| `metrics_test.go` | Evidence write metrics |
+
+### Adjacent suites (counted separately)
+
+The embedded OPA/Rego policies have their own test suite that runs under the `opa`
+toolchain rather than `go test`, so it is **not** included in the Go conformance
+count. Run it with `make opa-test`. Integration and end-to-end tiers
+(`make test-integration`, `make test-e2e`) exercise the same paths through the
+running binary and are likewise tracked separately.
+
+## What the number means — and what it does not
+
+- It **does** mean: the evidence and policy code paths have this many passing,
+  deterministic checks that anyone can re-run, and a regression that breaks one of
+  them fails CI.
+- It **does not** mean: that the suite is exhaustive, that it covers every
+  configuration, or that a high count by itself demonstrates a compliance outcome.
+  Talon produces supporting controls and evidence; coverage and limitations are
+  documented in [`LIMITATIONS.md`](../../LIMITATIONS.md).
+
+The count is a floor that grows as tests are added; it is not a marketing target.
+Treat the live output of `make conformance` as the source of truth.


cursor · 2026-06-03T09:57:01Z


+      - name: Conformance count (evidence + policy paths)
+        run: |
+          make conformance | tee -a "$GITHUB_STEP_SUMMARY"


CI pipe masks conformance failure

High Severity

The conformance step pipes make conformance into tee, but the workflow does not enable pipefail, so the step’s exit status is tee’s (usually zero) even when make conformance exits non-zero after test failures.
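The masking behaviour can be reproduced outside CI. The sketch below is illustrative only: it uses a subshell that prints a line and exits 1 as a stand-in for a failing `make conformance`, and `status_of` is a hypothetical helper, not part of the repository.

```bash
# Run a snippet in a fresh bash process and print its exit status.
status_of() {
  rc=0
  bash -c "$1" >/dev/null 2>&1 || rc=$?
  echo "$rc"
}

# Stand-in for a failing `make conformance` piped into tee.
pipeline='(echo "conformance: FAILED"; exit 1) | tee /dev/null'

# Default behaviour: the pipeline's status is that of the last command (tee).
without_pipefail=$(status_of "$pipeline")

# With pipefail: the status is the last non-zero status in the pipeline.
with_pipefail=$(status_of "set -o pipefail; $pipeline")

echo "without pipefail: exit $without_pipefail"
echo "with pipefail: exit $with_pipefail"
```

This prints `without pipefail: exit 0` followed by `with pipefail: exit 1`. A possible alternative to an inline `set -o pipefail` is declaring `shell: bash` on the step, since GitHub Actions documents that an explicitly specified bash shell is invoked with `-eo pipefail`; that option was not part of this review.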


… count (#120) Add a `make conformance` target that runs the evidence + policy test paths and prints the passing test/subtest count (currently 317), failing if any test fails. Surface the number in CI (printed to the step summary) and in the README, and document what counts as a conformance test, how to reproduce it from a clean checkout, and what the number does and does not mean in docs/reference/conformance.md. Closes #120

cursor Bot reviewed Jun 3, 2026


sergeyenin and others added 2 commits June 3, 2026 12:16

fix(ci): enable pipefail so conformance failures are not masked by tee

f87cc6a

sergeyenin force-pushed the docs/conformance-counts-120 branch from 0550a77 to f87cc6a Compare June 3, 2026 10:16

sergeyenin merged commit 07836d7 into main Jun 3, 2026
4 checks passed

sergeyenin deleted the docs/conformance-counts-120 branch June 3, 2026 10:16
