Assay

CI-native evidence compiler for MCP and A2A governance
Deterministic policy enforcement, canonical evidence, and reviewable trust artifacts for agent systems.

See It Work · Quick Start · CI Guide · Discussions

Your MCP agent calls read_file, exec, web_search — but should it, and what can you honestly prove about that run afterward?

Assay turns agent tool/runtime outcomes into reviewable evidence artifacts with explicit evidence levels: verified, self_reported, inferred, or absent. The wedge is familiar: sit between the agent and MCP servers, allow or deny tool calls from policy, and record every decision. The broader output is canonical evidence, bounded Trust Basis claims, Trust Cards, SARIF, and CI gates you can hand to review without a hosted backend.

Positioning: Assay is a CI-native evidence compiler for agent governance. It is not a trust-score engine, a generic eval dashboard, or an observability product with a thin security veneer. See What Assay is and is not for the current boundary.


Enforce	Intercept MCP tool calls, apply policy, ALLOW / DENY deterministically.
Compile	Turn traces, decisions, and bundles into canonical evidence — not raw OTel or ad hoc logs as truth.
Prove	Export tamper-evident bundles, Trust Basis (`trust-basis.json`), Trust Card (`trustcard.json` / `trustcard.md` / `trustcard.html`), SARIF, and CI gates.

No hosted backend. No API keys for core flows. Deterministic — same input, same decision, every time.

Evidence levels

Trust claims use explicit epistemology, not a single “safety score”:

Level	Meaning
`verified`	Backed by direct evidence or offline verification in the bundle/path
`self_reported`	Emitted by the system without stronger independent corroboration
`inferred`	Derived from bounded, documented rules
`absent`	No trustworthy evidence supports the claim

Assay does not ship a primary aggregate trust score or a safe/unsafe badge as the main output. See ADR-033.
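
As a minimal sketch of what "explicit levels, not a score" can look like on the consumer side, the Python below models the four values from the table as a closed enum. The `EvidenceLevel` class and `parse_level` helper are hypothetical names for illustration and are not part of Assay's API.

```python
from enum import Enum


class EvidenceLevel(Enum):
    """The four evidence levels named in the table above."""

    VERIFIED = "verified"
    SELF_REPORTED = "self_reported"
    INFERRED = "inferred"
    ABSENT = "absent"


def parse_level(raw: str) -> EvidenceLevel:
    """Map a raw string to a level; unknown values raise instead of being guessed."""
    try:
        return EvidenceLevel(raw)
    except ValueError:
        raise ValueError(f"unknown evidence level: {raw!r}") from None


print(parse_level("verified").name)
```

The enum deliberately defines no ordering or numeric weight, so nothing in the sketch can be summed or averaged into a scalar.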

What ships today

Output	Role
Policy gate	MCP `wrap` — deterministic allow/deny before tools run (see CLI note below the diagram).
Evidence bundle	Offline-verifiable, tamper-evident archive for audit and replay.
External receipts	Selected eval outcomes, runtime decision details, and inventory/provenance surfaces as bounded evidence receipts with JSON Schema contracts.
Trust Basis	Canonical `trust-basis.json` — bounded claim classification from verified bundles.
Trust Card	`trustcard.json` / `trustcard.md` / `trustcard.html` — same claims, review-friendly artifacts.
SARIF / CI	GitHub Action, Security tab integration, policy gates on PRs.

Repository truth: release notes and CHANGELOG.md remain the authority for what is actually public. The main branch may carry release-prep commits before a tag is cut, and crates.io publication is separate from repository merge state.

  Agent ──► Assay ──► MCP Server
              │
              ├─ ✅ ALLOW / ❌ DENY  (policy)
              ├─► 📋 Evidence bundle (verifiable)
              └─► 📊 Trust Basis → Trust Card → SARIF / CI

CLI: The mcp command group is hidden from the top-level assay --help output while the surface stabilizes, but it is supported. Use assay mcp --help, assay mcp wrap …, or follow the MCP Quickstart.

Wedge, not category. “MCP firewall” describes the control plane; trust compilation describes the outcome: reviewable claims backed by evidence. See ADR-033 and RFC-005.

See It Work

cargo install assay-cli

mkdir -p /tmp/assay-demo && echo "safe content" > /tmp/assay-demo/safe.txt

assay mcp wrap --policy examples/mcp-quickstart/policy.yaml \
  -- npx @modelcontextprotocol/server-filesystem /tmp/assay-demo

✅ ALLOW  read_file  path=/tmp/assay-demo/safe.txt  reason=policy_allow
✅ ALLOW  list_dir   path=/tmp/assay-demo/           reason=policy_allow
❌ DENY   read_file  path=/tmp/outside-demo.txt      reason=path_constraint_violation
❌ DENY   exec       cmd=ls                          reason=tool_denied

Inspect the audit artifact:

assay evidence show demo/fixtures/bundle.tar.gz

The bundle is tamper-evident and cryptographically verifiable. Signed mandate events can include an Ed25519-backed authorization trail for high-risk actions.

Trust artifacts from a verified bundle

Install from crates.io or source (cargo install --path crates/assay-cli), then:

# Machine-readable claim basis (deterministic, claim-first)
assay trust-basis generate demo/fixtures/bundle.tar.gz > trust-basis.json

# Human + machine Trust Card (schema v5 — ten trust claims; key by `id`, not row count)
assay trustcard generate demo/fixtures/bundle.tar.gz --out-dir ./trust-out
# → trust-out/trustcard.json , trust-out/trustcard.md , trust-out/trustcard.html

trust-basis.json emits claims from a bounded, versioned vocabulary for this schema (examples: bundle_verified, delegation_context_visible, authorization_context_visible, containment_degradation_observed, external_eval_receipt_boundary_visible, external_decision_receipt_boundary_visible, external_inventory_receipt_boundary_visible, …). Claim id values are stable across runs, but consumers must not rely on row count or ordering; always key by id. It is not a scalar trust score. The Trust Card is a deterministic render of the same claim rows plus frozen non-goals; trustcard.json is canonical, while Markdown and static HTML are reviewer projections. Contract versions, pack floors, and release checklist: docs/architecture/MIGRATION-TRUST-COMPILER-3.2.md, docs/reference/receipt-family-matrix.json.
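
The "always key by id" rule can be sketched in a few lines of Python. The JSON shape used here (a top-level `claims` array whose rows carry `id` and `level`) is an assumption made for illustration, as are the sample level values; the real trust-basis.json contract may differ, so check the published schema before relying on any field name.

```python
import json

# ASSUMPTION: a top-level "claims" array whose rows carry "id" and "level".
# The claim ids come from the examples above; the levels are made-up sample data.
SAMPLE = """
{"claims": [
  {"id": "bundle_verified", "level": "verified"},
  {"id": "delegation_context_visible", "level": "absent"}
]}
"""


def index_claims(document: dict) -> dict:
    """Return claim rows keyed by id, rejecting duplicate ids."""
    by_id = {}
    for row in document["claims"]:
        if row["id"] in by_id:
            raise ValueError(f"duplicate claim id: {row['id']}")
        by_id[row["id"]] = row
    return by_id


claims = index_claims(json.loads(SAMPLE))
# Look up by id; a missing id yields None instead of silently reading another row.
bundle_claim = claims.get("bundle_verified")
print(bundle_claim["level"] if bundle_claim else "claim not present")
```

Indexing once and looking up by id keeps a consumer correct if a later schema version adds rows or changes their order.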

In the v3.8.0 and later lines, supported external eval outcomes, runtime decision details, and model inventory/provenance surfaces can enter this compiler path as bounded receipts rather than full upstream truth, with machine-readable JSON Schema contracts for the supported receipt/import surfaces. The first three claim-visible families are Promptfoo assertion-component results, OpenFeature boolean EvaluationDetails, and CycloneDX ML-BOM model components. "Evidence Receipts for AI Outcomes, Runtime Decisions, and Model Inventory" explains the three-family surface, and "Evidence Receipts in Action" shows the same path with small checked-in artifacts generated from released Assay/Harness versions.

The v3.9.0 line adds direct Trust Basis assertions, CLI schema inspection/validation, static Trust Card HTML, and MCP policy/tool digest visibility as review surfaces; those additions do not create new receipt families or new Trust Basis claims.

The v3.10.0 line is a hardening and maintainability release: Wave 51 module splits, workflow-security gates, OWASP MCP fixtures, release-lane cleanup, and the first bounded LiveKit tool-action importer slice. It does not add a new claim-visible Trust Basis family.

Trust Compiler release line

Release v3.8.0 is the first machine-readable receipt-contract line for the three-family evidence-portability surface. The v3.9.0 line makes that surface directly assertable, inspectable, reviewable, and digest-bound on supported MCP decision evidence. The v3.9.1 patch line publishes the public three-family evidence receipts note under an immutable release tag; the v3.9.2 patch line prepares the proof page, assurance mapping note, and P57 seeding pack for the same release-truth discipline without adding a new public claim-visible family.

The v3.10.0 line hardens the repository and release posture around that surface: Wave 51 internal splits, security fixtures/gates, runner and workflow-security cleanup, and a bounded LiveKit tool-action importer slice without a new claim-visible Trust Basis family. It carries forward:

v3.3.0: the first release that shipped both built-in evidence lint companion packs (mcp-signal-followup, a2a-signal-followup)
v3.4.0: the public line for G4-A Phase 1 (payload.discovery), built-in P2c (a2a-discovery-card-followup), and K1-A Phase 1 (payload.handoff)
v3.5.0: the first public release of K2-A Phase 1 (episode_start.meta.mcp.authorization_discovery)
v3.5.1: the official-MCP-Registry publication foundation for assay-mcp-server
v3.6.0: the first external-eval receipt lane for Promptfoo assertion-component results
v3.7.0: the first claim-visible runtime decision and model inventory/provenance line

v3.8.0 adds JSON Schema contracts for the bounded receipt/import surfaces; v3.9.0 adds trust-basis assert, evidence schema CLI access, static Trust Card HTML, and policy/tool digest visibility for supported MCP decisions. Pack YAML still distinguishes the substrate floor >=3.2.3 from the G4-A / P2c floor >=3.3.0 (see MIGRATION — Trust Compiler 3.2).

Is This For Me?

Yes, if you:

Build with Claude Desktop, Cursor, Windsurf, or any MCP client
Ship agents that call tools and you need to control which ones
Want a CI gate that catches tool-call regressions before production
Need bounded auditability and trust artifacts, not only sampled observability

Not yet, if you:

Don't use MCP (Assay is MCP-native; other protocols use adapters)
Need a hosted dashboard (Assay is CLI-first and offline)
Want a magic trust score or badge as the main output

Add to Cursor in 30 Seconds

Assay ships a helper that finds your local Cursor MCP config path and prints a ready-to-paste entry:

assay mcp config-path cursor

It generates JSON like:

{
  "filesystem-secure": {
    "command": "assay",
    "args": [
      "mcp",
      "wrap",
      "--policy",
      "/path/to/policy.yaml",
      "--",
      "npx",
      "-y",
      "@modelcontextprotocol/server-filesystem",
      "/Users/you"
    ]
  }
}

The same wrapped command works in other MCP clients; see MCP Quickstart.

Policy Is Simple

version: "2.0"
name: "my-policy"

tools:
  allow: ["read_file", "list_dir"]
  deny: ["exec", "shell", "write_file"]

schemas:
  read_file:
    type: object
    additionalProperties: false
    properties:
      path:
        type: string
        pattern: "^/app/.*"
        minLength: 1
    required: ["path"]

Policies that use the legacy constraints: form still work. Use assay policy migrate to move to the v2 JSON Schema form, or assay init --from-trace trace.jsonl to generate a policy from observed behavior.

See Policy Files.
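
To make the example policy concrete, here is a minimal Python sketch of a deterministic allow/deny decision over the same rules. It is not Assay's implementation: the `decide` function, the precedence (deny list first, unlisted tools denied), and the reason strings `tool_not_allowed` and `schema_violation` are assumptions for illustration. Only `policy_allow` and `tool_denied` appear in the sample output earlier in this README.

```python
import re

# Python mirror of the YAML policy above (only the fields this sketch uses).
POLICY = {
    "tools": {
        "allow": ["read_file", "list_dir"],
        "deny": ["exec", "shell", "write_file"],
    },
    "schemas": {
        "read_file": {
            "properties": {"path": {"pattern": "^/app/.*", "minLength": 1}},
            "required": ["path"],
        }
    },
}


def decide(tool: str, args: dict) -> tuple[str, str]:
    """Return (decision, reason). Pure function: same input, same output."""
    tools = POLICY["tools"]
    # ASSUMPTION: deny takes precedence, and unlisted tools are denied.
    if tool in tools["deny"]:
        return ("DENY", "tool_denied")
    if tool not in tools["allow"]:
        return ("DENY", "tool_not_allowed")  # hypothetical reason string
    schema = POLICY["schemas"].get(tool)
    if schema is None:
        return ("ALLOW", "policy_allow")
    props = schema["properties"]
    # additionalProperties: false -> reject unknown argument names.
    if any(name not in props for name in args):
        return ("DENY", "schema_violation")  # hypothetical reason string
    if any(name not in args for name in schema["required"]):
        return ("DENY", "schema_violation")
    for name, rule in props.items():
        value = args.get(name)
        if value is None:
            continue
        # type: string and minLength checks.
        if not isinstance(value, str) or len(value) < rule["minLength"]:
            return ("DENY", "schema_violation")
        # JSON Schema "pattern" is a search, so the "^" anchor in the rule matters.
        if re.search(rule["pattern"], value) is None:
            return ("DENY", "schema_violation")
    return ("ALLOW", "policy_allow")


print(decide("read_file", {"path": "/app/config.yaml"}))
print(decide("read_file", {"path": "/etc/passwd"}))
print(decide("exec", {"cmd": "ls"}))
```

One design point the sketch surfaces: a regex over the raw string does not normalize paths, so a value such as /app/../etc/passwd still matches "^/app/.*". Whether paths are normalized before matching is not stated in this README, so treat pattern constraints as string checks unless the Policy Files documentation says otherwise.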

OpenTelemetry In, Canonical Evidence Out

Assay ingests OpenTelemetry JSONL, builds replayable traces, and exports canonical evidence — OTel is a bridge, not the sole semantic authority.

assay trace ingest-otel \
  --input otel-export.jsonl \
  --db .eval/eval.db \
  --out-trace traces/otel.v2.jsonl

See OpenTelemetry & Langfuse.

Add to CI

# .github/workflows/assay.yml
name: Assay Gate
on: [push, pull_request]
permissions:
  contents: read
  security-events: write
jobs:
  assay:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: Rul1an/assay-action@v2

PRs that violate policy get blocked; SARIF can surface in the Security tab.

Why Assay (trust compiler)


Canonical evidence	Assay’s evidence model is the stable contract; OTel and adapters map into it.
Deterministic	Same input, same decision — not probabilistic.
Portable artifacts	Bundles, Trust Basis, Trust Card, SARIF — for CI, review, audit.
Bounded claims	Explicit about what is verified vs visible vs absent — no score-first UX.
MCP-native wedge	`assay mcp wrap` is the fast path (the `mcp` group is hidden from `assay --help`; use `assay mcp --help`). Adapters extend the same engine.
Offline-first	No backend required for core enforcement and bundle verification.

Beyond MCP: Protocol Adapters

Assay ships adapters that map protocol events into canonical evidence (same policy and evidence story, different transports):

Protocol	Adapter	What it maps
ACP (OpenAI/Stripe)	`assay-adapter-acp`	Checkout events, payment intents, tool calls
A2A (Google)	`assay-adapter-a2a`	Agent capabilities, task delegation, artifacts
UCP (Google/Shopify)	`assay-adapter-ucp`	Discover/buy/post-purchase state transitions

Adapter crates are workspace / binary–driven (not published as separate crates.io packages); consume them via this repo or released assay builds.

Governance stays protocol-agnostic; the evidence and claim layer stays the same as protocols evolve.

Measured Latency

On the M1 Pro/macOS fragmented-IPI harness, protected tool-decision path:

Main protection run: 0.771ms p50 / 1.913ms p95
Fast-path scenario: 0.345ms p50 / 1.145ms p95

These are tool-decision timings, not end-to-end model latency. (See Research & experiments for methodology context.)

Install

cargo install assay-cli

CI: GitHub Action. Python SDK: pip install assay-it

Learn More

MCP Quickstart — filesystem server walkthrough
Policy Files — YAML schema for assay mcp wrap
OpenTelemetry & Langfuse — traces → replay and evidence
CI Guide — GitHub Action
Evidence Store — S3, B2, MinIO
ADR-033: Trust compiler positioning
RFC-005: Trust compiler MVP & Trust Card

Research, mappings & experiments

Bounded context: numbers below support mapping and experiments, not a product “security score.”

OWASP MCP Top 10 Mapping — how Assay relates to each risk category (coverage is not a scalar guarantee).
Third-party survey: popular MCP servers often show weak defaults — Assay adds policy + evidence; see discussion in the mapping doc.
Security experiments — attack vectors and harness notes (methodology matters more than headline counts).

Contributing

cargo test --workspace
cargo clippy --workspace --all-targets -- -D warnings

See CONTRIBUTING.md. Discussions: GitHub Discussions — seed topics for pinned threads live in docs/community/DISCUSSIONS.md.

License

MIT
