CI-native evidence compiler for MCP and A2A governance
Deterministic policy enforcement, canonical evidence, and reviewable trust artifacts for agent systems.
See It Work · Quick Start · CI Guide · Discussions
Your MCP agent calls read_file, exec, web_search — but should it, and what can you honestly prove about that run afterward?
Assay turns agent tool/runtime outcomes into reviewable evidence artifacts with explicit evidence levels: verified, self_reported, inferred, or absent. The wedge is familiar: sit between the agent and MCP servers, allow or deny tool calls from policy, and record every decision. The broader output is canonical evidence, bounded Trust Basis claims, Trust Cards, SARIF, and CI gates you can hand to review without a hosted backend.
Positioning: Assay is a CI-native evidence compiler for agent governance. It is not a trust-score engine, a generic eval dashboard, or an observability product with a thin security veneer. See What Assay is and is not for the current boundary.
| Enforce | Intercept MCP tool calls, apply policy, ALLOW / DENY deterministically. |
| Compile | Turn traces, decisions, and bundles into canonical evidence — not raw OTel or ad hoc logs as truth. |
| Prove | Export tamper-evident bundles, Trust Basis (trust-basis.json), Trust Card (trustcard.json / trustcard.md / trustcard.html), SARIF, and CI gates. |
No hosted backend. No API keys for core flows. Deterministic — same input, same decision, every time.
Trust claims use explicit epistemology, not a single “safety score”:
| Level | Meaning |
|---|---|
| verified | Backed by direct evidence or offline verification in the bundle/path |
| self_reported | Emitted by the system without stronger independent corroboration |
| inferred | Derived from bounded, documented rules |
| absent | No trustworthy evidence supports the claim |
Assay does not ship a primary aggregate trust score or a safe/unsafe badge as the main output. See ADR-033.
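A consumer of these levels might model them as a closed enumeration, so that an unexpected value fails loudly instead of being treated as a default. The Python sketch below is illustrative only: `EvidenceLevel` and `parse_level` are hypothetical names, not part of Assay's API, and only the four level strings come from the table above.

```python
from enum import Enum


class EvidenceLevel(str, Enum):
    """The four evidence levels named in the table above."""

    VERIFIED = "verified"
    SELF_REPORTED = "self_reported"
    INFERRED = "inferred"
    ABSENT = "absent"


def parse_level(raw: str) -> EvidenceLevel:
    """Map a raw string to a level; unknown values raise instead of defaulting."""
    try:
        return EvidenceLevel(raw)
    except ValueError:
        raise ValueError(f"unknown evidence level: {raw!r}") from None


if __name__ == "__main__":
    print(parse_level("verified").value)
```

Keeping the set closed means a value such as "safe" is rejected rather than quietly mapped to a score or badge.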
| Output | Role |
|---|---|
| Policy gate | MCP wrap — deterministic allow/deny before tools run (see CLI note below the diagram). |
| Evidence bundle | Offline-verifiable, tamper-evident archive for audit and replay. |
| External receipts | Selected eval outcomes, runtime decision details, and inventory/provenance surfaces as bounded evidence receipts with JSON Schema contracts. |
| Trust Basis | Canonical trust-basis.json — bounded claim classification from verified bundles. |
| Trust Card | trustcard.json / trustcard.md / trustcard.html — same claims, review-friendly artifacts. |
| SARIF / CI | GitHub Action, Security tab integration, policy gates on PRs. |
Repository truth: release notes and CHANGELOG.md remain the authority for what is actually public.
main may carry release-prep commits before a tag is cut; crates.io publication is separate from repository merge state.
Agent ──► Assay ──► MCP Server
            │
            ├─ ✅ ALLOW / ❌ DENY (policy)
            ├─► 📋 Evidence bundle (verifiable)
            └─► 📊 Trust Basis → Trust Card → SARIF / CI
CLI: The mcp command group is hidden from top-level assay --help while the surface stabilizes; it is supported. Use assay mcp --help, assay mcp wrap …, or follow the MCP Quickstart.
Wedge, not category. “MCP firewall” describes the control plane; trust compilation describes the outcome: reviewable claims backed by evidence. See ADR-033 and RFC-005.
cargo install assay-cli
mkdir -p /tmp/assay-demo && echo "safe content" > /tmp/assay-demo/safe.txt
assay mcp wrap --policy examples/mcp-quickstart/policy.yaml \
-- npx @modelcontextprotocol/server-filesystem /tmp/assay-demo

✅ ALLOW read_file path=/tmp/assay-demo/safe.txt reason=policy_allow
✅ ALLOW list_dir path=/tmp/assay-demo/ reason=policy_allow
❌ DENY read_file path=/tmp/outside-demo.txt reason=path_constraint_violation
❌ DENY exec cmd=ls reason=tool_denied
Inspect the audit artifact:
assay evidence show demo/fixtures/bundle.tar.gz

The bundle is tamper-evident and cryptographically verifiable. Signed mandate events can include an Ed25519-backed authorization trail for high-risk actions.
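To make "tamper-evident" concrete, here is a generic Python sketch of the underlying idea: record a SHA-256 digest per file, then recompute and compare later. This is not Assay's bundle format or verification logic; `build_manifest` and `verify` are hypothetical helpers, and the file names are invented for illustration.

```python
import hashlib


def build_manifest(files: dict) -> dict:
    """Return {file name: SHA-256 hex digest} for in-memory file contents."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def verify(files: dict, manifest: dict) -> list:
    """Recompute digests and list every discrepancy; an empty list means intact."""
    problems = []
    for name in sorted(set(files) | set(manifest)):
        if name not in manifest:
            problems.append(f"unexpected file: {name}")
        elif name not in files:
            problems.append(f"missing file: {name}")
        elif hashlib.sha256(files[name]).hexdigest() != manifest[name]:
            problems.append(f"digest mismatch: {name}")
    return problems


if __name__ == "__main__":
    # Hypothetical bundle contents, held in memory to keep the sketch self-contained.
    bundle = {"events.jsonl": b'{"decision": "ALLOW"}\n', "policy.yaml": b"version: 2\n"}
    manifest = build_manifest(bundle)
    bundle["events.jsonl"] = b'{"decision": "DENY"}\n'  # simulate tampering
    print(verify(bundle, manifest))
```

A digest manifest only detects changes if the manifest itself is protected, for example by a signature; that is the role signed events play in the description above.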
Install from crates.io or source (cargo install --path crates/assay-cli), then:
# Machine-readable claim basis (deterministic, claim-first)
assay trust-basis generate demo/fixtures/bundle.tar.gz > trust-basis.json
# Human + machine Trust Card (schema v5 — ten trust claims; key by `id`, not row count)
assay trustcard generate demo/fixtures/bundle.tar.gz --out-dir ./trust-out
# → trust-out/trustcard.json, trust-out/trustcard.md, trust-out/trustcard.html

trust-basis.json emits claims from a bounded, versioned vocabulary for this schema (examples: bundle_verified, delegation_context_visible, authorization_context_visible, containment_degradation_observed, external_eval_receipt_boundary_visible, external_decision_receipt_boundary_visible, external_inventory_receipt_boundary_visible, …). Claim id values are stable across runs, but consumers must not rely on row count or ordering; always key by id. trust-basis.json is not a scalar trust score. The Trust Card is a deterministic render of the same claim rows plus frozen non-goals; trustcard.json is canonical, while Markdown and static HTML are reviewer projections. Contract versions, pack floors, and release checklist: docs/architecture/MIGRATION-TRUST-COMPILER-3.2.md, docs/reference/receipt-family-matrix.json.
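The "key by id" rule can be sketched in a few lines of Python. The JSON shape used here (a top-level `claims` list whose rows carry `id` and `level` fields) is an assumption made for illustration, not the documented schema; check the actual trust-basis.json contract before relying on field names. `index_claims` is a hypothetical helper.

```python
import json


def index_claims(trust_basis: dict) -> dict:
    """Index claim rows by `id` so lookups never depend on row order or count.

    Assumes a hypothetical shape: {"claims": [{"id": ..., "level": ...}, ...]}.
    """
    by_id = {}
    for row in trust_basis.get("claims", []):
        claim_id = row["id"]
        if claim_id in by_id:
            raise ValueError(f"duplicate claim id: {claim_id}")
        by_id[claim_id] = row
    return by_id


if __name__ == "__main__":
    sample = json.loads(
        '{"claims": ['
        '{"id": "bundle_verified", "level": "verified"},'
        '{"id": "delegation_context_visible", "level": "absent"}]}'
    )
    claims = index_claims(sample)
    # Look up by id; a missing id is handled explicitly rather than by position.
    row = claims.get("bundle_verified")
    print(row["level"] if row else "claim not present")
```

A consumer written this way keeps working if a later schema version adds rows or reorders them.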
In the v3.8.0 and later lines, supported external eval outcomes, runtime decision details, and model inventory/provenance surfaces can enter this compiler path as bounded receipts rather than full upstream truth, with machine-readable JSON Schema contracts for the supported receipt/import surfaces. The first three claim-visible families are Promptfoo assertion-component results, OpenFeature boolean EvaluationDetails, and CycloneDX ML-BOM model components. Evidence Receipts for AI Outcomes, Runtime Decisions, and Model Inventory explains the three-family surface, and Evidence Receipts in Action shows the same path with small checked-in artifacts generated from released Assay/Harness versions.

The v3.9.0 line adds direct Trust Basis assertions, CLI schema inspection/validation, static Trust Card HTML, and MCP policy/tool digest visibility as review surfaces; those additions do not create new receipt families or new Trust Basis claims.

The v3.10.0 line is a hardening and maintainability release: Wave 51 module splits, workflow-security gates, OWASP MCP fixtures, release-lane cleanup, and the first bounded LiveKit tool-action importer slice. It does not add a new claim-visible Trust Basis family.
Trust Compiler release line
Release v3.8.0 is the first machine-readable receipt-contract line for the three-family evidence-portability surface; it adds JSON Schema contracts for the bounded receipt/import surfaces. Later lines build on it:

- v3.9.0 makes that surface directly assertable, inspectable, reviewable, and digest-bound on supported MCP decision evidence. It adds trust-basis assert, evidence schema CLI access, static Trust Card HTML, and policy/tool digest visibility for supported MCP decisions.
- v3.9.1 (patch) publishes the public three-family evidence receipts note under an immutable release tag.
- v3.9.2 (patch) prepares the proof page, assurance mapping note, and P57 seeding pack for the same release-truth discipline without adding a new public claim-visible family.
- v3.10.0 hardens the repository and release posture around that surface: Wave 51 internal splits, security fixtures/gates, runner and workflow-security cleanup, and a bounded LiveKit tool-action importer slice without a new claim-visible Trust Basis family.

Carried forward from earlier lines:

- v3.3.0: first release that shipped both built-in evidence lint companion packs (mcp-signal-followup, a2a-signal-followup).
- v3.4.0: public line for G4-A Phase 1 (payload.discovery), built-in P2c (a2a-discovery-card-followup), K1-A Phase 1 (payload.handoff).
- v3.5.0: first public release of K2-A Phase 1 (episode_start.meta.mcp.authorization_discovery).
- v3.5.1: official-MCP-Registry publication foundation for assay-mcp-server.
- v3.6.0: first external-eval receipt lane for Promptfoo assertion-component results.
- v3.7.0: first claim-visible runtime decision and model inventory/provenance line.

Pack YAML still distinguishes the substrate floor >=3.2.3 from the G4-A / P2c floor >=3.3.0; see docs/architecture/MIGRATION-TRUST-COMPILER-3.2.md.
Yes, if you:
- Build with Claude Desktop, Cursor, Windsurf, or any MCP client
- Ship agents that call tools and you need to control which ones
- Want a CI gate that catches tool-call regressions before production
- Need bounded auditability and trust artifacts, not only sampled observability
Not yet, if you:
- Don't use MCP (Assay is MCP-native; other protocols use adapters)
- Need a hosted dashboard (Assay is CLI-first and offline)
- Want a magic trust score or badge as the main output
Assay ships a helper that finds your local Cursor MCP config path and prints a ready-to-paste entry:
assay mcp config-path cursor

It generates JSON like:
{
  "filesystem-secure": {
    "command": "assay",
    "args": [
      "mcp",
      "wrap",
      "--policy",
      "/path/to/policy.yaml",
      "--",
      "npx",
      "-y",
      "@modelcontextprotocol/server-filesystem",
      "/Users/you"
    ]
  }
}

The same wrapped command works in other MCP clients; see MCP Quick Start.
version: "2.0"
name: "my-policy"
tools:
  allow: ["read_file", "list_dir"]
  deny: ["exec", "shell", "write_file"]
schemas:
  read_file:
    type: object
    additionalProperties: false
    properties:
      path:
        type: string
        pattern: "^/app/.*"
        minLength: 1
    required: ["path"]

Legacy constraints: policies still work. Use assay policy migrate for the v2 JSON Schema form, or assay init --from-trace trace.jsonl to generate from observed behavior.
See Policy Files.
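To show how a policy like the one above could translate into a deterministic decision, here is a minimal Python sketch. It is not Assay's engine. The `decide` and `violates_schema` functions are hypothetical. Checking the deny list before the allow list is an assumption. The reasons `tool_not_allowed` and `schema_violation` are invented, while `tool_denied` and `policy_allow` appear in the sample output earlier. Only the handful of JSON Schema keywords used in the example are handled.

```python
import re

# The YAML example above, written as plain Python data so no YAML parser is needed.
POLICY = {
    "tools": {"allow": ["read_file", "list_dir"], "deny": ["exec", "shell", "write_file"]},
    "schemas": {
        "read_file": {
            "type": "object",
            "additionalProperties": False,
            "properties": {"path": {"type": "string", "pattern": "^/app/.*", "minLength": 1}},
            "required": ["path"],
        }
    },
}


def violates_schema(schema: dict, args: dict) -> bool:
    """Check args against the few JSON Schema keywords used in the example."""
    props = schema.get("properties", {})
    if any(name not in args for name in schema.get("required", [])):
        return True
    if schema.get("additionalProperties") is False and set(args) - set(props):
        return True
    for name, rule in props.items():
        if name not in args:
            continue
        value = args[name]
        if rule.get("type") == "string":
            if not isinstance(value, str):
                return True
            if len(value) < rule.get("minLength", 0):
                return True
            # JSON Schema `pattern` is an unanchored search, hence re.search.
            if "pattern" in rule and re.search(rule["pattern"], value) is None:
                return True
    return False


def decide(policy: dict, tool: str, args: dict) -> tuple:
    """Return (decision, reason); the same inputs always give the same result."""
    tools = policy["tools"]
    if tool in tools["deny"]:  # assumed precedence: deny wins over allow
        return "DENY", "tool_denied"
    if tool not in tools["allow"]:
        return "DENY", "tool_not_allowed"
    schema = policy.get("schemas", {}).get(tool)
    if schema is not None and violates_schema(schema, args):
        return "DENY", "schema_violation"
    return "ALLOW", "policy_allow"


if __name__ == "__main__":
    print(decide(POLICY, "read_file", {"path": "/app/config.json"}))
    print(decide(POLICY, "read_file", {"path": "/etc/passwd"}))
    print(decide(POLICY, "exec", {"cmd": "ls"}))
```

Note that a regular expression such as `^/app/.*` also matches `/app/../etc/passwd`, so a pattern check by itself is not path normalization.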
Assay ingests OpenTelemetry JSONL, builds replayable traces, and exports canonical evidence — OTel is a bridge, not the sole semantic authority.
assay trace ingest-otel \
--input otel-export.jsonl \
--db .eval/eval.db \
--out-trace traces/otel.v2.jsonl

# .github/workflows/assay.yml
name: Assay Gate
on: [push, pull_request]
permissions:
  contents: read
  security-events: write
jobs:
  assay:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: Rul1an/assay-action@v2

PRs that violate policy get blocked; SARIF can surface in the Security tab.
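If you want to inspect a SARIF file yourself, for example in a custom CI step, the Python sketch below counts results by level and exits non-zero when any `error` result is present. It relies only on the standard SARIF 2.1.0 layout (`runs[].results[].level`). The `count_by_level` helper and the script's exit policy are assumptions for illustration and do not describe what the Assay GitHub Action does internally.

```python
import json
import sys


def count_by_level(sarif: dict) -> dict:
    """Count SARIF results per level across all runs."""
    counts = {}
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # A result without an explicit level is treated as "warning" here,
            # which simplifies the rule-based defaulting in the SARIF spec.
            level = result.get("level", "warning")
            counts[level] = counts.get(level, 0) + 1
    return counts


if __name__ == "__main__":
    if len(sys.argv) == 2:
        # Path to a SARIF file, supplied by the caller.
        with open(sys.argv[1], encoding="utf-8") as fh:
            document = json.load(fh)
    else:
        # Tiny inline sample so the sketch runs without any input file.
        document = {"version": "2.1.0", "runs": [{"results": [{"level": "note"}, {}]}]}
    counts = count_by_level(document)
    print(counts)
    sys.exit(1 if counts.get("error", 0) else 0)
```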
| Canonical evidence | Assay’s evidence model is the stable contract; OTel and adapters map into it. |
| Deterministic | Same input, same decision — not probabilistic. |
| Portable artifacts | Bundles, Trust Basis, Trust Card, SARIF — for CI, review, audit. |
| Bounded claims | Explicit about what is verified vs visible vs absent — no score-first UX. |
| MCP-native wedge | assay mcp wrap is the fast path (the mcp group is hidden from assay --help; use assay mcp --help). Adapters extend the same engine. |
| Offline-first | No backend required for core enforcement and bundle verification. |
Assay ships adapters that map protocol events into canonical evidence (same policy and evidence story, different transports):
| Protocol | Adapter | What it maps |
|---|---|---|
| ACP (OpenAI/Stripe) | assay-adapter-acp | Checkout events, payment intents, tool calls |
| A2A (Google) | assay-adapter-a2a | Agent capabilities, task delegation, artifacts |
| UCP (Google/Shopify) | assay-adapter-ucp | Discover/buy/post-purchase state transitions |
Adapter crates are workspace / binary–driven (not published as separate crates.io packages); consume them via this repo or released assay builds.
Governance stays protocol-agnostic; the evidence and claim layer stays the same as protocols evolve.
On the M1 Pro/macOS fragmented-IPI harness, protected tool-decision path:
- Main protection run: 0.771 ms p50 / 1.913 ms p95
- Fast-path scenario: 0.345 ms p50 / 1.145 ms p95
These are tool-decision timings, not end-to-end model latency. (See Research & experiments for methodology context.)
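For readers who want to summarize their own decision timings in the same p50/p95 form, here is a small Python sketch using the standard library. How the harness computes its percentiles is not stated here, so the interpolation method (`inclusive`) is an assumption, and `p50_p95` is a hypothetical helper.

```python
import statistics


def p50_p95(samples_ms: list) -> tuple:
    """Return (p50, p95) for a list of per-decision latencies in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # n=100 yields 99 cut points; index 49 is the 50th percentile, 94 the 95th.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return cuts[49], cuts[94]


if __name__ == "__main__":
    # Made-up timings purely to exercise the function.
    timings = [0.31, 0.35, 0.40, 0.52, 0.77, 0.81, 1.10, 1.90]
    p50, p95 = p50_p95(timings)
    print(f"p50={p50:.3f} ms  p95={p95:.3f} ms")
```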
cargo install assay-cli

CI: GitHub Action. Python SDK: pip install assay-it
- MCP Quickstart — filesystem server walkthrough
- Policy Files: YAML schema for assay mcp wrap
- OpenTelemetry & Langfuse: traces → replay and evidence
- CI Guide — GitHub Action
- Evidence Store — S3, B2, MinIO
- ADR-033: Trust compiler positioning
- RFC-005: Trust compiler MVP & Trust Card
Bounded context: numbers below support mapping and experiments, not a product “security score.”
- OWASP MCP Top 10 Mapping — how Assay relates to each risk category (coverage is not a scalar guarantee).
- Third-party survey: popular MCP servers often show weak defaults — Assay adds policy + evidence; see discussion in the mapping doc.
- Security experiments — attack vectors and harness notes (methodology matters more than headline counts).
cargo test --workspace
cargo clippy --workspace --all-targets -- -D warnings

See CONTRIBUTING.md. Discussions: GitHub Discussions; seed topics for pinned threads live in docs/community/DISCUSSIONS.md.