diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md index 1c47bf1..75c1a70 100644 --- a/.github/ISSUE_TEMPLATE/bug_report.md +++ b/.github/ISSUE_TEMPLATE/bug_report.md @@ -39,4 +39,4 @@ assignees: '' ## Conformance test ID (if applicable) - + diff --git a/CHARTER.md b/CHARTER.md index db62ae1..20371b0 100644 --- a/CHARTER.md +++ b/CHARTER.md @@ -1,7 +1,7 @@ -# Technical Charter — cMCP +# Technical Charter: cMCP **Proposed hosting**: Agentic AI Foundation (AAIF). -**Status**: Pre-acceptance draft — effective upon host organization acceptance. +**Status**: Pre-acceptance draft: effective upon host organization acceptance. > **Note for external contributors:** This charter is a working draft and has not yet been accepted by a host organization. Governance terms, IP policy, and trademark ownership described here are proposed, not final. Do not assume binding foundation commitments until formal acceptance. @@ -11,18 +11,18 @@ ## 1. Mission -The cMCP project develops and maintains an open implementation of the Confidential MCP (Model Context Protocol) gateway — hardware-attested policy enforcement for AI agent tool calls. The mission is to make tool-call authorization cryptographically verifiable by any party, without trusting the operator, without requiring closed infrastructure, and without vendor lock-in to any silicon vendor, cloud provider, or AI platform. +The cMCP project develops and maintains an open implementation of the Confidential MCP (Model Context Protocol) gateway: hardware-attested policy enforcement for AI agent tool calls. The mission is to make tool-call authorization cryptographically verifiable by any party, without trusting the operator, without requiring closed infrastructure, and without vendor lock-in to any silicon vendor, cloud provider, or AI platform. ## 2. 
Scope The project includes: -- **cMCP Gateway** — the reference open-source implementation of the confidential gateway, including the gRPC/HTTP proxy, policy engine, and hardware attestation bridge. -- **Hardware Provider API** — the normalized interface (`BaseProvider`) for integrating TEE platforms (TPM, AMD SEV-SNP, Intel TDX, and others). -- **Python SDK** — the `cmcp-gateway` package and client libraries for policy authoring and runtime integration. -- **TRACE integration** — built-in emission of TRACE Trust Records for every attested tool call (see [agentrust-io/trace-spec](https://github.com/agentrust-io/trace-spec)). -- **Agent Manifest binding** — verification of agent identity via Agent Manifest at tool-call time (see [agentrust-io/agent-manifest](https://github.com/agentrust-io/agent-manifest)). -- **Integration examples** — working examples across financial services, healthcare, and multi-tenant SaaS deployment patterns. +- **cMCP Gateway**: the reference open-source implementation of the confidential gateway, including the gRPC/HTTP proxy, policy engine, and hardware attestation bridge. +- **Hardware Provider API**: the normalized interface (`BaseProvider`) for integrating TEE platforms (TPM, AMD SEV-SNP, Intel TDX, and others). +- **Python SDK**: the `cmcp-gateway` package and client libraries for policy authoring and runtime integration. +- **TRACE integration**: built-in emission of TRACE Trust Records for every attested tool call (see [agentrust-io/trace-spec](https://github.com/agentrust-io/trace-spec)). +- **Agent Manifest binding**: verification of agent identity via Agent Manifest at tool-call time (see [agentrust-io/agent-manifest](https://github.com/agentrust-io/agent-manifest)). +- **Integration examples**: working examples across financial services, healthcare, and multi-tenant SaaS deployment patterns. 
Out of scope: AI model governance beyond tool-call enforcement, hardware TEE platform SDKs, network-level policy outside the MCP protocol boundary, and MCP server implementations themselves. @@ -59,18 +59,18 @@ Use of "cMCP-compatible" to describe a gateway deployment requires that the impl cMCP builds on and does not replace: -- **MCP (Model Context Protocol, Anthropic)** — the underlying tool-call protocol that cMCP extends with attestation -- **TRACE** ([agentrust-io/trace-spec](https://github.com/agentrust-io/trace-spec)) — governance record emitted per attested tool call -- **Agent Manifest** ([agentrust-io/agent-manifest](https://github.com/agentrust-io/agent-manifest)) — agent identity bound at tool-call time -- **SPIFFE / SPIRE** — workload identity for gateway and agent services -- **RATS / EAT (RFC 9711)** — attestation evidence format -- **OPA / Cedar** — policy engine integration surface +- **MCP (Model Context Protocol, Anthropic)**: the underlying tool-call protocol that cMCP extends with attestation +- **TRACE** ([agentrust-io/trace-spec](https://github.com/agentrust-io/trace-spec)): governance record emitted per attested tool call +- **Agent Manifest** ([agentrust-io/agent-manifest](https://github.com/agentrust-io/agent-manifest)): agent identity bound at tool-call time +- **SPIFFE / SPIRE**: workload identity for gateway and agent services +- **RATS / EAT (RFC 9711)**: attestation evidence format +- **OPA / Cedar**: policy engine integration surface ## 7. 
Transition timeline | Milestone | Target | |---|---| -| v0.1 developer preview — CC Summit announcement | June 2026 | +| v0.1 developer preview: CC Summit announcement | June 2026 | | Hardware provider API stabilization, TRACE v0.2 integration | Q3 2026 | | AAIF project proposal submission | Q3 2026 | | v1.0 stable release under TSC governance | 2027 | diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index bc8e79d..c39b6cd 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -4,7 +4,7 @@ Thank you for contributing. This document covers everything you need to get star ## Before you start -cMCP is a hardware-attested policy gateway. Changes to the TEE boundary, signing path, audit chain, or TRACE Claim generation require extra care — these are security-critical components. When in doubt, open an issue first. +cMCP is a hardware-attested policy gateway. Changes to the TEE boundary, signing path, audit chain, or TRACE Claim generation require extra care: these are security-critical components. When in doubt, open an issue first. ## Developer certificate of origin @@ -54,7 +54,7 @@ Keep commits small and focused. One logical change per commit. Do not bundle unr ## Pull request process 1. Branch from `main`: `git checkout -b feat/your-change` -2. Write tests for new behaviour — the test suite must pass +2. Write tests for new behaviour: the test suite must pass 3. Run all four checks locally (see above) 4. Open a PR against `main` with the template filled in 5. At least one maintainer must approve before merge @@ -64,9 +64,9 @@ Keep commits small and focused. One logical change per commit. 
Do not bundle unr Changes to these paths require two maintainer approvals and a comment explaining the security impact: -- `src/cmcp_gateway/audit/` — signing, audit chain, TRACE Claim generation -- `src/cmcp_gateway/tee/` — TEE provider integration -- `src/cmcp_gateway/policy/` — Cedar policy evaluation +- `src/cmcp_gateway/audit/`: signing, audit chain, TRACE Claim generation +- `src/cmcp_gateway/tee/`: TEE provider integration +- `src/cmcp_gateway/policy/`: Cedar policy evaluation ## Reporting security vulnerabilities @@ -77,7 +77,7 @@ Do **not** open a public issue. Use [GitHub Security Advisories](https://github. - Python 3.11+ syntax throughout (`X | Y`, `match`, etc.) - `ruff` enforces style; do not add `# noqa` without a comment explaining why - `mypy --strict` on `src/cmcp_gateway/`; new public functions need type annotations -- No comments that describe *what* the code does — only *why* when non-obvious +- No comments that describe *what* the code does: only *why* when non-obvious - Tests live in `tests/unit/` and follow the existing `test_.py` naming ## Questions diff --git a/GOVERNANCE.md b/GOVERNANCE.md index e734a28..9484337 100644 --- a/GOVERNANCE.md +++ b/GOVERNANCE.md @@ -36,7 +36,7 @@ Reviewers do not have merge access but their approval counts toward the merge re ### Maintainer -A Reviewer who has held that role for **at least 60 days** and has demonstrated sustained contributions — consistent review activity, issue triage, or code — may be nominated for Maintainer by any existing Maintainer. Maintainer status requires explicit approval by 2/3 of current Maintainers. +A Reviewer who has held that role for **at least 60 days** and has demonstrated sustained contributions: consistent review activity, issue triage, or code: may be nominated for Maintainer by any existing Maintainer. Maintainer status requires explicit approval by 2/3 of current Maintainers. 
Maintainers have merge access to `main` and are collectively responsible for the health of the project. @@ -48,7 +48,7 @@ Maintainers have merge access to `main` and are collectively responsible for the ### Day-to-day changes (lazy consensus) -Most decisions — feature additions, bug fixes, documentation, refactors — are made by **lazy consensus on pull requests**. A PR is mergeable when: +Most decisions: feature additions, bug fixes, documentation, refactors: are made by **lazy consensus on pull requests**. A PR is mergeable when: - At least one Maintainer has approved it, and - No Maintainer has raised a blocking objection within **5 business days** of the last substantive change. diff --git a/MAINTAINERS.md b/MAINTAINERS.md index 8ce88f8..0d1d155 100644 --- a/MAINTAINERS.md +++ b/MAINTAINERS.md @@ -8,9 +8,9 @@ ## Roles -**Reviewer** — triages issues, reviews pull requests, and approves changes within their area of expertise. +**Reviewer**: triages issues, reviews pull requests, and approves changes within their area of expertise. -**Maintainer** — holds merge rights, participates in roadmap decisions, and is responsible for the health of the project. +**Maintainer**: holds merge rights, participates in roadmap decisions, and is responsible for the health of the project. ## Becoming a Reviewer diff --git a/ROADMAP.md b/ROADMAP.md index 1714faa..1dd8314 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -1,6 +1,6 @@ # cMCP Roadmap -## v0.1 — Initial Release (June 2026) +## v0.1: Initial Release (June 2026) Scope: Minimal viable trust layer for MCP servers, sufficient for early adopters to evaluate the attestation and policy model. 
@@ -9,7 +9,7 @@ Scope: Minimal viable trust layer for MCP servers, sufficient for early adopters - TRACE Claim generation from attestation evidence - Standalone verifier CLI for offline claim inspection -## v0.2 — Released (June 2026) +## v0.2: Released (June 2026) Provider-specific attestation verification: - TPM2 quote verification @@ -27,7 +27,7 @@ Observability: Transparency: - Transparency log integration for TRACE Claim anchoring (write and lookup) -## v1.0 — Stable Targets +## v1.0: Stable Targets - Stable `GatewayClaim` schema with documented versioning guarantees - Full RATS/EAT conformance (RFC 9334, draft-ietf-rats-eat) diff --git a/SECURITY.md b/SECURITY.md index 39c928a..a011e77 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -19,17 +19,17 @@ Timeline starts when the issue is confirmed as a valid vulnerability, not on ini The following components are in scope: -- **TEE attestation path** — measurement of policy bundle hash into hardware attestation report; attestation verification logic for TPM 2.0, AMD SEV-SNP, Intel TDX, and OPAQUE Managed Runtime providers -- **Signing key handling** — hardware-sealed key generation, storage, and use; any path by which a signing key could be extracted or used outside the enclave -- **Cedar policy enforcement** — correctness of allow/deny decisions; policy bundle loading and hash verification inside the enclave; enforcement mode handling -- **Audit chain** — integrity of TRACE claim output fields (`policy_bundle_hash`, `audit_chain_root`, `tee_public_key`); any path by which a valid audit entry could be forged or suppressed +- **TEE attestation path**: measurement of policy bundle hash into hardware attestation report; attestation verification logic for TPM 2.0, AMD SEV-SNP, Intel TDX, and OPAQUE Managed Runtime providers +- **Signing key handling**: hardware-sealed key generation, storage, and use; any path by which a signing key could be extracted or used outside the enclave +- **Cedar policy enforcement**: 
correctness of allow/deny decisions; policy bundle loading and hash verification inside the enclave; enforcement mode handling +- **Audit chain**: integrity of TRACE claim output fields (`policy_bundle_hash`, `audit_chain_root`, `tee_public_key`); any path by which a valid audit entry could be forged or suppressed ## Out of Scope The following are not eligible for a coordinated disclosure: -- Bugs in TEE firmware or hardware microcode (AMD, Intel, or cloud provider trust anchor issues) — report those directly to the relevant vendor -- Vulnerabilities in the upstream Cedar policy language engine that are not specific to cMCP's integration — report those to the [Cedar project](https://github.com/cedar-policy/cedar) +- Bugs in TEE firmware or hardware microcode (AMD, Intel, or cloud provider trust anchor issues): report those directly to the relevant vendor +- Vulnerabilities in the upstream Cedar policy language engine that are not specific to cMCP's integration: report those to the [Cedar project](https://github.com/cedar-policy/cedar) - Theoretical weaknesses in TEE threat models that are already acknowledged in public literature - Issues in third-party MCP tool implementations invoked through the gateway diff --git a/docs/concepts.md b/docs/concepts.md index 62c17e2..1261e41 100644 --- a/docs/concepts.md +++ b/docs/concepts.md @@ -25,8 +25,8 @@ The key difference: | | Log | TRACE Claim | |---|---|---| | Who produces it | Agent process or operator | TEE firmware + cMCP runtime | -| Can it be edited post-hoc? | Yes | No — signature would fail | -| Can the operator forge it? | Yes | No — signing key never leaves the TEE | +| Can it be edited post-hoc? | Yes | No: signature would fail | +| Can the operator forge it? | Yes | No: signing key never leaves the TEE | | Who can verify it? | Anyone with read access | Anyone with the public key (no trust in operator) | A TRACE Claim is a [TRACE Trust Record](https://trace.agentrust-io.com) with a `GatewayClaim` envelope. 
The envelope adds the session summary and audit chain. The inner trust record follows the TRACE v0.1 spec. @@ -44,7 +44,7 @@ Every cMCP TRACE Claim makes four categories of assertion: ## Hardware attestation: why the claim is trustworthy -The signing key for a cMCP TRACE Claim is generated inside the TEE and never leaves it. The TEE also measures its own state at boot — recording a SHA-384 digest of the firmware, the Cedar policy bundle, and the tool catalog into a PCR/measurement register. +The signing key for a cMCP TRACE Claim is generated inside the TEE and never leaves it. The TEE also measures its own state at boot: recording a SHA-384 digest of the firmware, the Cedar policy bundle, and the tool catalog into a PCR/measurement register. This means: @@ -85,7 +85,7 @@ The TRACE Claim records `audit_chain.root` (the first entry hash) and `audit_cha **Why this matters:** An auditor who has the individual audit log entries can recompute the chain tip and verify it matches the TRACE Claim. If any entry was modified, deleted, or reordered, the recomputed tip will not match. The audit log is self-certifying. -The audit chain does not need to be stored on-chain or in a third-party system. The signed TRACE Claim is sufficient to detect tampering after the fact — as long as the claim itself was not forged (which the hardware attestation prevents). +The audit chain does not need to be stored on-chain or in a third-party system. The signed TRACE Claim is sufficient to detect tampering after the fact: as long as the claim itself was not forged (which the hardware attestation prevents). See [Spec: Call Graph](spec/call-graph.md) for the full chain construction. @@ -95,11 +95,11 @@ See [Spec: Call Graph](spec/call-graph.md) for the full chain construction. Cedar is an authorization policy language designed to be auditable. cMCP uses it for three reasons: -**1. The policy is versioned and hash-bound.** The SHA-256 of the policy bundle is measured into the TEE at startup. 
Every TRACE Claim carries that hash. An auditor can compare the hash in a claim to the policy bundle in the repository and prove which policy was active for a given session — even if the policy was later changed. +**1. The policy is versioned and hash-bound.** The SHA-256 of the policy bundle is measured into the TEE at startup. Every TRACE Claim carries that hash. An auditor can compare the hash in a claim to the policy bundle in the repository and prove which policy was active for a given session: even if the policy was later changed. **2. Policy effects are data, not code.** Cedar policies are declarative and cannot execute arbitrary code. A `forbid` rule can block a tool call; it cannot read files or make network requests. This means policy review is tractable: the policy file is the complete specification of what the agent is allowed to do. -**3. Cedar supports fine-grained context conditions.** Policies can condition on session attributes like `session_max_sensitivity`, `workflow_id`, or `data_class`. This enables dynamic policy enforcement without code changes — the same binary can enforce different rules for different tenant configurations. +**3. Cedar supports fine-grained context conditions.** Policies can condition on session attributes like `session_max_sensitivity`, `workflow_id`, or `data_class`. This enables dynamic policy enforcement without code changes: the same binary can enforce different rules for different tenant configurations. 
Example: this policy blocks `salesforce.contacts` once PII has entered the session: @@ -152,8 +152,8 @@ The signed claim ties together: hardware identity (attestation), policy version ## Next steps -- [Quickstart](quickstart.md) — run a cMCP gateway locally in under 30 minutes -- [Configuration](configuration.md) — full configuration reference -- [Tutorial: Cedar policy walkthrough](tutorials/cedar-policy-walkthrough.md) — write and test policies -- [Tutorial: Verify a TRACE claim](tutorials/verifying-a-trace-claim.md) — verify a claim without trusting the operator -- [Spec: Component Model](spec/component-model.md) — detailed architecture +- [Quickstart](quickstart.md): run a cMCP gateway locally in under 30 minutes +- [Configuration](configuration.md): full configuration reference +- [Tutorial: Cedar policy walkthrough](tutorials/cedar-policy-walkthrough.md): write and test policies +- [Tutorial: Verify a TRACE claim](tutorials/verifying-a-trace-claim.md): verify a claim without trusting the operator +- [Spec: Component Model](spec/component-model.md): detailed architecture diff --git a/docs/spec/attestation.md b/docs/spec/attestation.md index 4c6b763..5d0fe82 100644 --- a/docs/spec/attestation.md +++ b/docs/spec/attestation.md @@ -55,7 +55,7 @@ measurement = SHA-256(PCR0 || PCR1 || PCR2 || PCR3 || PCR4 || PCR5 || PCR6 || PC Each PCR value is the raw 32-byte SHA-256 digest read from the TPM. Concatenation is in bank index order (0 through 7), no separators. The result is a 32-byte SHA-256 digest encoded as lowercase hex. The PCR bank used is SHA-256. If the platform only offers a SHA-1 bank, the runtime logs a warning and uses SHA-1 PCR values zero-extended to 32 bytes before hashing; this is noted in `attestation_report.measurement_note: "sha1-bank-fallback"`. 
-Quote generation: the gateway calls `TPM2_Quote` with `qualifying_data` set to the first 32 bytes of the §3.3 nonce — the `JWK_thumbprint(tee_public_key)` — because TPM `qualifying_data` carries a single digest. A verifier re-derives the thumbprint from `cnf.jwk.x` and checks it against the quote's `qualifying_data`. The quote and its signature are stored in `attestation_report.raw_evidence` (base64-encoded) for verifier use. +Quote generation: the gateway calls `TPM2_Quote` with `qualifying_data` set to the first 32 bytes of the §3.3 nonce: the `JWK_thumbprint(tee_public_key)`: because TPM `qualifying_data` carries a single digest. A verifier re-derives the thumbprint from `cnf.jwk.x` and checks it against the quote's `qualifying_data`. The quote and its signature are stored in `attestation_report.raw_evidence` (base64-encoded) for verifier use. #### SEV-SNP (High Assurance) @@ -273,7 +273,7 @@ The `attestation_report.report_data` field contains a 64-byte nonce that binds t nonce = JWK_thumbprint(tee_public_key) (32 bytes) || random_salt (32 bytes) ``` -- `JWK_thumbprint(tee_public_key)`: the RFC 7638 JWK Thumbprint of the Ed25519 public key — SHA-256 over the canonical JSON of the required OKP members in lexicographic order (`crv`, `kty`, `x`). This is re-derivable by any verifier from `cnf.jwk.x`. +- `JWK_thumbprint(tee_public_key)`: the RFC 7638 JWK Thumbprint of the Ed25519 public key: SHA-256 over the canonical JSON of the required OKP members in lexicographic order (`crv`, `kty`, `x`). This is re-derivable by any verifier from `cnf.jwk.x`. - `random_salt`: 32 random bytes generated once per enclave startup, so two enclave instances produce distinct nonces even with the same key (e.g. blue-green deploy). - The 64-byte value is passed as the `report_data` / `user_data` / `reportdata` / `qualifying_data` field when requesting the hardware attestation report. 
The field name varies by provider; the semantic is the same: a caller-supplied value included in the signed measurement. @@ -287,7 +287,7 @@ assert actual_nonce[:32] == expected_fingerprint A TRACE Claim whose `cnf.jwk` public key was substituted after attestation fails this check, because the embedded `report_data` (hardware-signed) will not match the re-derived thumbprint. A claim produced by a different enclave instance carries a different key (and salt), so it fails too. -**Session binding** is carried separately, by `gateway.session_id` inside the Ed25519-signed claim body — not by the nonce. The hardware report is generated once per enclave instance at startup, before any session exists, so it cannot bind a specific `session_id`. Because the signature covers `session_id`, a claim cannot be presented under a different session without breaking verification. See §3.3.1. +**Session binding** is carried separately, by `gateway.session_id` inside the Ed25519-signed claim body: not by the nonce. The hardware report is generated once per enclave instance at startup, before any session exists, so it cannot bind a specific `session_id`. Because the signature covers `session_id`, a claim cannot be presented under a different session without breaking verification. See §3.3.1. #### 3.3.1 Session binding diff --git a/docs/spec/cedar-policy.md b/docs/spec/cedar-policy.md index 2bce790..df06401 100644 --- a/docs/spec/cedar-policy.md +++ b/docs/spec/cedar-policy.md @@ -1,352 +1,352 @@ -# Cedar Policy Specification - -!!! warning "Draft" - Status: Draft v0.1 · Stability: Unstable — expect breaking changes before v1.0 - -This document specifies the Cedar policy bundle format, policy expression examples, enforcement modes, evaluation decision flow, and related governance features for the cMCP Runtime. 
-
----
-
-## Section 1 : Policy Bundle Format
-
-A policy bundle is a directory (or tarball) with the following structure:
-
-```
-bundle/
-  policies/            # Cedar policy files (.cedar), one per policy or logical group
-  schema.cedarschema   # Cedar schema defining entity types, actions, and attributes
-  manifest.json        # Provenance metadata
-```
-
-### manifest.json Format
-
-```json
-{
-  "version": "",
-  "authored_at": "",
-  "author_identity": "",
-  "commit_sha": "",
-  "approval_chain": [
-    {
-      "approver": "",
-      "approved_at": "",
-      "signature": ""
-    }
-  ]
-}
-```
-
-`approval_chain` is optional. When present, each entry is a signed approval by an authorized reviewer. See Section 5 for how approvals are verified.
-
-### Bundle Hash
-
-The bundle hash is the authoritative measurement committed into the attestation report. It is computed as:
-
-```
-SHA-256(canonical_json({
-  "manifest": ,
-  "policy_files": {
-    "": ""
-    // ... entries sorted lexicographically by filename
-  },
-  "schema_hash": ""
-}))
-```
-
-`canonical_json` means RFC 8785 (JSON Canonicalization Scheme): no insignificant whitespace, keys sorted lexicographically at every level. This ensures the hash is deterministic regardless of serialization order.
-
-This hash is what gets measured into the attestation report (see `policy_bundle.hash` in the TRACE Claim). Any modification to any policy file, the schema, or the manifest changes the hash, producing a measurement mismatch that verifiers can detect.
-
----
-
-## Section 2 : Cedar Policy Expression Examples
-
-The following examples show working Cedar policies for common enterprise use cases. All policies operate on the action `Action::"call_tool"`.
-
-### Tool Allowlist
-
-Permit calls only to named tools:
-
-```cedar
-permit(
-    principal,
-    action == Action::"call_tool",
-    resource
-)
-when {
-    resource.tool_name in ["salesforce.query", "snowflake.read"]
-};
-```
-
-### Tool Denylist
-
-Explicitly forbid a specific tool regardless of other permits:
-
-```cedar
-forbid(
-    principal,
-    action == Action::"call_tool",
-    resource
-)
-when {
-    resource.tool_name == "delete_customer_record"
-};
-```
-
-### Field-Level Redaction
-
-Permit the call but instruct the response inspector to redact sensitive fields:
-
-```cedar
-permit(
-    principal,
-    action == Action::"call_tool",
-    resource
-)
-when {
-    resource.tool_name == "crm.get_customer"
-}
-advice {
-    redact_fields: ["ssn", "payment_history"]
-};
-```
-
-The `advice` block is evaluated by the response inspection pipeline after the upstream call returns. See `response-inspection.md` for redaction semantics.
-
-### Cross-System Compliance Boundary
-
-Forbid tool calls to uncovered servers when the session carries HIPAA PHI sensitivity:
-
-```cedar
-forbid(
-    principal,
-    action == Action::"call_tool",
-    resource
-)
-when {
-    context.session_sensitivity == "hipaa_phi" &&
-    resource.server_domain == "uncovered"
-};
-```
-
-### Per-Workflow Scope
-
-Permit tool calls only when the tool is in the workflow's allowed set:
-
-```cedar
-permit(
-    principal,
-    action == Action::"call_tool",
-    resource
-)
-when {
-    context.workflow_id == "customer_onboarding" &&
-    resource.tool_name in context.workflow_allowed_tools
-};
-```
-
-### Default-Deny Baseline
-
-Cedar is default-deny: a call is denied unless at least one `permit` matches and no `forbid` matches. To make this explicit and auditable, include a baseline forbid:
-
-```cedar
-forbid(
-    principal,
-    action == Action::"call_tool",
-    resource
-);
-```
-
-This ensures that even if the policy bundle is empty or all permits are removed, all calls are denied rather than silently allowed.
-
----
-
-## Section 3 : Enforcement Modes
-
-Enforcement mode is set in the deployment configuration, bound into the attestation report, and cannot change without an enclave restart. This makes the active mode tamper-evident.
-
-| Mode | Cedar deny behavior | Audit entry |
-|------|--------------------|-|
-| `enforcing` | Runtime rejects the call, returns a structured error to the agent | Logged with `decision=deny` |
-| `advisory` | Runtime allows the call, forwards to upstream | Logged with `decision=deny_advisory` (would have been denied in enforcing mode) |
-| `silent` | Runtime allows the call, forwards to upstream | Only a basic call log; no audit decision entry |
-
-**Structured error (enforcing mode):**
-
-```json
-{
-  "error": "tool_call_denied",
-  "tool_name": "",
-  "call_id": "",
-  "policy_bundle_version": "",
-  "message": "Tool call denied by runtime policy."
-}
-```
-
-The error does not include the matched rule name or policy text, to avoid leaking policy internals to the agent.
-
----
-
-## Section 4 : Policy Evaluation Decision Flow
-
-```
-1. Receive MCP tool call request
-   inputs: tool_name, arguments, session_id, workflow_id (if present)
-
-2. Build Cedar evaluation context:
-   {
-     tool_name,
-     server_identity,
-     server_domain,
-     session_sensitivity,
-     workflow_id,
-     workflow_allowed_tools,
-     user_identity  // if available
-   }
-
-3. Evaluate Cedar policies against:
-   (principal, Action::"call_tool", resource) with context
-
-4. If decision = permit:
-   proceed to egress DLP check (see egress policy documentation)
-
-5. If decision = deny:
-   enforce per enforcement_mode (see Section 3)
-
-6. Log audit entry:
-   { decision, rule_matched, latency_us, call_id }
-
-7. If decision = permit and DLP passes:
-   forward call to upstream server
-
-8. Receive response from upstream server
-
-9. Run response inspection pipeline (see response-inspection.md)
-   applies advice blocks (e.g., redact_fields)
-
-10. Log response audit entry
-
-11. Return (possibly redacted) response to agent
-```
-
-Latency budget: Cedar evaluation target is under 1 ms for bundles up to 500 policy rules. The runtime measures and logs `latency_us` for each evaluation to support SLA monitoring.
-
----
-
-## Section 5 : Policy Provenance (closes #26)
-
-The `manifest.json` provenance metadata is included in the bundle hash measurement (see Section 1). This means:
-
-- **Author identity** (`author_identity`): tamper-evident. Changing the identity changes the bundle hash, producing a measurement mismatch that verifiers detect.
-- **Authoring timestamp** (`authored_at`): tamper-evident for the same reason.
-- **Git commit** (`commit_sha`): tamper-evident. Links the policy bundle to a specific point in version control history.
-- **Approval signatures** (`approval_chain`): tamper-evident. Each approval signature covers the bundle hash; removing or altering an approval changes the manifest, which changes the bundle hash.
-
-### Verifier Workflow
-
-A verifier checking a TRACE Claim can perform the following steps:
-
-1. Obtain the TRACE Claim and extract `policy_bundle.hash`.
-2. Compare `policy_bundle.hash` against the approved hash on record (e.g., from the organization's policy registry).
-3. Request the bundle tarball from the operator.
-4. Recompute the bundle hash locally using the algorithm in Section 1 and verify it matches `policy_bundle.hash`.
-5. Inspect `manifest.json` to confirm author, timestamp, commit SHA, and approval chain.
-6. Optionally verify each approval signature against the approver's known public key.
-
-If any step fails, the verifier rejects the TRACE Claim. This process requires no trust in the operator: the TEE measurement is the root of trust.
-
----
-
-## Section 6 : Per-Workflow Cedar Policy Scope (closes #41)
-
-### Workflow Identity
-
-Workflow identity is established via session metadata. The agent includes a `workflow_id` in the session initiation request:
-
-- HTTP transport: `X-MCP-Workflow-ID` header
-- Session configuration: `workflow_id` field in the session init payload
-
-If `workflow_id` is absent, the runtime defaults to `workflow_id = "default"`. The default workflow policy should be restrictive (allowlist only widely-approved tools).
-
-### Evaluation Order
-
-A tool call must pass both checks:
-
-1. **Catalog-level**: the tool is registered in the approved tool catalog.
-2. **Workflow-level**: the tool is allowed for the current `workflow_id` per Cedar policy.
-
-Failing either check results in a deny decision.
-
-### Workflow Entity in Cedar Schema
-
-The per-workflow allowed-tools list can be modeled as an entity attribute in the Cedar schema:
-
-```cedarschema
-entity Workflow {
-    allowed_tools: Set,
-    sensitivity_level: String
-};
-```
-
-This allows Cedar policies to reference `context.workflow_allowed_tools` as derived from the `Workflow` entity loaded at evaluation time.
-
-### Phase Boundaries
-
-| Phase | Behavior |
-|-------|----------|
-| Phase 1 | Static workflow policies committed in the Cedar bundle. The `workflow_id` is trusted as declared by the agent. |
-| Phase 2 | Dynamic workflow attestation: the agent cryptographically declares its current workflow; the runtime verifies the declaration before evaluating workflow-scoped policies. |
-
----
-
-## Section 7 : Runtime as Sole MCP Endpoint (closes #39)
-
-### Agent Host Configuration
-
-The agent's MCP client is configured with exactly one MCP server URL: the runtime's URL. All upstream servers are invisible to the agent; the runtime handles routing internally.
-
-Example `claude_desktop_config.json`:
-
-```json
-{
-  "mcpServers": {
-    "cmcp-gateway": {
-      "url": "http://localhost:8443/mcp",
-      "transport": "http"
-    }
-  }
-}
-```
-
-The agent never learns the upstream server URLs. From the agent's perspective, there is one MCP server. This prevents agents from bypassing the runtime by connecting directly to upstream servers.
-
-### Adding a New Upstream Server
-
-To add an upstream MCP server to the runtime catalog:
-
-1. Add a catalog entry to `catalog.json` in version control (see `tool-identity.md` for schema).
-2. Recompute the policy bundle hash (the catalog hash is a separate field in the TRACE Claim: `tool_catalog.hash`).
-3. Submit the change through the normal approval workflow.
-4. Restart the enclave. The restart re-measures the catalog, producing a new `tool_catalog.hash` in subsequent TRACE Claims.
-
-The new server is not reachable until the enclave restarts with the updated catalog. There is no runtime path to add a server without measurement.
-
-### Emergency Access (Break-Glass)
-
-If an unauthorized server must be accessed urgently without an enclave restart, the runtime supports a break-glass mode. Break-glass adds the server to a temporary exception list for the current enclave session.
-
-Break-glass use is visible in the TRACE Claim:
-
-```json
-"catalog_exceptions": [
-  {
-    "server_identity": "spiffe://corp.example/emergency/server",
-    "reason": "P0 incident -- customer data export required",
-    "authorized_by": "ops-lead@example.com",
-    "timestamp": "2026-06-01T03:17:00Z"
-  }
-]
-```
-
-TRACE Claims with a non-empty `catalog_exceptions` list are flagged for auditor review. Break-glass use appears in all TRACE Claims for the duration of that enclave session.
-
+# Cedar Policy Specification
+
+!!! warning "Draft"
+    Status: Draft v0.1 · Stability: Unstable: expect breaking changes before v1.0
+
+This document specifies the Cedar policy bundle format, policy expression examples, enforcement modes, evaluation decision flow, and related governance features for the cMCP Runtime.
+ +--- + +## Section 1 : Policy Bundle Format + +A policy bundle is a directory (or tarball) with the following structure: + +``` +bundle/ + policies/ # Cedar policy files (.cedar), one per policy or logical group + schema.cedarschema # Cedar schema defining entity types, actions, and attributes + manifest.json # Provenance metadata +``` + +### manifest.json Format + +```json +{ + "version": "", + "authored_at": "", + "author_identity": "", + "commit_sha": "", + "approval_chain": [ + { + "approver": "", + "approved_at": "", + "signature": "" + } + ] +} +``` + +`approval_chain` is optional. When present, each entry is a signed approval by an authorized reviewer. See Section 5 for how approvals are verified. + +### Bundle Hash + +The bundle hash is the authoritative measurement committed into the attestation report. It is computed as: + +``` +SHA-256(canonical_json({ + "manifest": , + "policy_files": { + "": "" + // ... entries sorted lexicographically by filename + }, + "schema_hash": "" +})) +``` + +`canonical_json` means RFC 8785 (JSON Canonicalization Scheme): no insignificant whitespace, keys sorted lexicographically at every level. This ensures the hash is deterministic regardless of serialization order. + +This hash is what gets measured into the attestation report (see `policy_bundle.hash` in the TRACE Claim). Any modification to any policy file, the schema, or the manifest changes the hash, producing a measurement mismatch that verifiers can detect. + +--- + +## Section 2 : Cedar Policy Expression Examples + +The following examples show working Cedar policies for common enterprise use cases. All policies operate on the action `Action::"call_tool"`. 
+ +### Tool Allowlist + +Permit calls only to named tools: + +```cedar +permit( + principal, + action == Action::"call_tool", + resource +) +when { + resource.tool_name in ["salesforce.query", "snowflake.read"] +}; +``` + +### Tool Denylist + +Explicitly forbid a specific tool regardless of other permits: + +```cedar +forbid( + principal, + action == Action::"call_tool", + resource +) +when { + resource.tool_name == "delete_customer_record" +}; +``` + +### Field-Level Redaction + +Permit the call but instruct the response inspector to redact sensitive fields: + +```cedar +permit( + principal, + action == Action::"call_tool", + resource +) +when { + resource.tool_name == "crm.get_customer" +} +advice { + redact_fields: ["ssn", "payment_history"] +}; +``` + +The `advice` block is evaluated by the response inspection pipeline after the upstream call returns. See `response-inspection.md` for redaction semantics. + +### Cross-System Compliance Boundary + +Forbid tool calls to uncovered servers when the session carries HIPAA PHI sensitivity: + +```cedar +forbid( + principal, + action == Action::"call_tool", + resource +) +when { + context.session_sensitivity == "hipaa_phi" && + resource.server_domain == "uncovered" +}; +``` + +### Per-Workflow Scope + +Permit tool calls only when the tool is in the workflow's allowed set: + +```cedar +permit( + principal, + action == Action::"call_tool", + resource +) +when { + context.workflow_id == "customer_onboarding" && + resource.tool_name in context.workflow_allowed_tools +}; +``` + +### Default-Deny Baseline + +Cedar is default-deny: a call is denied unless at least one `permit` matches and no `forbid` matches. To make this explicit and auditable, include a baseline forbid: + +```cedar +forbid( + principal, + action == Action::"call_tool", + resource +); +``` + +This ensures that even if the policy bundle is empty or all permits are removed, all calls are denied rather than silently allowed. 
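The decision rule stated above (deny unless at least one `permit` matches and no `forbid` matches) can be sketched as a toy model in Python. This is not the Cedar engine or the runtime's code: `Rule`, `decide`, and the sample rules are hypothetical names introduced only for illustration, and each rule's condition is reduced to a plain predicate over the request.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Request = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for one Cedar policy."""
    effect: str                         # "permit" or "forbid"
    matches: Callable[[Request], bool]  # stands in for the `when` clause


def decide(rules: List[Rule], request: Request) -> str:
    """Combine rule effects: forbid wins, then permit, otherwise default deny."""
    effects = {rule.effect for rule in rules if rule.matches(request)}
    if "forbid" in effects:
        return "deny"    # any matching forbid overrides every permit
    if "permit" in effects:
        return "allow"   # at least one permit and no forbid
    return "deny"        # nothing matched: default deny


# Sample rules mirroring the allowlist, denylist, and baseline examples.
allowlist = Rule("permit", lambda r: r["tool_name"] in {"salesforce.query", "snowflake.read"})
denylist = Rule("forbid", lambda r: r["tool_name"] == "delete_customer_record")
baseline = Rule("forbid", lambda r: True)  # unconditional forbid

request = {"tool_name": "salesforce.query"}
```

In this model, `decide([], request)` returns `"deny"` with no rules loaded, which is the default-deny behavior on its own. Because a matching `forbid` overrides any matching `permit`, `decide([allowlist, baseline], request)` also returns `"deny"`, so it is worth checking how an unconditional baseline `forbid` interacts with the allowlist policies before relying on it.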
+ +--- + +## Section 3 : Enforcement Modes + +Enforcement mode is set in the deployment configuration, bound into the attestation report, and cannot change without an enclave restart. This makes the active mode tamper-evident. + +| Mode | Cedar deny behavior | Audit entry | +|------|--------------------|-| +| `enforcing` | Runtime rejects the call, returns a structured error to the agent | Logged with `decision=deny` | +| `advisory` | Runtime allows the call, forwards to upstream | Logged with `decision=deny_advisory` (would have been denied in enforcing mode) | +| `silent` | Runtime allows the call, forwards to upstream | Only a basic call log; no audit decision entry | + +**Structured error (enforcing mode):** + +```json +{ + "error": "tool_call_denied", + "tool_name": "", + "call_id": "", + "policy_bundle_version": "", + "message": "Tool call denied by runtime policy." +} +``` + +The error does not include the matched rule name or policy text, to avoid leaking policy internals to the agent. + +--- + +## Section 4 : Policy Evaluation Decision Flow + +``` +1. Receive MCP tool call request + inputs: tool_name, arguments, session_id, workflow_id (if present) + +2. Build Cedar evaluation context: + { + tool_name, + server_identity, + server_domain, + session_sensitivity, + workflow_id, + workflow_allowed_tools, + user_identity // if available + } + +3. Evaluate Cedar policies against: + (principal, Action::"call_tool", resource) with context + +4. If decision = permit: + proceed to egress DLP check (see egress policy documentation) + +5. If decision = deny: + enforce per enforcement_mode (see Section 3) + +6. Log audit entry: + { decision, rule_matched, latency_us, call_id } + +7. If decision = permit and DLP passes: + forward call to upstream server + +8. Receive response from upstream server + +9. Run response inspection pipeline (see response-inspection.md) + applies advice blocks (e.g., redact_fields) + +10. Log response audit entry + +11. 
Return (possibly redacted) response to agent +``` + +Latency budget: Cedar evaluation target is under 1 ms for bundles up to 500 policy rules. The runtime measures and logs `latency_us` for each evaluation to support SLA monitoring. + +--- + +## Section 5 : Policy Provenance (closes #26) + +The `manifest.json` provenance metadata is included in the bundle hash measurement (see Section 1). This means: + +- **Author identity** (`author_identity`): tamper-evident. Changing the identity changes the bundle hash, producing a measurement mismatch that verifiers detect. +- **Authoring timestamp** (`authored_at`): tamper-evident for the same reason. +- **Git commit** (`commit_sha`): tamper-evident. Links the policy bundle to a specific point in version control history. +- **Approval signatures** (`approval_chain`): tamper-evident. Each approval signature covers the bundle hash; removing or altering an approval changes the manifest, which changes the bundle hash. + +### Verifier Workflow + +A verifier checking a TRACE Claim can perform the following steps: + +1. Obtain the TRACE Claim and extract `policy_bundle.hash`. +2. Compare `policy_bundle.hash` against the approved hash on record (e.g., from the organization's policy registry). +3. Request the bundle tarball from the operator. +4. Recompute the bundle hash locally using the algorithm in Section 1 and verify it matches `policy_bundle.hash`. +5. Inspect `manifest.json` to confirm author, timestamp, commit SHA, and approval chain. +6. Optionally verify each approval signature against the approver's known public key. + +If any step fails, the verifier rejects the TRACE Claim. This process requires no trust in the operator: the TEE measurement is the root of trust. + +--- + +## Section 6 : Per-Workflow Cedar Policy Scope (closes #41) + +### Workflow Identity + +Workflow identity is established via session metadata. 
The agent includes a `workflow_id` in the session initiation request: + +- HTTP transport: `X-MCP-Workflow-ID` header +- Session configuration: `workflow_id` field in the session init payload + +If `workflow_id` is absent, the runtime defaults to `workflow_id = "default"`. The default workflow policy should be restrictive (allowlist only widely-approved tools). + +### Evaluation Order + +A tool call must pass both checks: + +1. **Catalog-level**: the tool is registered in the approved tool catalog. +2. **Workflow-level**: the tool is allowed for the current `workflow_id` per Cedar policy. + +Failing either check results in a deny decision. + +### Workflow Entity in Cedar Schema + +The per-workflow allowed-tools list can be modeled as an entity attribute in the Cedar schema: + +```cedarschema +entity Workflow { + allowed_tools: Set, + sensitivity_level: String +}; +``` + +This allows Cedar policies to reference `context.workflow_allowed_tools` as derived from the `Workflow` entity loaded at evaluation time. + +### Phase Boundaries + +| Phase | Behavior | +|-------|----------| +| Phase 1 | Static workflow policies committed in the Cedar bundle. The `workflow_id` is trusted as declared by the agent. | +| Phase 2 | Dynamic workflow attestation: the agent cryptographically declares its current workflow; the runtime verifies the declaration before evaluating workflow-scoped policies. | + +--- + +## Section 7 : Runtime as Sole MCP Endpoint (closes #39) + +### Agent Host Configuration + +The agent's MCP client is configured with exactly one MCP server URL: the runtime's URL. All upstream servers are invisible to the agent; the runtime handles routing internally. + +Example `claude_desktop_config.json`: + +```json +{ + "mcpServers": { + "cmcp-gateway": { + "url": "http://localhost:8443/mcp", + "transport": "http" + } + } +} +``` + +The agent never learns the upstream server URLs. From the agent's perspective, there is one MCP server. 
This prevents agents from bypassing the runtime by connecting directly to upstream servers. + +### Adding a New Upstream Server + +To add an upstream MCP server to the runtime catalog: + +1. Add a catalog entry to `catalog.json` in version control (see `tool-identity.md` for schema). +2. Recompute the policy bundle hash (the catalog hash is a separate field in the TRACE Claim: `tool_catalog.hash`). +3. Submit the change through the normal approval workflow. +4. Restart the enclave. The restart re-measures the catalog, producing a new `tool_catalog.hash` in subsequent TRACE Claims. + +The new server is not reachable until the enclave restarts with the updated catalog. There is no runtime path to add a server without measurement. + +### Emergency Access (Break-Glass) + +If an unauthorized server must be accessed urgently without an enclave restart, the runtime supports a break-glass mode. Break-glass adds the server to a temporary exception list for the current enclave session. + +Break-glass use is visible in the TRACE Claim: + +```json +"catalog_exceptions": [ + { + "server_identity": "spiffe://corp.example/emergency/server", + "reason": "P0 incident -- customer data export required", + "authorized_by": "ops-lead@example.com", + "timestamp": "2026-06-01T03:17:00Z" + } +] +``` + +TRACE Claims with a non-empty `catalog_exceptions` list are flagged for auditor review. Break-glass use appears in all TRACE Claims for the duration of that enclave session. + diff --git a/docs/spec/component-model.md b/docs/spec/component-model.md index d864161..16a842b 100644 --- a/docs/spec/component-model.md +++ b/docs/spec/component-model.md @@ -1,204 +1,204 @@ -# MCP Component Model and Trust Boundaries - -!!! warning "Draft" - Status: Draft v0.1 · Stability: Unstable — expect breaking changes before v1.0 - -Defines the full component model, trust levels per phase, and the hardware vs. software trust boundary. - -Closes #43. 
- ---- - -## Components - -### Agent Host / AI Application - -**Definition**: The container or process that hosts the agent and its MCP clients. Typically an enterprise application, a SaaS product, or a developer workstation running an AI assistant. - -**Owned by**: Enterprise deployer or application vendor. - -**Trust level**: Software-rooted. The agent host identity is established by TLS certificates or SPIFFE SVIDs provisioned at deploy time, not by hardware measurement. Its behavior is not isolated from the underlying OS. - -**Responsibilities**: -- Provisions MCP client(s) with the runtime endpoint. -- Holds the SPIFFE SVID (issued by SPIRE, conditioned on runtime attestation -- see transport spec). -- Does not connect directly to any MCP server. All MCP traffic goes through the runtime. - ---- - -### Agent (LLM + Control Loop) - -**Definition**: The model inference process plus the orchestration logic (tool selection, chain-of-thought, re-prompting). The agent decides which tools to call and constructs the input payloads. - -**Owned by**: Model provider (for hosted models) or enterprise (for self-hosted models). - -**Trust level**: Untrusted from the runtime's perspective. Tool choices and payloads are outputs of a probabilistic model, not deterministic code. The runtime assumes the agent may produce any tool call at any time, including calls that violate policy. The runtime enforces policy on every call regardless of agent intent. - -**Note**: The agent being "untrusted" does not mean it is assumed to be malicious. It means the gateway does not rely on the agent good behavior as a security control. - ---- - -### MCP Client - -**Definition**: The in-process library, embedded in the agent host, that speaks JSON-RPC 2.0 over HTTP/SSE (or stdio, subject to transport limitations). One client instance connects to one endpoint. - -**Owned by**: Agent host vendor or the open-source MCP SDK. - -**Trust level**: Software-rooted. 
The client is a library in the agent host process; it has the same trust level as the agent host. - -**Phase 1 configuration**: The MCP client is configured with a single endpoint -- the cMCP Runtime. It does not maintain connections to individual MCP servers. The runtime presents itself as a single MCP server to the client; internally it routes calls to the appropriate upstream MCP server. - ---- - -### cMCP Runtime - -**Definition**: The governance proxy. Every MCP tool call from the agent passes through the runtime. The runtime evaluates each call against a Cedar policy bundle, produces a TRACE Claim, and forwards allowed calls to the upstream MCP server. - -**Owned by**: Enterprise deployer (Phase 1) or SaaS vendor (Phase 2, provider-side). - -**Trust level**: Hardware-rooted. The runtime runs inside a TEE (TPM, SEV-SNP, TDX, or OPAQUE). Its identity is a SPIFFE SVID issued only after TEE attestation succeeds. Its signing key is sealed to the TEE and never exported. Its behavior is covered by the hardware measurement. - -**Responsibilities**: -- Terminates mTLS connections from agent hosts (verifying SPIFFE SVIDs). -- Evaluates Cedar policies for every tool call. -- Produces signed, hardware-attested TRACE Claims. -- Maintains the append-only audit chain. -- Forwards allowed calls to upstream MCP servers over a separate internal connection. - ---- - -### MCP Server - -**Definition**: The process that wraps a backend system and exposes it as MCP tools. May be first-party (built and operated by the enterprise), third-party (SaaS vendor MCP server), or local (stdio on the user machine, Phase 1 unsupported). - -**Owned by**: Varies. - -**Trust level**: -- **Phase 1**: Software-rooted. The MCP server runs outside the TEE. Its responses are received by the gateway but are not hardware-attested. The gateway trusts that the server returns what it claims to return, but this is not verifiable beyond TLS. -- **Phase 2**: Hardware-rooted. 
The SaaS vendor runs the MCP server inside its own TEE. The runtime can verify the server attestation report before routing calls to it. Both ends of the call are hardware-attested. - ---- - -### Backend Systems - -**Definition**: Databases, REST APIs, filesystems, and other systems that the MCP server wraps. Not MCP-aware. - -**Owned by**: Enterprise or SaaS vendor. - -**Trust level**: Varies. Backend systems are outside the scope of the MCP governance model. Their access controls are independent of cMCP. - ---- - -## Trust Boundary Diagrams - -### Phase 1: Runtime Inside TEE - -``` -+-------------------------------------------------------------+ -| Agent Host (software-rooted) | -| | -| +------------------+ +-----------------------------+ | -| | Agent | | MCP Client | | -| | (LLM + loop) |--->| (JSON-RPC 2.0 / HTTP+SSE) | | -| | [untrusted] | | [software-rooted] | | -| +------------------+ +---------------+-------------+ | -| | mTLS (SPIFFE) | -+------------------------------------------+------------------+ - | - #================+=================# - # TEE BOUNDARY (Phase 1) # - # # - # +-----------------------+ # - # | cMCP Runtime | # - # | [hardware-rooted] | # - # | | # - # | Cedar policy engine | # - # | TRACE Claim signer | # - # | Audit chain | # - # +-----------+-----------+ # - # | TLS # - #=============+===================# - | - +--------------+------------------+ - | MCP Server (software-rooted) | - | [Phase 1: outside TEE] | - +--------------+------------------+ - | - +--------------+------------------+ - | Backend System | - | (DB, API, filesystem) | - +---------------------------------+ - -Verification points: - [A] Agent-side: SPIFFE SVID confirms runtime identity before sending any call - [B] External auditor: verifies TRACE Claim signature against TEE public key - and checks attestation report against known-good measurement -``` - -### Phase 2: Server Also Inside TEE - -``` -+-------------------------------------------------------------+ -| 
Agent Host (software-rooted) | -| MCP Client ---- mTLS (SPIFFE SVID) ---------------------- | -+----------------------------------------------+--------------+ - | - #========================+================# - # TEE BOUNDARY -- Runtime # - # cMCP Runtime [hardware-rooted] # - #========================+================# - | mTLS (mutual SPIFFE SVIDs) - #========================+================# - # TEE BOUNDARY -- Server # - # MCP Server [hardware-rooted] # - #========================+================# - | - +------------------------+-----------------+ - | Backend System | - +-----------------------------------------+ - -Both ends of the call are hardware-attested. -The TRACE Claim can include the server attestation measurement. -``` - ---- - -## Hardware-Rooted vs. Software-Rooted per Component per Phase - -| Component | Phase 1 | Phase 2 | -|-----------|---------|---------| -| Agent Host | Software-rooted | Software-rooted | -| Agent (LLM + loop) | Untrusted | Untrusted | -| MCP Client | Software-rooted | Software-rooted | -| cMCP Runtime | **Hardware-rooted** (inside TEE) | **Hardware-rooted** (inside TEE) | -| MCP Server (first-party) | Software-rooted | **Hardware-rooted** (inside TEE) | -| MCP Server (third-party SaaS) | Software-rooted | **Hardware-rooted** (vendor TEE) | -| MCP Server (local/stdio) | Out of scope | Out of scope | -| Backend Systems | Varies | Varies | - ---- - -## Component Interaction Table - -| Caller | Callee | Protocol | Authentication Method | -|--------|--------|----------|-----------------------| -| Agent | MCP Client | In-process API | N/A (same process) | -| MCP Client | cMCP Runtime | JSON-RPC 2.0 over HTTP/SSE | mTLS with SPIFFE SVID | -| cMCP Runtime | MCP Server | JSON-RPC 2.0 over HTTP/SSE | mTLS with TLS client cert (Phase 1); mTLS with SPIFFE SVID (Phase 2) | -| MCP Server | Backend System | REST, SQL, gRPC, or other | Backend-native credentials (API key, IAM role, DB password) | -| External Auditor | TRACE Claim | Offline 
verification | TEE public key (from attestation report); no live connection to runtime required | -| SPIRE | cMCP Runtime | SPIFFE workload API | TEE attestation (node attestation plugin) | - ---- - -## Verification Points - -**Agent-side verification**: Before routing any tool call, the agent MCP client verifies the runtime TLS certificate against the expected SPIFFE SVID. This confirms the agent is talking to an attested runtime, not an impersonator. The SPIFFE SVID is the agent-side trust anchor. - -**External auditor verification**: The auditor receives TRACE Claims (out-of-band, from a log store or delivered by the enterprise). The auditor verifies: -1. The TRACE Claim signature against the TEE public key embedded in the claim. -2. The TEE public key against the attestation report (the key is bound to the TEE measurement). -3. The attestation report against the known-good measurement for the runtime version (obtained from the build pipeline or a public transparency log). -4. The policy bundle hash against the expected hash for the declared policy version. - -This verification requires no live connection to the runtime. It can be done weeks or months after the fact, satisfying P3.1 (regulatory proof requests) and P3.2 (customer pre-renewal questionnaires). +# MCP Component Model and Trust Boundaries + +!!! warning "Draft" + Status: Draft v0.1 · Stability: Unstable: expect breaking changes before v1.0 + +Defines the full component model, trust levels per phase, and the hardware vs. software trust boundary. + +Closes #43. + +--- + +## Components + +### Agent Host / AI Application + +**Definition**: The container or process that hosts the agent and its MCP clients. Typically an enterprise application, a SaaS product, or a developer workstation running an AI assistant. + +**Owned by**: Enterprise deployer or application vendor. + +**Trust level**: Software-rooted. 
The agent host identity is established by TLS certificates or SPIFFE SVIDs provisioned at deploy time, not by hardware measurement. Its behavior is not isolated from the underlying OS. + +**Responsibilities**: +- Provisions MCP client(s) with the runtime endpoint. +- Holds the SPIFFE SVID (issued by SPIRE, conditioned on runtime attestation -- see transport spec). +- Does not connect directly to any MCP server. All MCP traffic goes through the runtime. + +--- + +### Agent (LLM + Control Loop) + +**Definition**: The model inference process plus the orchestration logic (tool selection, chain-of-thought, re-prompting). The agent decides which tools to call and constructs the input payloads. + +**Owned by**: Model provider (for hosted models) or enterprise (for self-hosted models). + +**Trust level**: Untrusted from the runtime's perspective. Tool choices and payloads are outputs of a probabilistic model, not deterministic code. The runtime assumes the agent may produce any tool call at any time, including calls that violate policy. The runtime enforces policy on every call regardless of agent intent. + +**Note**: The agent being "untrusted" does not mean it is assumed to be malicious. It means the gateway does not rely on the agent good behavior as a security control. + +--- + +### MCP Client + +**Definition**: The in-process library, embedded in the agent host, that speaks JSON-RPC 2.0 over HTTP/SSE (or stdio, subject to transport limitations). One client instance connects to one endpoint. + +**Owned by**: Agent host vendor or the open-source MCP SDK. + +**Trust level**: Software-rooted. The client is a library in the agent host process; it has the same trust level as the agent host. + +**Phase 1 configuration**: The MCP client is configured with a single endpoint -- the cMCP Runtime. It does not maintain connections to individual MCP servers. 
The runtime presents itself as a single MCP server to the client; internally it routes calls to the appropriate upstream MCP server. + +--- + +### cMCP Runtime + +**Definition**: The governance proxy. Every MCP tool call from the agent passes through the runtime. The runtime evaluates each call against a Cedar policy bundle, produces a TRACE Claim, and forwards allowed calls to the upstream MCP server. + +**Owned by**: Enterprise deployer (Phase 1) or SaaS vendor (Phase 2, provider-side). + +**Trust level**: Hardware-rooted. The runtime runs inside a TEE (TPM, SEV-SNP, TDX, or OPAQUE). Its identity is a SPIFFE SVID issued only after TEE attestation succeeds. Its signing key is sealed to the TEE and never exported. Its behavior is covered by the hardware measurement. + +**Responsibilities**: +- Terminates mTLS connections from agent hosts (verifying SPIFFE SVIDs). +- Evaluates Cedar policies for every tool call. +- Produces signed, hardware-attested TRACE Claims. +- Maintains the append-only audit chain. +- Forwards allowed calls to upstream MCP servers over a separate internal connection. + +--- + +### MCP Server + +**Definition**: The process that wraps a backend system and exposes it as MCP tools. May be first-party (built and operated by the enterprise), third-party (SaaS vendor MCP server), or local (stdio on the user machine, Phase 1 unsupported). + +**Owned by**: Varies. + +**Trust level**: +- **Phase 1**: Software-rooted. The MCP server runs outside the TEE. Its responses are received by the gateway but are not hardware-attested. The gateway trusts that the server returns what it claims to return, but this is not verifiable beyond TLS. +- **Phase 2**: Hardware-rooted. The SaaS vendor runs the MCP server inside its own TEE. The runtime can verify the server attestation report before routing calls to it. Both ends of the call are hardware-attested. 
+ +--- + +### Backend Systems + +**Definition**: Databases, REST APIs, filesystems, and other systems that the MCP server wraps. Not MCP-aware. + +**Owned by**: Enterprise or SaaS vendor. + +**Trust level**: Varies. Backend systems are outside the scope of the MCP governance model. Their access controls are independent of cMCP. + +--- + +## Trust Boundary Diagrams + +### Phase 1: Runtime Inside TEE + +``` ++-------------------------------------------------------------+ +| Agent Host (software-rooted) | +| | +| +------------------+ +-----------------------------+ | +| | Agent | | MCP Client | | +| | (LLM + loop) |--->| (JSON-RPC 2.0 / HTTP+SSE) | | +| | [untrusted] | | [software-rooted] | | +| +------------------+ +---------------+-------------+ | +| | mTLS (SPIFFE) | ++------------------------------------------+------------------+ + | + #================+=================# + # TEE BOUNDARY (Phase 1) # + # # + # +-----------------------+ # + # | cMCP Runtime | # + # | [hardware-rooted] | # + # | | # + # | Cedar policy engine | # + # | TRACE Claim signer | # + # | Audit chain | # + # +-----------+-----------+ # + # | TLS # + #=============+===================# + | + +--------------+------------------+ + | MCP Server (software-rooted) | + | [Phase 1: outside TEE] | + +--------------+------------------+ + | + +--------------+------------------+ + | Backend System | + | (DB, API, filesystem) | + +---------------------------------+ + +Verification points: + [A] Agent-side: SPIFFE SVID confirms runtime identity before sending any call + [B] External auditor: verifies TRACE Claim signature against TEE public key + and checks attestation report against known-good measurement +``` + +### Phase 2: Server Also Inside TEE + +``` ++-------------------------------------------------------------+ +| Agent Host (software-rooted) | +| MCP Client ---- mTLS (SPIFFE SVID) ---------------------- | ++----------------------------------------------+--------------+ + | + 
#========================+================# + # TEE BOUNDARY -- Runtime # + # cMCP Runtime [hardware-rooted] # + #========================+================# + | mTLS (mutual SPIFFE SVIDs) + #========================+================# + # TEE BOUNDARY -- Server # + # MCP Server [hardware-rooted] # + #========================+================# + | + +------------------------+-----------------+ + | Backend System | + +-----------------------------------------+ + +Both ends of the call are hardware-attested. +The TRACE Claim can include the server attestation measurement. +``` + +--- + +## Hardware-Rooted vs. Software-Rooted per Component per Phase + +| Component | Phase 1 | Phase 2 | +|-----------|---------|---------| +| Agent Host | Software-rooted | Software-rooted | +| Agent (LLM + loop) | Untrusted | Untrusted | +| MCP Client | Software-rooted | Software-rooted | +| cMCP Runtime | **Hardware-rooted** (inside TEE) | **Hardware-rooted** (inside TEE) | +| MCP Server (first-party) | Software-rooted | **Hardware-rooted** (inside TEE) | +| MCP Server (third-party SaaS) | Software-rooted | **Hardware-rooted** (vendor TEE) | +| MCP Server (local/stdio) | Out of scope | Out of scope | +| Backend Systems | Varies | Varies | + +--- + +## Component Interaction Table + +| Caller | Callee | Protocol | Authentication Method | +|--------|--------|----------|-----------------------| +| Agent | MCP Client | In-process API | N/A (same process) | +| MCP Client | cMCP Runtime | JSON-RPC 2.0 over HTTP/SSE | mTLS with SPIFFE SVID | +| cMCP Runtime | MCP Server | JSON-RPC 2.0 over HTTP/SSE | mTLS with TLS client cert (Phase 1); mTLS with SPIFFE SVID (Phase 2) | +| MCP Server | Backend System | REST, SQL, gRPC, or other | Backend-native credentials (API key, IAM role, DB password) | +| External Auditor | TRACE Claim | Offline verification | TEE public key (from attestation report); no live connection to runtime required | +| SPIRE | cMCP Runtime | SPIFFE workload API | TEE attestation 
(node attestation plugin) | + +--- + +## Verification Points + +**Agent-side verification**: Before routing any tool call, the agent MCP client verifies the runtime TLS certificate against the expected SPIFFE SVID. This confirms the agent is talking to an attested runtime, not an impersonator. The SPIFFE SVID is the agent-side trust anchor. + +**External auditor verification**: The auditor receives TRACE Claims (out-of-band, from a log store or delivered by the enterprise). The auditor verifies: +1. The TRACE Claim signature against the TEE public key embedded in the claim. +2. The TEE public key against the attestation report (the key is bound to the TEE measurement). +3. The attestation report against the known-good measurement for the runtime version (obtained from the build pipeline or a public transparency log). +4. The policy bundle hash against the expected hash for the declared policy version. + +This verification requires no live connection to the runtime. It can be done weeks or months after the fact, satisfying P3.1 (regulatory proof requests) and P3.2 (customer pre-renewal questionnaires). diff --git a/docs/spec/phase2-server.md b/docs/spec/phase2-server.md index 1669ed8..f0816e0 100644 --- a/docs/spec/phase2-server.md +++ b/docs/spec/phase2-server.md @@ -1,230 +1,230 @@ -# Phase 2 cMCP Server Specification - ---- -Status: Draft v0.1 -Last updated: 2026-06-04 -Stability: Unstable , expect breaking changes before v1.0 ---- - -## Section 1 : Phase 2 Architecture Overview - -Phase 2 targets a different deployer than Phase 1. The Phase 1 deployer is an agent developer who runs a runtime in front of their own agents. The Phase 2 deployer is a SaaS vendor or AI platform provider who exposes MCP endpoints to enterprise customers. Those enterprise customers — Phase 1 deployers — eventually ask: "prove your server code has not changed since I approved it." Phase 2 answers that question. 
- -Phase 1 closes from the agent side: the runtime attests what the agent sent and what policy was applied. Phase 2 closes from the server side: the MCP server binary, its tool surface, and its egress behavior are all measured inside a TEE and published as a second TRACE Claim that any enterprise verifier can check without trusting the SaaS operator. - -Phase 2 also closes two Phase 1 residuals: - -- **P1.4 Transitive trust**: Phase 1 attests that the gateway ran, but not that the server the gateway called is trustworthy. Phase 2 attests the server. -- **P4.1 Typosquatted packages / P4.2 tool-definition mutation**: Phase 2 measures the tool catalog at TEE startup; any post-startup mutation produces a verifiable mismatch. - -``` -Agent developer environment - | - v - Agent - | - v - MCP Client - | (verify attestation) - v -SaaS / Platform Provider - +-----------------------------------------------+ - | OPAQUE TEE | - | +------------------------------------------+ | - | | Provider MCP Server | | - | | (binary measured at startup) | | - | +------------------------------------------+ | - | Attestation + trust artifact | - +-----------------------------------------------+ - | - v - Provider backend - (DB / APIs / customer data) -``` - -The combined trust artifact delivered to the verifier is a pair of TRACE Claims: the Phase 1 runtime claim (proves runtime policy ran) and the Phase 2 server claim (proves server binary and tool surface are attested). Neither claim requires trusting the other party's operator. - -**Sequencing note.** Phase 2 is the natural pull from Phase 1 adoption. It is not the current build focus. Revisit after Phase 1 GA and early production feedback. - ---- - -## Section 2 : Five Unique Attestable Properties - -Each property below is something a TEE measurement can prove that a software-only signature or audit log cannot. 
- -### Property 1: Server Runtime Hardware-Measured - -**Definition.** The binary running right now is the binary attested - not just signed at some earlier moment. - -**What is measured.** At TEE startup, the container image digest of the MCP server binary is measured into the attestation report. This is the same mechanism Phase 1 uses for the runtime, now applied to the server. - -**Attestation field.** `server_attestation.container_image_digest` - -**Verification.** The verifier computes the expected hash of the approved server image (from the customer's approved build artifact) and compares it against `server_attestation.container_image_digest` in the server's TRACE Claim. A match means the binary in memory at runtime matches the approved build. - -**Why software cannot substitute.** A compromised maintainer who reissues a valid code-signing certificate can reissue a valid signature for malicious code. The hardware measurement is taken at runtime - the binary in memory is measured, not a signature from build time. The TEE measurement cannot be forged after the fact without invalidating the attestation report. - ---- - -### Property 2: Server Tool Surface Measured at Startup - -**Definition.** The server cannot expose a tool whose definition differs from the measurement taken at startup. - -**What is measured.** At TEE startup, the server's tool catalog - all tool names, descriptions, and input schemas - is hashed and the hash is measured into the attestation report. - -**Attestation field.** `server_attestation.tool_catalog_hash` - -**Verification.** The verifier computes the expected tool catalog hash from the customer-approved tool definitions (from the vendor's security review artifacts) and compares it against `server_attestation.tool_catalog_hash`. Any rug-pull via `notifications/tools/list_changed` that alters a tool description or schema after startup produces a mismatch detectable on the next verification cycle. 
- -**What this closes.** P4.2 tool-definition mutation. A server that changes its tool descriptions after approval to manipulate agent behavior will produce a tool catalog hash that does not match the approved value. - ---- - -### Property 3: Server Egress Profile Attested - -**Definition.** The server's own downstream API calls are within its declared scope. The dependency chain is measurable end-to-end. - -**What is measured.** The server's egress policy - an allowlist of upstream APIs the server is permitted to call - is hashed and measured at TEE startup. - -**Attestation field.** `server_attestation.egress_policy_hash` - -**Verification.** The verifier checks the egress policy hash against the approved policy on record. An enterprise can verify that the MCP server cannot call an unapproved upstream service (for example, an external model API that was not in scope when the server was approved) because any deviation from the measured egress policy is detectable. - -**What this closes.** P1.4 transitive trust. Phase 1 attests that the runtime ran the approved policy against the agent's call. Phase 1 cannot attest what the server did next. Phase 2 closes this: the server's own upstream dependencies are part of the attested measurement, and a verifier can confirm the server's transitive call graph was bounded at startup. - ---- - -### Property 4: Multi-Tenant Isolation Hardware-Provable - -**Definition.** SaaS providers can demonstrate that tenant data boundaries were enforced in hardware, not just configured in software. - -**What is measured.** Each tenant's data path either runs in a separate TEE or in a shared TEE with hardware-enforced memory partitioning. The tenant isolation configuration is measured at startup. - -**Attestation field.** `server_attestation.tenant_isolation_mode` - -Valid values: - -| Value | Meaning | -|---|---| -| `separate_tee` | Each tenant runs in a dedicated enclave. Full hardware isolation. 
| -| `shared_tee_hw_partitioned` | One enclave, hardware-enforced memory partitioning between tenants. | -| `shared_tee_sw_only` | One enclave, software-only isolation. Not hardware-attested. | - -Software-only isolation (`shared_tee_sw_only`) must be labeled explicitly. Verifiers that require hardware-enforced isolation must reject claims with this value. - -**Verification.** The verifier checks that `tenant_isolation_mode` is one of the hardware-enforced options (`separate_tee` or `shared_tee_hw_partitioned`) and that the customer's tenant ID appears in the server's attested tenant registry. - ---- - -### Property 5: Cross-Organizational Attestation Chains - -**Definition.** Party A (an agent from enterprise A) can verify party B's (a SaaS vendor's) MCP server directly - without a shared operator in the chain and without trusting either party's infrastructure claims. - -**Implementation.** The Phase 1 runtime TRACE Claim includes a `server_trace_claim_ref` field pointing to the server's Phase 2 TRACE Claim. The agent (or its runtime) performs two independent verifications: - -1. Phase 1 gateway TRACE Claim : proves the gateway's Cedar policy ran and was hardware-attested. -2. Phase 2 server TRACE Claim : proves the server binary and tool surface are attested. - -Neither verification requires trusting the other party's operator. Both claims are signed with TEE-sealed keys. The attestation reports are verifiable against the TEE provider's public endorsement chain (AMD ARK/ASK/VCEK for SEV-SNP, Intel PCS for TDX, TPM endorsement certificates for vTPM). - -**Combined trust artifact format** (delivered to a cross-org verifier): - -```json -{ - "phase1_runtime_claim": "", - "phase2_server_claim_url": "https://attestation.vendor.com/claims/", - "phase2_server_claim": "" -} -``` - -The `phase2_server_claim_url` is the canonical reference. The `phase2_server_claim` inline copy is provided for offline verification and archival. 
The verifier should use the URL to check for revocation and to fetch the latest measurement for the session. - ---- - -## Section 3 : Phase 2 Proxy Architecture for Streaming - -The Phase 2 proxy adds payload inspection - content classification and per-field policy evaluation - between the agent and the tool. MCP is moving toward streaming tool responses (chunked HTTP / server-sent events), not just request-response. The proxy architecture must handle both. - -### Classification Pipeline Placement - -Three placement options: - -**Inline (synchronous).** The runtime buffers the entire response before classifying. Latency penalty: `response_size / classification_throughput`. Not viable for streaming - buffering defeats the purpose of streaming and introduces unbounded latency for large payloads. - -**Async (default).** The runtime begins streaming the response to the agent immediately. Classification runs concurrently on the streamed chunks. If classification detects a violation partway through: - -1. Send a control message to the agent signaling that the in-progress response is being terminated. -2. Close the streaming connection to the agent. -3. Log the partial response hash in the audit chain. - -The agent must handle mid-stream termination. This is a protocol contract, not a best-effort behavior. - -**Hybrid (configurable).** For high-sensitivity tools (as tagged in the policy bundle), buffer and classify before streaming. For low-sensitivity tools, use async. The policy bundle can annotate each tool with `inspection_mode: buffered | async`. - -### Partial-Response Denial Protocol - -When the proxy must terminate a streaming response mid-stream, it sends the following MCP event before closing the connection: - -```json -{ - "type": "stream_terminated", - "reason": "inspection_violation", - "call_id": "", - "data_transmitted_bytes": -} -``` - -Agents that consume Phase 2 proxied streams must handle `stream_terminated` and treat any partial response as invalid. 
A partial response that is acted on without checking for `stream_terminated` is a client-side protocol violation. - -### Backpressure - -If the classification engine is slower than the stream rate, the proxy applies TCP-level backpressure: it stops reading from the upstream (server) connection until classification catches up. This creates natural flow control without dropping data. - -Maximum bytes buffered before forced termination: configurable per tool, default 1 MB. When the buffer limit is reached and classification has not completed, the proxy terminates the stream and logs the event as `buffer_limit_exceeded`. - ---- - -## Section 4 : Phase 2 Proxy Reliability Targets - -These targets apply to the proxy classification path under nominal load with a representative policy suite. They are design targets for the Phase 2 implementation milestone; final values will be validated against benchmark results from early production deployments. - -| Metric | Target | Notes | -|---|---|---| -| False rejection rate | < 0.05% | 1 in 2000 legitimate tool calls incorrectly denied | -| False acceptance rate | < 0.1% | < 0.1% of calls that would be denied by a ground-truth oracle are allowed through; this is a security target | -| Proxy-induced failure rate | < 0.01% | Infrastructure failures causing calls to fail that would have succeeded without the proxy | -| Latency - pattern-based classification | p99 < 10 ms | Cedar policy evaluation + regex/pattern classification | -| Latency - model-based classification | p99 < 100 ms | External classifier inference call included | -| Latency - end-to-end proxy path | p99 < 15 ms | Cedar eval + pattern classification; model-based is separate | - -**Fail behavior on proxy unavailability.** Default: fail-closed (deny all calls). This is the safe default for production. Configurable to fail-open for development environments via explicit flag (`enforcement_mode: fail_open_on_proxy_unavailable`). 
Fail-open must be blocked in production policy bundles. - -**False acceptance rate measurement.** This target is harder to measure objectively than false rejection rate. The proxy team will maintain a labeled test corpus of policy-violating payloads, run it against the proxy in a test environment, and report the escape rate. The 0.1% target is a cap on the escape rate against that corpus. - ---- - -## Section 5 : Multi-Tenant Isolation Model - -### Phase 1 Stance - -Phase 1 is single-tenant by design. One runtime instance = one policy bundle = one audit chain = one customer or business unit. If multiple business units deploy the same runtime binary with different policy bundles, each instance is treated as a separate single-tenant deployment. Multi-tenancy is out of Phase 1 scope. - -### Phase 2 Options for Multi-Tenant MCP Servers - -SaaS providers running MCP servers typically serve multiple enterprise customers from shared infrastructure. Phase 2 supports two isolation options. - -**Option A - Separate TEE per tenant.** Each customer gets a dedicated enclave. The customer's policy bundle, audit chain, and TRACE Claims are fully isolated at the hardware level. No shared memory, no shared scheduler, no shared key material. - -- Assurance level: highest. -- Cost: one TEE instance per customer. Not economical at scale for small customers. -- Recommended for: highest-assurance deployments, regulated industries, customers who require hardware isolation as a contractual commitment. - -**Option B - Shared TEE with tenant namespacing.** One enclave serves multiple tenants. The policy bundle includes per-tenant policy namespacing: `tenant_id` is a Cedar entity with its own policy set. Audit chains are tagged by `tenant_id`. TRACE Claims are scoped to a `tenant_id`. Isolation is policy-enforced within the TEE. - -- Assurance level: high (TEE boundary is shared, but policy enforcement is hardware-attested within it). 
-- Cost: one TEE instance per server deployment, amortized across tenants. -- Recommended for: standard SaaS deployments where hardware-per-tenant is not economical. -- Limitation: isolation is not hardware-isolated between tenants at the memory level. The `tenant_isolation_mode` field in the attestation report must be set to `shared_tee_hw_partitioned` or `shared_tee_sw_only` to communicate this accurately to verifiers. - -Both options are supported in Phase 2. The `server_attestation.tenant_isolation_mode` field tells verifiers which option is in use. Enterprise customers whose compliance requirements mandate hardware-isolated tenancy should verify that the value is `separate_tee` before accepting the server's TRACE Claim. - +# Phase 2 cMCP Server Specification + +--- +Status: Draft v0.1 +Last updated: 2026-06-04 +Stability: Unstable , expect breaking changes before v1.0 +--- + +## Section 1 : Phase 2 Architecture Overview + +Phase 2 targets a different deployer than Phase 1. The Phase 1 deployer is an agent developer who runs a runtime in front of their own agents. The Phase 2 deployer is a SaaS vendor or AI platform provider who exposes MCP endpoints to enterprise customers. Those enterprise customers: Phase 1 deployers: eventually ask: "prove your server code has not changed since I approved it." Phase 2 answers that question. + +Phase 1 closes from the agent side: the runtime attests what the agent sent and what policy was applied. Phase 2 closes from the server side: the MCP server binary, its tool surface, and its egress behavior are all measured inside a TEE and published as a second TRACE Claim that any enterprise verifier can check without trusting the SaaS operator. + +Phase 2 also closes two Phase 1 residuals: + +- **P1.4 Transitive trust**: Phase 1 attests that the gateway ran, but not that the server the gateway called is trustworthy. Phase 2 attests the server. 
+- **P4.1 Typosquatted packages / P4.2 tool-definition mutation**: Phase 2 measures the tool catalog at TEE startup; any post-startup mutation produces a verifiable mismatch. + +``` +Agent developer environment + | + v + Agent + | + v + MCP Client + | (verify attestation) + v +SaaS / Platform Provider + +-----------------------------------------------+ + | OPAQUE TEE | + | +------------------------------------------+ | + | | Provider MCP Server | | + | | (binary measured at startup) | | + | +------------------------------------------+ | + | Attestation + trust artifact | + +-----------------------------------------------+ + | + v + Provider backend + (DB / APIs / customer data) +``` + +The combined trust artifact delivered to the verifier is a pair of TRACE Claims: the Phase 1 runtime claim (proves runtime policy ran) and the Phase 2 server claim (proves server binary and tool surface are attested). Neither claim requires trusting the other party's operator. + +**Sequencing note.** Phase 2 is the natural pull from Phase 1 adoption. It is not the current build focus. Revisit after Phase 1 GA and early production feedback. + +--- + +## Section 2 : Five Unique Attestable Properties + +Each property below is something a TEE measurement can prove that a software-only signature or audit log cannot. + +### Property 1: Server Runtime Hardware-Measured + +**Definition.** The binary running right now is the binary attested - not just signed at some earlier moment. + +**What is measured.** At TEE startup, the container image digest of the MCP server binary is measured into the attestation report. This is the same mechanism Phase 1 uses for the runtime, now applied to the server. + +**Attestation field.** `server_attestation.container_image_digest` + +**Verification.** The verifier computes the expected hash of the approved server image (from the customer's approved build artifact) and compares it against `server_attestation.container_image_digest` in the server's TRACE Claim. 
A match means the binary in memory at runtime matches the approved build. + +**Why software cannot substitute.** A compromised maintainer who reissues a valid code-signing certificate can reissue a valid signature for malicious code. The hardware measurement is taken at runtime - the binary in memory is measured, not a signature from build time. The TEE measurement cannot be forged after the fact without invalidating the attestation report. + +--- + +### Property 2: Server Tool Surface Measured at Startup + +**Definition.** The server cannot expose a tool whose definition differs from the measurement taken at startup. + +**What is measured.** At TEE startup, the server's tool catalog - all tool names, descriptions, and input schemas - is hashed and the hash is measured into the attestation report. + +**Attestation field.** `server_attestation.tool_catalog_hash` + +**Verification.** The verifier computes the expected tool catalog hash from the customer-approved tool definitions (from the vendor's security review artifacts) and compares it against `server_attestation.tool_catalog_hash`. Any rug-pull via `notifications/tools/list_changed` that alters a tool description or schema after startup produces a mismatch detectable on the next verification cycle. + +**What this closes.** P4.2 tool-definition mutation. A server that changes its tool descriptions after approval to manipulate agent behavior will produce a tool catalog hash that does not match the approved value. + +--- + +### Property 3: Server Egress Profile Attested + +**Definition.** The server's own downstream API calls are within its declared scope. The dependency chain is measurable end-to-end. + +**What is measured.** The server's egress policy - an allowlist of upstream APIs the server is permitted to call - is hashed and measured at TEE startup. 
+ +**Attestation field.** `server_attestation.egress_policy_hash` + +**Verification.** The verifier checks the egress policy hash against the approved policy on record. An enterprise can verify that the MCP server cannot call an unapproved upstream service (for example, an external model API that was not in scope when the server was approved) because any deviation from the measured egress policy is detectable. + +**What this closes.** P1.4 transitive trust. Phase 1 attests that the runtime ran the approved policy against the agent's call. Phase 1 cannot attest what the server did next. Phase 2 closes this: the server's own upstream dependencies are part of the attested measurement, and a verifier can confirm the server's transitive call graph was bounded at startup. + +--- + +### Property 4: Multi-Tenant Isolation Hardware-Provable + +**Definition.** SaaS providers can demonstrate that tenant data boundaries were enforced in hardware, not just configured in software. + +**What is measured.** Each tenant's data path either runs in a separate TEE or in a shared TEE with hardware-enforced memory partitioning. The tenant isolation configuration is measured at startup. + +**Attestation field.** `server_attestation.tenant_isolation_mode` + +Valid values: + +| Value | Meaning | +|---|---| +| `separate_tee` | Each tenant runs in a dedicated enclave. Full hardware isolation. | +| `shared_tee_hw_partitioned` | One enclave, hardware-enforced memory partitioning between tenants. | +| `shared_tee_sw_only` | One enclave, software-only isolation. Not hardware-attested. | + +Software-only isolation (`shared_tee_sw_only`) must be labeled explicitly. Verifiers that require hardware-enforced isolation must reject claims with this value. + +**Verification.** The verifier checks that `tenant_isolation_mode` is one of the hardware-enforced options (`separate_tee` or `shared_tee_hw_partitioned`) and that the customer's tenant ID appears in the server's attested tenant registry. 
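The two checks in the Property 4 verification step can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name `check_tenant_isolation` and the `tenant_registry` field name are assumptions made for the example, since the spec text here names only `tenant_isolation_mode`.

```python
# Modes the spec lists as hardware-enforced; shared_tee_sw_only is deliberately absent.
HARDWARE_ENFORCED_MODES = frozenset({"separate_tee", "shared_tee_hw_partitioned"})


def check_tenant_isolation(server_attestation: dict, tenant_id: str) -> list[str]:
    """Return a list of failure strings; an empty list means both checks passed."""
    failures = []

    # Check 1: the attested isolation mode must be one of the hardware-enforced values.
    mode = server_attestation.get("tenant_isolation_mode")
    if mode not in HARDWARE_ENFORCED_MODES:
        failures.append(f"tenant_isolation_mode {mode!r} is not hardware-enforced")

    # Check 2: the customer's tenant ID must appear in the attested registry.
    # "tenant_registry" is a hypothetical field name used only for illustration.
    if tenant_id not in server_attestation.get("tenant_registry", []):
        failures.append(f"tenant {tenant_id!r} is not in the attested tenant registry")

    return failures
```

Returning a list of failures rather than a boolean lets a caller report every failed check at once instead of stopping at the first one.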
+ +--- + +### Property 5: Cross-Organizational Attestation Chains + +**Definition.** Party A (an agent from enterprise A) can verify party B's (a SaaS vendor's) MCP server directly - without a shared operator in the chain and without trusting either party's infrastructure claims. + +**Implementation.** The Phase 1 runtime TRACE Claim includes a `server_trace_claim_ref` field pointing to the server's Phase 2 TRACE Claim. The agent (or its runtime) performs two independent verifications: + +1. Phase 1 gateway TRACE Claim : proves the gateway's Cedar policy ran and was hardware-attested. +2. Phase 2 server TRACE Claim : proves the server binary and tool surface are attested. + +Neither verification requires trusting the other party's operator. Both claims are signed with TEE-sealed keys. The attestation reports are verifiable against the TEE provider's public endorsement chain (AMD ARK/ASK/VCEK for SEV-SNP, Intel PCS for TDX, TPM endorsement certificates for vTPM). + +**Combined trust artifact format** (delivered to a cross-org verifier): + +```json +{ + "phase1_runtime_claim": "", + "phase2_server_claim_url": "https://attestation.vendor.com/claims/", + "phase2_server_claim": "" +} +``` + +The `phase2_server_claim_url` is the canonical reference. The `phase2_server_claim` inline copy is provided for offline verification and archival. The verifier should use the URL to check for revocation and to fetch the latest measurement for the session. + +--- + +## Section 3 : Phase 2 Proxy Architecture for Streaming + +The Phase 2 proxy adds payload inspection - content classification and per-field policy evaluation - between the agent and the tool. MCP is moving toward streaming tool responses (chunked HTTP / server-sent events), not just request-response. The proxy architecture must handle both. + +### Classification Pipeline Placement + +Three placement options: + +**Inline (synchronous).** The runtime buffers the entire response before classifying. 
Latency penalty: `response_size / classification_throughput`. Not viable for streaming - buffering defeats the purpose of streaming and introduces unbounded latency for large payloads. + +**Async (default).** The runtime begins streaming the response to the agent immediately. Classification runs concurrently on the streamed chunks. If classification detects a violation partway through: + +1. Send a control message to the agent signaling that the in-progress response is being terminated. +2. Close the streaming connection to the agent. +3. Log the partial response hash in the audit chain. + +The agent must handle mid-stream termination. This is a protocol contract, not a best-effort behavior. + +**Hybrid (configurable).** For high-sensitivity tools (as tagged in the policy bundle), buffer and classify before streaming. For low-sensitivity tools, use async. The policy bundle can annotate each tool with `inspection_mode: buffered | async`. + +### Partial-Response Denial Protocol + +When the proxy must terminate a streaming response mid-stream, it sends the following MCP event before closing the connection: + +```json +{ + "type": "stream_terminated", + "reason": "inspection_violation", + "call_id": "", + "data_transmitted_bytes": +} +``` + +Agents that consume Phase 2 proxied streams must handle `stream_terminated` and treat any partial response as invalid. A partial response that is acted on without checking for `stream_terminated` is a client-side protocol violation. + +### Backpressure + +If the classification engine is slower than the stream rate, the proxy applies TCP-level backpressure: it stops reading from the upstream (server) connection until classification catches up. This creates natural flow control without dropping data. + +Maximum bytes buffered before forced termination: configurable per tool, default 1 MB. 
When the buffer limit is reached and classification has not completed, the proxy terminates the stream and logs the event as `buffer_limit_exceeded`. + +--- + +## Section 4 : Phase 2 Proxy Reliability Targets + +These targets apply to the proxy classification path under nominal load with a representative policy suite. They are design targets for the Phase 2 implementation milestone; final values will be validated against benchmark results from early production deployments. + +| Metric | Target | Notes | +|---|---|---| +| False rejection rate | < 0.05% | 1 in 2000 legitimate tool calls incorrectly denied | +| False acceptance rate | < 0.1% | < 0.1% of calls that would be denied by a ground-truth oracle are allowed through; this is a security target | +| Proxy-induced failure rate | < 0.01% | Infrastructure failures causing calls to fail that would have succeeded without the proxy | +| Latency - pattern-based classification | p99 < 10 ms | Cedar policy evaluation + regex/pattern classification | +| Latency - model-based classification | p99 < 100 ms | External classifier inference call included | +| Latency - end-to-end proxy path | p99 < 15 ms | Cedar eval + pattern classification; model-based is separate | + +**Fail behavior on proxy unavailability.** Default: fail-closed (deny all calls). This is the safe default for production. Configurable to fail-open for development environments via explicit flag (`enforcement_mode: fail_open_on_proxy_unavailable`). Fail-open must be blocked in production policy bundles. + +**False acceptance rate measurement.** This target is harder to measure objectively than false rejection rate. The proxy team will maintain a labeled test corpus of policy-violating payloads, run it against the proxy in a test environment, and report the escape rate. The 0.1% target is a cap on the escape rate against that corpus. + +--- + +## Section 5 : Multi-Tenant Isolation Model + +### Phase 1 Stance + +Phase 1 is single-tenant by design. 
One runtime instance = one policy bundle = one audit chain = one customer or business unit. If multiple business units deploy the same runtime binary with different policy bundles, each instance is treated as a separate single-tenant deployment. Multi-tenancy is out of Phase 1 scope. + +### Phase 2 Options for Multi-Tenant MCP Servers + +SaaS providers running MCP servers typically serve multiple enterprise customers from shared infrastructure. Phase 2 supports two isolation options. + +**Option A - Separate TEE per tenant.** Each customer gets a dedicated enclave. The customer's policy bundle, audit chain, and TRACE Claims are fully isolated at the hardware level. No shared memory, no shared scheduler, no shared key material. + +- Assurance level: highest. +- Cost: one TEE instance per customer. Not economical at scale for small customers. +- Recommended for: highest-assurance deployments, regulated industries, customers who require hardware isolation as a contractual commitment. + +**Option B - Shared TEE with tenant namespacing.** One enclave serves multiple tenants. The policy bundle includes per-tenant policy namespacing: `tenant_id` is a Cedar entity with its own policy set. Audit chains are tagged by `tenant_id`. TRACE Claims are scoped to a `tenant_id`. Isolation is policy-enforced within the TEE. + +- Assurance level: high (TEE boundary is shared, but policy enforcement is hardware-attested within it). +- Cost: one TEE instance per server deployment, amortized across tenants. +- Recommended for: standard SaaS deployments where hardware-per-tenant is not economical. +- Limitation: isolation is not hardware-isolated between tenants at the memory level. The `tenant_isolation_mode` field in the attestation report must be set to `shared_tee_hw_partitioned` or `shared_tee_sw_only` to communicate this accurately to verifiers. + +Both options are supported in Phase 2. The `server_attestation.tenant_isolation_mode` field tells verifiers which option is in use. 
Enterprise customers whose compliance requirements mandate hardware-isolated tenancy should verify that the value is `separate_tee` before accepting the server's TRACE Claim. + diff --git a/docs/spec/verification-library.md b/docs/spec/verification-library.md index 7251bf0..19792d4 100644 --- a/docs/spec/verification-library.md +++ b/docs/spec/verification-library.md @@ -1,198 +1,198 @@ -# cmcp-verify: Verification Library Interface Spec - -!!! warning "Draft" - Status: Draft v0.1 · Stability: Unstable — expect breaking changes before v1.0 - -This document is the interface specification for the `cmcp-verify` Python library. Implementation is separate from this spec. All type stubs below define the public interface that the implementation must satisfy. - -## Type Stubs - -```python -from dataclasses import dataclass -from enum import Enum -from typing import Optional -import datetime - -class TEEProvider(Enum): - TPM = "tpm" - SEV_SNP = "sev-snp" - TDX = "tdx" - OPAQUE = "opaque" - SOFTWARE_ONLY = "software-only" - -class VerificationStatus(Enum): - VERIFIED = "verified" - UNVERIFIED = "unverified" - PARTIALLY_VERIFIED = "partially_verified" - -@dataclass -class VerificationResult: - status: VerificationStatus - verified_fields: list[str] # fields successfully verified - unverified_fields: list[str] # fields present but not verified (provider not supported, etc.) 
- failure_reason: Optional[str] # None if status != UNVERIFIED - attestation_age_seconds: int # how old the attestation is - is_attestation_fresh: bool # attestation_age_seconds < configured validity window - -@dataclass -class ApprovedHashes: - policy_bundle_hash: str # sha256 hex string of approved policy bundle - tool_catalog_hash: str # sha256 hex string of approved tool catalog - -def verify_trace_claim( - claim_json: dict, - approved: ApprovedHashes, - max_attestation_age_seconds: int = 86400, - *, - trusted_public_key_hex: Optional[str] = None, - agent_manifest: Optional[dict] = None, - trusted_agent_manifest_keys: Optional[dict[str, bytes]] = None, -) -> VerificationResult: - """ - Verify a TRACE Claim without trusting the operator. - - Steps: - 1. Verify tee_public_key is bound to attestation_report (provider-specific) - 2. Verify signature over canonical claim body using tee_public_key - 3. Check policy_bundle.hash against approved.policy_bundle_hash - 4. Check tool_catalog.hash against approved.tool_catalog_hash - 5. If agent_manifest and trusted_agent_manifest_keys are provided, verify - the Agent Manifest issuer signature with agent-manifest SDK - verify_manifest() and cross-check gateway.agent_identity: - manifest_id, agent_id/authenticated_subject, subject_source, policy hash, - catalog hash, and manifest expiry. - 6. Check attestation freshness (timestamp within max_attestation_age_seconds) - 7. Verify audit chain continuity (audit_chain_root, audit_chain_tip) - - Returns VerificationResult with status and details. - """ - ... -``` - -`trusted_agent_manifest_keys` keeps cMCP's runtime-facing shape as raw Ed25519 -public key bytes keyed by issuer `key_id`; the verifier base64url-encodes those -keys when calling the Agent Manifest SDK. 
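The key conversion described in this paragraph might look like the sketch below. The helper name `encode_manifest_keys` is hypothetical, and stripping the trailing `=` padding is an assumption: the text says only that keys are base64url-encoded, so the exact form the Agent Manifest SDK expects should be confirmed against that SDK.

```python
import base64


def encode_manifest_keys(trusted_agent_manifest_keys: dict[str, bytes]) -> dict[str, str]:
    """Map issuer key_id -> base64url text for each raw Ed25519 public key."""
    encoded = {}
    for key_id, raw_key in trusted_agent_manifest_keys.items():
        # Ed25519 public keys are always 32 bytes; reject anything else early.
        if len(raw_key) != 32:
            raise ValueError(f"key {key_id!r} is {len(raw_key)} bytes, expected 32")
        text = base64.urlsafe_b64encode(raw_key).decode("ascii")
        # Assumption: unpadded base64url. Drop this rstrip if the SDK expects padding.
        encoded[key_id] = text.rstrip("=")
    return encoded
```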
- -### Audit Bundle Verification and External Execution Evidence - -```python -@dataclass -class AuditBundleResult: - verified: bool - entry_count: int - failures: list[str] - -def verify_audit_bundle( - bundle_json: dict, - claim_json: Optional[dict] = None, - *, - external_evidence_keys: Optional[dict[str, bytes]] = None, -) -> AuditBundleResult: - """ - Verify an exported audit bundle. When external_evidence_keys is supplied, - each key is issuer_key_id -> raw 32-byte Ed25519 public key. issuer_key_id - is lowercase hex SHA-256(public_key_bytes). - """ - ... -``` - -`external_execution_evidence.evidence_hash` is the digest of the detached evidence payload attested by the issuer, not the digest of the receipt envelope. For JSON evidence payloads, the hash pre-image is the UTF-8 bytes of the RFC 8785/JCS canonical JSON representation. For non-JSON evidence payloads, the pre-image is the exact byte string identified by the issuer's evidence format. The field value is `sha256:` or `sha384:`. - -Runtime ingestion convention: when an allowed upstream tool response is a JSON object with a top-level `external_execution_evidence` object matching the audit schema, cMCP copies that receipt into the `tool_call` audit entry. The response itself is not rewritten; `response_payload_hash` still covers the bytes returned to the caller. - -The verifier computes the receipt signing input as canonical JSON over the receipt object excluding `signature`, with sorted keys and compact separators. It then checks: - -1. `linked_call_id` equals the audit entry `call_id`. -2. `issuer_key_id` is lowercase hex SHA-256 of the trusted issuer public key. -3. `evidence_hash` has a supported hash prefix and hex digest. -4. `evidence_type` is one of the documented receipt types. -5. The Ed25519 signature verifies over the canonical receipt signing input. 
- -If any external evidence check fails, the audit bundle result is `verified=False` and the failure string includes `EXTERNAL_EVIDENCE_VERIFICATION_FAILED`. - -## Per-Provider Verification Steps - -### TPM Verification - -1. Obtain the TPM Endorsement Key (EK) certificate from the TPM manufacturer (e.g., fetched from the TPM itself via Esys_ReadPublic or from the manufacturer's certificate authority at ek.{manufacturer}.com). -2. Verify the EK certificate chains to a trusted manufacturer CA (TPM manufacturer CA roots are published by Microsoft, Amazon, Google for their vTPM implementations). -3. Extract the TPM2B_ATTEST structure from attestation_report.raw_evidence. -4. Verify the TPM2_Quote signature using the Attestation Key (AK) public key, which must be certified by the EK. -5. Confirm the quote's qualifying_data matches SHA-256(tee_public_key || session_id) from the TRACE Claim -- this binds the quote to the specific runtime instance. -6. Confirm the PCR values in the quote match attestation_report.measurement (compare byte-by-byte). -7. If all checks pass: TEE identity is verified for TPM. - -### SEV-SNP Verification - -1. Fetch the AMD VCEK (Versioned Chip Endorsement Key) certificate for the specific CPU. VCEK fetch URL format: https://kdsintf.amd.com/vcek/v1/{product}/{hwid}?{tcb_params}. Product is "Milan" or "Genoa". hwid and tcb_params come from the SNP attestation report. -2. Verify VCEK certificate chains to AMD Root CA (download from https://kdsintf.amd.com/vcek/v1/Milan/cert_chain). -3. Parse the SNP attestation report from attestation_report.raw_evidence (binary format per AMD SEV-SNP Firmware ABI Specification, Table 22). -4. Verify the report signature using the VCEK public key. -5. Confirm report.REPORT_DATA == SHA-256(tee_public_key || session_id) (bytes 0-31 of REPORT_DATA). -6. Confirm report.MEASUREMENT == bytes decoded from attestation_report.measurement (the 48-byte launch measurement). -7. 
Confirm report.POLICY fields match expected configuration (no debug mode, SMT policy as expected). -8. If all checks pass: TEE identity is verified for SEV-SNP. - -### Intel TDX Verification - -1. Fetch TDX Quote Collateral using Intel's DCAP (Data Center Attestation Primitives) API at https://api.trustedservices.intel.com/tdx/certification/v4/qe/identity. -2. Parse the TDX Quote from attestation_report.raw_evidence (follows Intel TDX Quote Generation Service format). -3. Verify the Quote using the QE (Quoting Enclave) identity and PCK (Provisioning Certification Key) certificate chain from the collateral. -4. Confirm TD_REPORT.REPORT_DATA == SHA-256(tee_public_key || session_id). -5. Confirm TD_REPORT.MRTD || RTMR0 || RTMR1 || RTMR2 || RTMR3 == attestation_report.measurement (concatenated). -6. If all checks pass: TEE identity is verified for TDX. - -### OPAQUE Managed Verification - -1. Call the OPAQUE attestation verification endpoint (provided at deployment time) with the attestation_report.raw_evidence as the request body. -2. The endpoint returns: {verified: true|false, measurement_matched: true|false, error?: string}. -3. If verified and measurement_matched: TEE identity is verified for OPAQUE Managed. 
- -## What "partially_verified" means - -VerificationStatus.PARTIALLY_VERIFIED is returned when: -- tee_public_key and signature are verified, but the attestation provider is not supported by this version of cmcp-verify (e.g., a new provider added after the library version) -- Some fields are verified but others are absent from the claim (e.g., tool_catalog.hash is missing -- older TRACE Claim format) -- verified_fields lists what passed; unverified_fields lists what was skipped with reason - -## Error codes - -VerificationError enum: -- UNSUPPORTED_PROVIDER: attestation_report.provider is not in the supported list for this library version -- SIGNATURE_INVALID: signature does not verify against tee_public_key -- PUBLIC_KEY_NOT_BOUND: tee_public_key is not bound to the attestation_report (measurement mismatch or quote verification failed) -- POLICY_HASH_MISMATCH: policy_bundle.hash != approved.policy_bundle_hash -- CATALOG_HASH_MISMATCH: tool_catalog.hash != approved.tool_catalog_hash -- AGENT_MANIFEST_MISMATCH: gateway.agent_identity does not match the signed Agent Manifest, the manifest signature is invalid, or trusted issuer keys were not supplied for a requested manifest check -- ATTESTATION_STALE: attestation_generated_at is older than max_attestation_age_seconds -- CHAIN_BROKEN: audit_chain_root -> audit_chain_tip traversal fails (missing entries or hash mismatch) -- CLAIM_MALFORMED: claim_json fails JSON Schema validation against the TRACE Claim schema -- EXTERNAL_EVIDENCE_VERIFICATION_FAILED: an audit bundle entry contains external_execution_evidence whose call binding, key id, evidence hash, evidence type, or issuer signature cannot be verified - -## Phase 1 support matrix - -Phase 1 must support TPM and SEV-SNP at minimum. TDX is high priority for the first release. OPAQUE is handled by the managed runtime and does not require a separate implementation path. - -`SOFTWARE_ONLY` is a valid enum value for local development and CI environments. 
A claim with `provider: software-only` must always return `VerificationStatus.PARTIALLY_VERIFIED` with `failure_reason` set, never `VERIFIED`. - -## Usage Example - -```python -from cmcp_verify import verify_trace_claim, ApprovedHashes -import json - -trace_claim = json.load(open("session-trace.json")) -approved = ApprovedHashes( - policy_bundle_hash="sha256:abc123...", - tool_catalog_hash="sha256:def456..." -) -result = verify_trace_claim(trace_claim, approved) -print(f"Status: {result.status.value}") -print(f"Verified fields: {result.verified_fields}") -if not result.is_attestation_fresh: - print(f"WARNING: attestation is {result.attestation_age_seconds}s old") -``` - -## Relationship to Threat Model - -As noted in [threat-model.md](threat-model.md), T.1 (server swap / tool identity) is only closed if the agent or the agent's runtime runs `verify_trace_claim` before sending traffic. Attestation without verification is post-hoc evidence, not a runtime gate. +# cmcp-verify: Verification Library Interface Spec + +!!! warning "Draft" + Status: Draft v0.1 · Stability: Unstable: expect breaking changes before v1.0 + +This document is the interface specification for the `cmcp-verify` Python library. Implementation is separate from this spec. All type stubs below define the public interface that the implementation must satisfy. + +## Type Stubs + +```python +from dataclasses import dataclass +from enum import Enum +from typing import Optional +import datetime + +class TEEProvider(Enum): + TPM = "tpm" + SEV_SNP = "sev-snp" + TDX = "tdx" + OPAQUE = "opaque" + SOFTWARE_ONLY = "software-only" + +class VerificationStatus(Enum): + VERIFIED = "verified" + UNVERIFIED = "unverified" + PARTIALLY_VERIFIED = "partially_verified" + +@dataclass +class VerificationResult: + status: VerificationStatus + verified_fields: list[str] # fields successfully verified + unverified_fields: list[str] # fields present but not verified (provider not supported, etc.) 
+ failure_reason: Optional[str] # None if status != UNVERIFIED + attestation_age_seconds: int # how old the attestation is + is_attestation_fresh: bool # attestation_age_seconds < configured validity window + +@dataclass +class ApprovedHashes: + policy_bundle_hash: str # sha256 hex string of approved policy bundle + tool_catalog_hash: str # sha256 hex string of approved tool catalog + +def verify_trace_claim( + claim_json: dict, + approved: ApprovedHashes, + max_attestation_age_seconds: int = 86400, + *, + trusted_public_key_hex: Optional[str] = None, + agent_manifest: Optional[dict] = None, + trusted_agent_manifest_keys: Optional[dict[str, bytes]] = None, +) -> VerificationResult: + """ + Verify a TRACE Claim without trusting the operator. + + Steps: + 1. Verify tee_public_key is bound to attestation_report (provider-specific) + 2. Verify signature over canonical claim body using tee_public_key + 3. Check policy_bundle.hash against approved.policy_bundle_hash + 4. Check tool_catalog.hash against approved.tool_catalog_hash + 5. If agent_manifest and trusted_agent_manifest_keys are provided, verify + the Agent Manifest issuer signature with agent-manifest SDK + verify_manifest() and cross-check gateway.agent_identity: + manifest_id, agent_id/authenticated_subject, subject_source, policy hash, + catalog hash, and manifest expiry. + 6. Check attestation freshness (timestamp within max_attestation_age_seconds) + 7. Verify audit chain continuity (audit_chain_root, audit_chain_tip) + + Returns VerificationResult with status and details. + """ + ... +``` + +`trusted_agent_manifest_keys` keeps cMCP's runtime-facing shape as raw Ed25519 +public key bytes keyed by issuer `key_id`; the verifier base64url-encodes those +keys when calling the Agent Manifest SDK. 
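The key-shape conversion described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name `encode_manifest_keys` is hypothetical, and the choice of unpadded base64url is an assumption, since the spec does not say whether the Agent Manifest SDK expects padding.

```python
import base64


def encode_manifest_keys(trusted_keys: dict[str, bytes]) -> dict[str, str]:
    """Hypothetical helper: map issuer key_id -> base64url(raw Ed25519 public key)."""
    encoded: dict[str, str] = {}
    for key_id, raw in trusted_keys.items():
        # Raw Ed25519 public keys are always 32 bytes; reject anything else early.
        if len(raw) != 32:
            raise ValueError(f"key {key_id!r} is {len(raw)} bytes, expected 32")
        # Assumption: unpadded base64url. The SDK may expect '=' padding instead.
        encoded[key_id] = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return encoded
```

Keeping raw bytes at the cMCP boundary and encoding only at the SDK call site means callers never need to know which text encoding the SDK prefers.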
+ +### Audit Bundle Verification and External Execution Evidence + +```python +@dataclass +class AuditBundleResult: + verified: bool + entry_count: int + failures: list[str] + +def verify_audit_bundle( + bundle_json: dict, + claim_json: Optional[dict] = None, + *, + external_evidence_keys: Optional[dict[str, bytes]] = None, +) -> AuditBundleResult: + """ + Verify an exported audit bundle. When external_evidence_keys is supplied, + each key is issuer_key_id -> raw 32-byte Ed25519 public key. issuer_key_id + is lowercase hex SHA-256(public_key_bytes). + """ + ... +``` + +`external_execution_evidence.evidence_hash` is the digest of the detached evidence payload attested by the issuer, not the digest of the receipt envelope. For JSON evidence payloads, the hash pre-image is the UTF-8 bytes of the RFC 8785/JCS canonical JSON representation. For non-JSON evidence payloads, the pre-image is the exact byte string identified by the issuer's evidence format. The field value is `sha256:` or `sha384:`. + +Runtime ingestion convention: when an allowed upstream tool response is a JSON object with a top-level `external_execution_evidence` object matching the audit schema, cMCP copies that receipt into the `tool_call` audit entry. The response itself is not rewritten; `response_payload_hash` still covers the bytes returned to the caller. + +The verifier computes the receipt signing input as canonical JSON over the receipt object excluding `signature`, with sorted keys and compact separators. It then checks: + +1. `linked_call_id` equals the audit entry `call_id`. +2. `issuer_key_id` is lowercase hex SHA-256 of the trusted issuer public key. +3. `evidence_hash` has a supported hash prefix and hex digest. +4. `evidence_type` is one of the documented receipt types. +5. The Ed25519 signature verifies over the canonical receipt signing input. 
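The signing input and the first three checks above can be sketched with the standard library alone. The function names are hypothetical, and several details are assumptions rather than spec text: digests are taken to be full-length lowercase hex, and `json.dumps` is used with its default `ensure_ascii=True`. Check 4 is omitted because the list of receipt types is not reproduced here, and check 5 is omitted because Ed25519 verification needs a third-party library.

```python
import hashlib
import json
import re

# Assumption: full-length lowercase hex digests after the supported prefixes.
_EVIDENCE_HASH_RE = re.compile(r"sha256:[0-9a-f]{64}|sha384:[0-9a-f]{96}")


def receipt_signing_input(receipt: dict) -> bytes:
    """Canonical JSON over the receipt minus `signature`: sorted keys, compact separators."""
    body = {k: v for k, v in receipt.items() if k != "signature"}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def precheck_receipt(receipt: dict, call_id: str, issuer_public_key: bytes) -> list[str]:
    """Run checks 1-3; an empty list means these prechecks passed."""
    failures: list[str] = []
    if receipt.get("linked_call_id") != call_id:
        failures.append("linked_call_id does not match audit entry call_id")
    if receipt.get("issuer_key_id") != hashlib.sha256(issuer_public_key).hexdigest():
        failures.append("issuer_key_id is not SHA-256 of the trusted public key")
    if not _EVIDENCE_HASH_RE.fullmatch(str(receipt.get("evidence_hash", ""))):
        failures.append("evidence_hash has an unsupported prefix or digest")
    return failures
```

Running the cheap structural checks before the signature check keeps malformed receipts from reaching the cryptographic path; the signature would then be verified over the bytes returned by `receipt_signing_input`.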
+ +If any external evidence check fails, the audit bundle result is `verified=False` and the failure string includes `EXTERNAL_EVIDENCE_VERIFICATION_FAILED`. + +## Per-Provider Verification Steps + +### TPM Verification + +1. Obtain the TPM Endorsement Key (EK) certificate from the TPM manufacturer (e.g., fetched from the TPM itself via Esys_ReadPublic or from the manufacturer's certificate authority at ek.{manufacturer}.com). +2. Verify the EK certificate chains to a trusted manufacturer CA (TPM manufacturer CA roots are published by Microsoft, Amazon, Google for their vTPM implementations). +3. Extract the TPM2B_ATTEST structure from attestation_report.raw_evidence. +4. Verify the TPM2_Quote signature using the Attestation Key (AK) public key, which must be certified by the EK. +5. Confirm the quote's qualifying_data matches SHA-256(tee_public_key || session_id) from the TRACE Claim -- this binds the quote to the specific runtime instance. +6. Confirm the PCR values in the quote match attestation_report.measurement (compare byte-by-byte). +7. If all checks pass: TEE identity is verified for TPM. + +### SEV-SNP Verification + +1. Fetch the AMD VCEK (Versioned Chip Endorsement Key) certificate for the specific CPU. VCEK fetch URL format: https://kdsintf.amd.com/vcek/v1/{product}/{hwid}?{tcb_params}. Product is "Milan" or "Genoa". hwid and tcb_params come from the SNP attestation report. +2. Verify VCEK certificate chains to AMD Root CA (download from https://kdsintf.amd.com/vcek/v1/Milan/cert_chain). +3. Parse the SNP attestation report from attestation_report.raw_evidence (binary format per AMD SEV-SNP Firmware ABI Specification, Table 22). +4. Verify the report signature using the VCEK public key. +5. Confirm report.REPORT_DATA == SHA-256(tee_public_key || session_id) (bytes 0-31 of REPORT_DATA). +6. Confirm report.MEASUREMENT == bytes decoded from attestation_report.measurement (the 48-byte launch measurement). +7. 
Confirm report.POLICY fields match expected configuration (no debug mode, SMT policy as expected). +8. If all checks pass: TEE identity is verified for SEV-SNP. + +### Intel TDX Verification + +1. Fetch TDX Quote Collateral using Intel's DCAP (Data Center Attestation Primitives) API at https://api.trustedservices.intel.com/tdx/certification/v4/qe/identity. +2. Parse the TDX Quote from attestation_report.raw_evidence (follows Intel TDX Quote Generation Service format). +3. Verify the Quote using the QE (Quoting Enclave) identity and PCK (Provisioning Certification Key) certificate chain from the collateral. +4. Confirm TD_REPORT.REPORT_DATA == SHA-256(tee_public_key || session_id). +5. Confirm TD_REPORT.MRTD || RTMR0 || RTMR1 || RTMR2 || RTMR3 == attestation_report.measurement (concatenated). +6. If all checks pass: TEE identity is verified for TDX. + +### OPAQUE Managed Verification + +1. Call the OPAQUE attestation verification endpoint (provided at deployment time) with the attestation_report.raw_evidence as the request body. +2. The endpoint returns: {verified: true|false, measurement_matched: true|false, error?: string}. +3. If verified and measurement_matched: TEE identity is verified for OPAQUE Managed. 
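The TPM, SEV-SNP, and TDX steps above all bind the evidence to the runtime through the same value, SHA-256(tee_public_key || session_id). A minimal sketch of that comparison follows. The function names are hypothetical, and the encoding of `session_id` as UTF-8 bytes is an assumption, since the spec does not state how the two inputs are serialized before concatenation.

```python
import hashlib
import hmac


def expected_binding(tee_public_key: bytes, session_id: str) -> bytes:
    """SHA-256(tee_public_key || session_id), returned as 32 raw bytes."""
    # Assumption: session_id is concatenated as its UTF-8 encoding.
    return hashlib.sha256(tee_public_key + session_id.encode("utf-8")).digest()


def report_data_matches(report_data: bytes, tee_public_key: bytes, session_id: str) -> bool:
    """Compare bytes 0-31 of the report data field against the expected binding."""
    expected = expected_binding(tee_public_key, session_id)
    # Only the first 32 bytes carry the digest; the SEV-SNP REPORT_DATA field is 64 bytes wide.
    return hmac.compare_digest(report_data[:32], expected)
```

`hmac.compare_digest` is used instead of `==` so the comparison time does not depend on where the first mismatching byte falls.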
+ +## What "partially_verified" means + +VerificationStatus.PARTIALLY_VERIFIED is returned when: +- tee_public_key and signature are verified, but the attestation provider is not supported by this version of cmcp-verify (e.g., a new provider added after the library version) +- Some fields are verified but others are absent from the claim (e.g., tool_catalog.hash is missing -- older TRACE Claim format) +- verified_fields lists what passed; unverified_fields lists what was skipped with reason + +## Error codes + +VerificationError enum: +- UNSUPPORTED_PROVIDER: attestation_report.provider is not in the supported list for this library version +- SIGNATURE_INVALID: signature does not verify against tee_public_key +- PUBLIC_KEY_NOT_BOUND: tee_public_key is not bound to the attestation_report (measurement mismatch or quote verification failed) +- POLICY_HASH_MISMATCH: policy_bundle.hash != approved.policy_bundle_hash +- CATALOG_HASH_MISMATCH: tool_catalog.hash != approved.tool_catalog_hash +- AGENT_MANIFEST_MISMATCH: gateway.agent_identity does not match the signed Agent Manifest, the manifest signature is invalid, or trusted issuer keys were not supplied for a requested manifest check +- ATTESTATION_STALE: attestation_generated_at is older than max_attestation_age_seconds +- CHAIN_BROKEN: audit_chain_root -> audit_chain_tip traversal fails (missing entries or hash mismatch) +- CLAIM_MALFORMED: claim_json fails JSON Schema validation against the TRACE Claim schema +- EXTERNAL_EVIDENCE_VERIFICATION_FAILED: an audit bundle entry contains external_execution_evidence whose call binding, key id, evidence hash, evidence type, or issuer signature cannot be verified + +## Phase 1 support matrix + +Phase 1 must support TPM and SEV-SNP at minimum. TDX is high priority for the first release. OPAQUE is handled by the managed runtime and does not require a separate implementation path. + +`SOFTWARE_ONLY` is a valid enum value for local development and CI environments. 
A claim with `provider: software-only` must always return `VerificationStatus.PARTIALLY_VERIFIED` with `failure_reason` set, never `VERIFIED`. + +## Usage Example + +```python +from cmcp_verify import verify_trace_claim, ApprovedHashes +import json + +trace_claim = json.load(open("session-trace.json")) +approved = ApprovedHashes( + policy_bundle_hash="sha256:abc123...", + tool_catalog_hash="sha256:def456..." +) +result = verify_trace_claim(trace_claim, approved) +print(f"Status: {result.status.value}") +print(f"Verified fields: {result.verified_fields}") +if not result.is_attestation_fresh: + print(f"WARNING: attestation is {result.attestation_age_seconds}s old") +``` + +## Relationship to Threat Model + +As noted in [threat-model.md](threat-model.md), T.1 (server swap / tool identity) is only closed if the agent or the agent's runtime runs `verify_trace_claim` before sending traffic. Attestation without verification is post-hoc evidence, not a runtime gate. diff --git a/docs/testing/benchmarks.md b/docs/testing/benchmarks.md index aab0f73..1fcae2e 100644 --- a/docs/testing/benchmarks.md +++ b/docs/testing/benchmarks.md @@ -1,139 +1,139 @@ -# cMCP Runtime — Latency Targets and Benchmarks - -## Latest results - -Benchmark results are committed to [`benchmarks/`](https://github.com/agentrust-io/cmcp/tree/main/benchmarks) by the nightly CI workflow after each run on TEE hardware. Each result file covers one provider and reports p50/p95/p99 latency in microseconds. - -The directory is currently empty — results will appear after the first scheduled CI run on production TEE hardware. TEE hardware benchmarks are run on Azure DCasv5 (SEV-SNP) and GCP C3 Confidential VM (TDX). - ---- - -## Overview - -This document defines latency targets and the benchmark methodology for the cMCP Runtime. Targets are split by phase: - -- **Phase 1**: Runtime intercept path only (Cedar policy evaluation, audit entry creation, routing). No payload inspection. 
-- **Phase 2**: Full proxy path with payload inspection (pattern-based and model-based classification). - ---- - -## Phase 1 Targets - -### Attestation Handshake (one-time, at runtime startup) - -Attestation is a startup cost, not a per-call cost. It is not included in the per-call latency budget. - -| TEE Provider | Target | Notes | -|-----------------|------------|----------------------------------------------------| -| TPM | < 500ms | Hardware I/O bound; TPM attestation is slow | -| SEV-SNP | < 100ms | Azure DCasv5, AWS C6a Nitro | -| TDX | < 100ms | Azure DCedsv5, GCP C3 | -| OPAQUE Managed | < 50ms | OPAQUE Managed Runtime, highest assurance | - -### Per-Call Runtime Overhead - -Covers Cedar policy evaluation + audit entry creation + routing. Excludes upstream tool execution time. - -| Percentile | Target | -|------------|---------| -| p50 | < 1ms | -| p95 | < 3ms | -| p99 | < 5ms | - -Expected breakdown for a 10-rule policy bundle: - -| Component | Estimated cost | -|--------------------------------|--------------------| -| Cedar evaluation (10 rules) | 0.2 – 0.5ms | -| Audit entry hash computation | ~0.1ms | -| Network routing overhead | 0.5 – 2ms | - ---- - -## Phase 2 Targets - -Phase 2 adds payload inspection between runtime receive and upstream forward. - -| Path | p50 | p95 | p99 | -|------------------------------------------------|---------|---------|---------| -| Pattern-based classification (regex + schema) | < 2ms | < 8ms | < 10ms | -| Model-based classification (semantic ML) | < 30ms | < 80ms | < 100ms | -| Full proxy path (Cedar + pattern) | < 5ms | < 12ms | < 15ms | - -**Notes:** -- Pattern classification is measured against a 1KB JSON payload with 20 patterns. -- Model-based classification is Phase 2+ and not required for Phase 1. - ---- - -## Benchmark Methodology - -### Hardware - -Run one benchmark suite per TEE provider, on TEE-enabled hardware matching production targets. 
Do not run benchmarks on non-TEE hardware and report results as representative. - -### Representative Policy Bundle - -A 12-rule Cedar bundle: -- 10 tool allowlist rules -- 2 field-redaction rules -- 1 cross-boundary rule - -### Representative Payloads - -**Tool call (request):** -```json -{ - "tool_name": "salesforce.query", - "arguments": { - "soql": "SELECT Id, Name, Email FROM Contact WHERE AccountId = '001x000001'", - "max_records": 100 - } -} -``` -Approximately 200 bytes. - -**Tool response:** 1KB JSON with 10 fields, 2 of which are PII-tagged. - -### Warmup - -Run 1000 calls before measurement starts. This eliminates JIT compilation and cache cold-start effects from reported numbers. - -### Measurement - -- 10,000 calls per benchmark run -- Report p50, p95, p99 per run -- Run 5 times and average across runs - -### Metrics - -Collect the following per run, in microseconds unless noted: - -| Metric | Unit | Description | -|---------------------------|-------|-------------------------------------------------------------------------------| -| `cedar_eval_latency_us` | µs | Cedar policy evaluation time | -| `audit_entry_latency_us` | µs | Time to hash and append audit chain entry | -| `routing_latency_us` | µs | Time from runtime receive to first byte sent to upstream | -| `end_to_end_latency_us` | µs | Time from agent request received to response returned (excludes upstream) | -| `attestation_handshake_ms`| ms | Measured once at startup, not per-call | - ---- - -## Reporting Format - -Benchmark results are committed as JSON to the `benchmarks/` directory in CI after each run. - -```json -{ - "provider": "sev-snp", - "timestamp": "2026-06-04T00:00:00Z", - "policy_rules_count": 12, - "payload_bytes": 200, - "calls_measured": 10000, - "cedar_eval_us": {"p50": 210, "p95": 450, "p99": 890}, - "audit_entry_us": {"p50": 95, "p95": 180, "p99": 350}, - "end_to_end_us": {"p50": 850, "p95": 2100, "p99": 4200} -} -``` - -File naming: `benchmarks/-YYYY-MM-DD.json`. 
One file per provider per run. +# cMCP Runtime: Latency Targets and Benchmarks + +## Latest results + +Benchmark results are committed to [`benchmarks/`](https://github.com/agentrust-io/cmcp/tree/main/benchmarks) by the nightly CI workflow after each run on TEE hardware. Each result file covers one provider and reports p50/p95/p99 latency in microseconds. + +The directory is currently empty: results will appear after the first scheduled CI run on production TEE hardware. TEE hardware benchmarks are run on Azure DCasv5 (SEV-SNP) and GCP C3 Confidential VM (TDX). + +--- + +## Overview + +This document defines latency targets and the benchmark methodology for the cMCP Runtime. Targets are split by phase: + +- **Phase 1**: Runtime intercept path only (Cedar policy evaluation, audit entry creation, routing). No payload inspection. +- **Phase 2**: Full proxy path with payload inspection (pattern-based and model-based classification). + +--- + +## Phase 1 Targets + +### Attestation Handshake (one-time, at runtime startup) + +Attestation is a startup cost, not a per-call cost. It is not included in the per-call latency budget. + +| TEE Provider | Target | Notes | +|-----------------|------------|----------------------------------------------------| +| TPM | < 500ms | Hardware I/O bound; TPM attestation is slow | +| SEV-SNP | < 100ms | Azure DCasv5, AWS C6a Nitro | +| TDX | < 100ms | Azure DCedsv5, GCP C3 | +| OPAQUE Managed | < 50ms | OPAQUE Managed Runtime, highest assurance | + +### Per-Call Runtime Overhead + +Covers Cedar policy evaluation + audit entry creation + routing. Excludes upstream tool execution time. 
+ +| Percentile | Target | +|------------|---------| +| p50 | < 1ms | +| p95 | < 3ms | +| p99 | < 5ms | + +Expected breakdown for a 10-rule policy bundle: + +| Component | Estimated cost | +|--------------------------------|--------------------| +| Cedar evaluation (10 rules) | 0.2 – 0.5ms | +| Audit entry hash computation | ~0.1ms | +| Network routing overhead | 0.5 – 2ms | + +--- + +## Phase 2 Targets + +Phase 2 adds payload inspection between runtime receive and upstream forward. + +| Path | p50 | p95 | p99 | +|------------------------------------------------|---------|---------|---------| +| Pattern-based classification (regex + schema) | < 2ms | < 8ms | < 10ms | +| Model-based classification (semantic ML) | < 30ms | < 80ms | < 100ms | +| Full proxy path (Cedar + pattern) | < 5ms | < 12ms | < 15ms | + +**Notes:** +- Pattern classification is measured against a 1KB JSON payload with 20 patterns. +- Model-based classification is Phase 2+ and not required for Phase 1. + +--- + +## Benchmark Methodology + +### Hardware + +Run one benchmark suite per TEE provider, on TEE-enabled hardware matching production targets. Do not run benchmarks on non-TEE hardware and report results as representative. + +### Representative Policy Bundle + +A 12-rule Cedar bundle: +- 10 tool allowlist rules +- 2 field-redaction rules +- 1 cross-boundary rule + +### Representative Payloads + +**Tool call (request):** +```json +{ + "tool_name": "salesforce.query", + "arguments": { + "soql": "SELECT Id, Name, Email FROM Contact WHERE AccountId = '001x000001'", + "max_records": 100 + } +} +``` +Approximately 200 bytes. + +**Tool response:** 1KB JSON with 10 fields, 2 of which are PII-tagged. + +### Warmup + +Run 1000 calls before measurement starts. This eliminates JIT compilation and cache cold-start effects from reported numbers. 
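A warmup-then-measure loop of the kind described above might look like the sketch below. It is an illustration under stated assumptions, not the project's benchmark harness: `call` stands in for one runtime invocation, the function names are hypothetical, and nearest-rank is an assumed percentile method because the methodology does not name one.

```python
import math
import time
from typing import Callable


def percentile(sorted_samples: list[int], pct: float) -> int:
    """Nearest-rank percentile over an ascending list of samples."""
    rank = max(1, math.ceil(pct * len(sorted_samples) / 100))
    return sorted_samples[rank - 1]


def run_benchmark(
    call: Callable[[], None], warmup: int = 1000, measured: int = 10_000
) -> dict[str, int]:
    """Return p50/p95/p99 latency in microseconds for one benchmark run."""
    for _ in range(warmup):
        call()  # results discarded: warms caches before timing starts
    samples: list[int] = []
    for _ in range(measured):
        start = time.perf_counter_ns()
        call()
        samples.append((time.perf_counter_ns() - start) // 1000)  # ns -> µs
    samples.sort()
    return {name: percentile(samples, pct) for name, pct in (("p50", 50), ("p95", 95), ("p99", 99))}
```

Timing each call individually, rather than dividing total wall time by the call count, is what makes tail percentiles such as p99 recoverable.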
+ +### Measurement + +- 10,000 calls per benchmark run +- Report p50, p95, p99 per run +- Run 5 times and average across runs + +### Metrics + +Collect the following per run, in microseconds unless noted: + +| Metric | Unit | Description | +|---------------------------|-------|-------------------------------------------------------------------------------| +| `cedar_eval_latency_us` | µs | Cedar policy evaluation time | +| `audit_entry_latency_us` | µs | Time to hash and append audit chain entry | +| `routing_latency_us` | µs | Time from runtime receive to first byte sent to upstream | +| `end_to_end_latency_us` | µs | Time from agent request received to response returned (excludes upstream) | +| `attestation_handshake_ms`| ms | Measured once at startup, not per-call | + +--- + +## Reporting Format + +Benchmark results are committed as JSON to the `benchmarks/` directory in CI after each run. + +```json +{ + "provider": "sev-snp", + "timestamp": "2026-06-04T00:00:00Z", + "policy_rules_count": 12, + "payload_bytes": 200, + "calls_measured": 10000, + "cedar_eval_us": {"p50": 210, "p95": 450, "p99": 890}, + "audit_entry_us": {"p50": 95, "p95": 180, "p99": 350}, + "end_to_end_us": {"p50": 850, "p95": 2100, "p99": 4200} +} +``` + +File naming: `benchmarks/-YYYY-MM-DD.json`. One file per provider per run. diff --git a/docs/tutorials/advisory-mode-debugging.md b/docs/tutorials/advisory-mode-debugging.md index a47c063..18ce5a5 100644 --- a/docs/tutorials/advisory-mode-debugging.md +++ b/docs/tutorials/advisory-mode-debugging.md @@ -1,6 +1,6 @@ # Advisory Mode Debugging -Run the gateway in `advisory` mode to understand which calls your Cedar policy would deny — without blocking any traffic. Use this to tune policy before switching to `enforcing`. +Run the gateway in `advisory` mode to understand which calls your Cedar policy would deny: without blocking any traffic. Use this to tune policy before switching to `enforcing`. 
## What you'll learn @@ -27,7 +27,7 @@ The `enforcement_mode` field in `cmcp-config.yaml` has three valid values: | `advisory` | Call proceeds, `would_have_denied: true` set in `_cmcp` response | | `silent` | Policy is evaluated but nothing is logged or blocked | -Default is `enforcing`. Silent mode gives you evaluation without any output — useful for baselining before you have policies written. Advisory is the useful middle ground: real traffic continues, but denials are fully visible. +Default is `enforcing`. Silent mode gives you evaluation without any output: useful for baselining before you have policies written. Advisory is the useful middle ground: real traffic continues, but denials are fully visible. --- @@ -75,7 +75,7 @@ When a call would have been denied, the `_cmcp` block in the response carries `w } ``` -`would_have_denied: true` means the Cedar policy matched at least one `forbid` rule for this call. The `advice` field, when present, contains annotations from the matched rule — this is operator-authored content from the policy bundle, not caller input. +`would_have_denied: true` means the Cedar policy matched at least one `forbid` rule for this call. The `advice` field, when present, contains annotations from the matched rule: this is operator-authored content from the policy bundle, not caller input. When `would_have_denied: false`, the call was allowed by policy and no forbid rules matched. @@ -224,4 +224,4 @@ In silent mode the policy runs but neither logs nor blocks. Use it only to confi `would_have_denied: true` in `_cmcp` is your per-call signal. `policy_decision: "advisory_deny"` in the audit chain is the durable record. Use both together: the response signal for real-time instrumentation, the audit chain for post-run analysis. -Related tutorials: [Cedar policy walkthrough](./cedar-policy-walkthrough.md) — writing the Cedar rules that produce these denials. 
[Connecting agent frameworks](./connecting-agent-frameworks.md) — how to read `_cmcp` metadata from your agent code. +Related tutorials: [Cedar policy walkthrough](./cedar-policy-walkthrough.md): writing the Cedar rules that produce these denials. [Connecting agent frameworks](./connecting-agent-frameworks.md): how to read `_cmcp` metadata from your agent code. diff --git a/docs/tutorials/cedar-policy-walkthrough.md b/docs/tutorials/cedar-policy-walkthrough.md index 3b0ec62..bb4bc85 100644 --- a/docs/tutorials/cedar-policy-walkthrough.md +++ b/docs/tutorials/cedar-policy-walkthrough.md @@ -25,7 +25,7 @@ cMCP evaluates every tool call against Cedar policies using three entities: | Entity role | cMCP type | Example value | |---|---|---| | `principal` | `cMCP::Principal` | session that initiated the call | -| `action` | `cMCP::Action::"call_tool"` | fixed — all tool calls use this action | +| `action` | `cMCP::Action::"call_tool"` | fixed: all tool calls use this action | | `resource` | `cMCP::Resource` | the tool being called | The principal carries `session_id` and `workflow_id`. The resource carries `tool_name` and `server_domain`. Cedar also receives a `context` record with `session_max_sensitivity` and `workflow_id`. @@ -103,7 +103,7 @@ when { }; // Explicit default-deny: Cedar is default-deny already, but this makes it -// auditable — the bundle hash changes if this rule is removed +// auditable: the bundle hash changes if this rule is removed forbid ( principal, action == cMCP::Action::"call_tool", @@ -175,4 +175,4 @@ when { You wrote a minimal dev policy and a production policy with workflow scoping, a PII-triggered forbid, and an explicit default-deny. You tested both with the `cedar` CLI before loading them into the runtime. Any change to the policy bundle changes the `policy_bundle.hash` field in TRACE Claims, making the active policy tamper-evident. 
-Related tutorials: [Verify a TRACE claim](./verifying-a-trace-claim.md) — confirm the policy hash in a produced claim matches what you deployed. [Multi-tenant deployment](./multi-tenant-config.md) — run per-tenant policy bundles with separate hashes. +Related tutorials: [Verify a TRACE claim](./verifying-a-trace-claim.md): confirm the policy hash in a produced claim matches what you deployed. [Multi-tenant deployment](./multi-tenant-config.md): run per-tenant policy bundles with separate hashes. diff --git a/docs/tutorials/connecting-agent-frameworks.md b/docs/tutorials/connecting-agent-frameworks.md index f2aea35..a11e57b 100644 --- a/docs/tutorials/connecting-agent-frameworks.md +++ b/docs/tutorials/connecting-agent-frameworks.md @@ -1,6 +1,6 @@ # Connecting Agent Frameworks -Wire a real agent — LangChain, LlamaIndex, or a plain HTTP client — to the cMCP gateway so every tool call passes through policy enforcement. +Wire a real agent: LangChain, LlamaIndex, or a plain HTTP client: to the cMCP gateway so every tool call passes through policy enforcement. ## What you'll learn @@ -62,7 +62,7 @@ curl -s -X POST http://localhost:8443/mcp \ -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' ``` -The response lists only tools in the attested catalog — any tool not in `catalog.json` cannot be called regardless of what the agent requests. +The response lists only tools in the attested catalog: any tool not in `catalog.json` cannot be called regardless of what the agent requests. --- @@ -296,7 +296,7 @@ When a call is denied by policy, the gateway returns HTTP 403: } ``` -`error_code` is either `POLICY_DENY` (a Cedar forbid rule matched) or `TOOL_NOT_IN_CATALOG` (the tool name is not in the approved catalog). The `advice` field, when present, carries annotations from the policy rule — these come from the hash-pinned policy bundle, not from caller input, so they are safe to log and act on. 
+`error_code` is either `POLICY_DENY` (a Cedar forbid rule matched) or `TOOL_NOT_IN_CATALOG` (the tool name is not in the approved catalog). The `advice` field, when present, carries annotations from the policy rule: these come from the hash-pinned policy bundle, not from caller input, so they are safe to log and act on. --- @@ -310,4 +310,4 @@ When a call is denied by policy, the gateway returns HTTP 403: Every tool call that passes through the gateway produces an `audit_entry_hash`. After the session ends, retrieve the full audit bundle at `GET /audit/export?session_id=` and verify it with `GET /sessions//trace-claim`. -Related tutorials: [Cedar policy walkthrough](./cedar-policy-walkthrough.md) — writing the policies that govern these calls. [Tool catalog authoring](./tool-catalog-authoring.md) — what goes in `catalog.json` and how definition hashes are computed. +Related tutorials: [Cedar policy walkthrough](./cedar-policy-walkthrough.md): writing the policies that govern these calls. [Tool catalog authoring](./tool-catalog-authoring.md): what goes in `catalog.json` and how definition hashes are computed. diff --git a/docs/tutorials/deploy-azure.md b/docs/tutorials/deploy-azure.md index 5aaa664..021c87e 100644 --- a/docs/tutorials/deploy-azure.md +++ b/docs/tutorials/deploy-azure.md @@ -40,7 +40,7 @@ az vm list-skus --location eastus --size dc --output table ### SEV-SNP (DCasv5) ```bash -# Set your target region — verify DCasv5 is available first: +# Set your target region: verify DCasv5 is available first: # az vm list-skus --location --size Standard_DC2as_v5 --output table LOCATION=eastus @@ -206,7 +206,7 @@ Expected startup log on a real SEV-SNP VM: cMCP Runtime starting: TEE: sev-snp, listen: 0.0.0.0:8443 ``` -The TEE field reads `sev-snp` (not `software-only`). If it reads `software-only`, the VM does not have an accessible SEV-SNP device — confirm the VM SKU and that `/dev/sev-guest` exists. +The TEE field reads `sev-snp` (not `software-only`). 
If it reads `software-only`, the VM does not have an accessible SEV-SNP device: confirm the VM SKU and that `/dev/sev-guest` exists. --- @@ -280,7 +280,7 @@ az group delete --name cmcp-rg --yes --no-wait ## Next steps -- [GCP deployment](./deploy-gcp.md) — Intel TDX on GCP C3 Confidential VMs -- [TEE attestation](./tee-attestation.md) — detailed explanation of what each provider proves -- [Verify a TRACE claim](./verifying-a-trace-claim.md) — full verification protocol -- [Multi-tenant deployment](./multi-tenant-config.md) — one gateway instance per tenant +- [GCP deployment](./deploy-gcp.md): Intel TDX on GCP C3 Confidential VMs +- [TEE attestation](./tee-attestation.md): detailed explanation of what each provider proves +- [Verify a TRACE claim](./verifying-a-trace-claim.md): full verification protocol +- [Multi-tenant deployment](./multi-tenant-config.md): one gateway instance per tenant diff --git a/docs/tutorials/deploy-gcp.md b/docs/tutorials/deploy-gcp.md index 6081f2a..12add4e 100644 --- a/docs/tutorials/deploy-gcp.md +++ b/docs/tutorials/deploy-gcp.md @@ -197,7 +197,7 @@ Expected startup log on a real TDX VM: cMCP Runtime starting: TEE: tdx, listen: 0.0.0.0:8443 ``` -The TEE field reads `tdx`. If it reads `software-only`, the TDX device was not found — confirm the instance type and that `/dev/tdx_guest` exists. +The TEE field reads `tdx`. If it reads `software-only`, the TDX device was not found: confirm the instance type and that `/dev/tdx_guest` exists. 
--- @@ -273,7 +273,7 @@ gcloud compute firewall-rules delete allow-cmcp --quiet ## Next steps -- [Azure deployment](./deploy-azure.md) — AMD SEV-SNP and TDX on Azure DCasv5 / DCedsv5 -- [TEE attestation](./tee-attestation.md) — detailed explanation of what each provider proves -- [Verify a TRACE claim](./verifying-a-trace-claim.md) — full verification protocol -- [Multi-tenant deployment](./multi-tenant-config.md) — one gateway instance per tenant +- [Azure deployment](./deploy-azure.md): AMD SEV-SNP and TDX on Azure DCasv5 / DCedsv5 +- [TEE attestation](./tee-attestation.md): detailed explanation of what each provider proves +- [Verify a TRACE claim](./verifying-a-trace-claim.md): full verification protocol +- [Multi-tenant deployment](./multi-tenant-config.md): one gateway instance per tenant diff --git a/docs/tutorials/kill-switch.md b/docs/tutorials/kill-switch.md index 0979bf0..15ddc1d 100644 --- a/docs/tutorials/kill-switch.md +++ b/docs/tutorials/kill-switch.md @@ -21,11 +21,11 @@ An [Agent Manifest](../../docs/spec/component-model.md) must be bound to the gat ## Background -In a production deployment an agent can go rogue: a bug, a prompt injection, or a misconfiguration causes it to request tool calls that policy forbids. Without automated remediation, the agent keeps running — accumulating denies in the audit chain but never stopping. +In a production deployment an agent can go rogue: a bug, a prompt injection, or a misconfiguration causes it to request tool calls that policy forbids. Without automated remediation, the agent keeps running: accumulating denies in the audit chain but never stopping. The kill switch closes this gap. cMCP tracks policy decisions per agent identity in a rolling time window. When the deny rate crosses a configurable threshold with enough samples, the runtime: -1. Marks the closing TRACE claim with `gateway.kill_switch_triggered: true` — hardware-attested evidence of automated enforcement, verifiable offline by any regulator +1. 
Marks the closing TRACE claim with `gateway.kill_switch_triggered: true`: hardware-attested evidence of automated enforcement, verifiable offline by any regulator 2. Blocks all subsequent `create_session()` calls from that agent identity with a `KILL_SWITCH_TRIPPED (403)` response 3. Appends a `break_glass_used` audit entry to the chain recording the trigger event @@ -40,12 +40,12 @@ Add a `kill_switch` block to `cmcp-config.yaml`: ```yaml kill_switch: enabled: true - window_seconds: 300 # rolling window — 5 minutes + window_seconds: 300 # rolling window: 5 minutes deny_rate_threshold: 0.9 # trip at 90% deny rate min_calls: 10 # require at least 10 calls before evaluating ``` -All fields have defaults — setting `enabled: false` (the default) disables evaluation without removing the block. +All fields have defaults: setting `enabled: false` (the default) disables evaluation without removing the block. | Field | Default | Description | |---|---|---| @@ -82,7 +82,7 @@ export CMCP_BEARER_TOKEN="$(openssl rand -hex 32)" cmcp start --config cmcp-config.yaml ``` -Run a session where the agent makes mostly denied calls. When the session closes, cMCP evaluates the rolling window and — if the threshold is exceeded — marks the claim: +Run a session where the agent makes mostly denied calls. 
When the session closes, cMCP evaluates the rolling window and: if the threshold is exceeded: marks the claim: ```json { @@ -123,10 +123,10 @@ result = verify_trace_claim(claim, approved) if result.status == "verified": if claim["gateway"]["kill_switch_triggered"]: - print("Agent was automatically blocked — hardware-attested enforcement confirmed.") + print("Agent was automatically blocked: hardware-attested enforcement confirmed.") ``` -A verifier running offline — with no connection to the cMCP gateway or to OPAQUE — can confirm that: +A verifier running offline: with no connection to the cMCP gateway or to OPAQUE: can confirm that: - The kill switch fired in this session (`kill_switch_triggered: true`) - The policy that caused the denies is recorded by hash in `trace.policy.bundle_hash` @@ -137,7 +137,7 @@ A verifier running offline — with no connection to the cMCP gateway or to OPAQ ## Unblock an agent identity -The kill switch is a process-lifetime block — it persists as long as the gateway process is running. To unblock, restart the gateway. This clears all in-memory state including the blocked identity set and the rolling window. +The kill switch is a process-lifetime block: it persists as long as the gateway process is running. To unblock, restart the gateway. This clears all in-memory state including the blocked identity set and the rolling window. For a manual operator override without restart, cMCP exposes an operator endpoint (requires `CMCP_BEARER_TOKEN`): @@ -146,13 +146,13 @@ curl -X DELETE https://localhost:8443/admin/kill-switch/spiffe%3A%2F%2Fexample.c -H "Authorization: Bearer $CMCP_BEARER_TOKEN" ``` -This calls `KillSwitchEvaluator.unblock()` — clearing the block flag and all rolling window events for that identity. The action is logged to the audit chain. +This calls `KillSwitchEvaluator.unblock()`: clearing the block flag and all rolling window events for that identity. The action is logged to the audit chain. 
--- ## What counts as a deny -Both `deny` and `advisory_deny` policy decisions count toward the deny rate. A `fault` (tool error) does not count — it indicates a tool-side failure, not a policy enforcement event. +Both `deny` and `advisory_deny` policy decisions count toward the deny rate. A `fault` (tool error) does not count: it indicates a tool-side failure, not a policy enforcement event. | Decision | Counted as deny? | |---|---| @@ -168,7 +168,7 @@ Both `deny` and `advisory_deny` policy decisions count toward the deny rate. A ` For UAE federal ministries and other sovereign deployments, `kill_switch_triggered: true` in a TRACE claim is the answer to "what happens when an agent goes rogue." The proof is hardware-rooted: -- The TEE signs the claim — the cloud operator and the ministry IT team cannot produce this artifact for a different outcome +- The TEE signs the claim: the cloud operator and the ministry IT team cannot produce this artifact for a different outcome - The audit chain entry records the agent identity, the deny rate window, and the trigger timestamp - The claim is verifiable offline by the federal oversight body without calling back to any OPAQUE service @@ -180,4 +180,4 @@ This closes the regulatory gap that a log file cannot close: a log entry is some You configured the rolling-window kill switch, ran a session that tripped the threshold, and verified that the closing TRACE claim carries `gateway.kill_switch_triggered: true`. Subsequent sessions from the flagged agent identity are rejected with `KILL_SWITCH_TRIPPED (403)`. The hardware-signed artifact is verifiable by any regulator offline. -Related tutorials: [TEE attestation](./tee-attestation.md) — hardware-backing the TRACE claim that carries `kill_switch_triggered`. [Verify a TRACE claim](./verifying-a-trace-claim.md) — checking `kill_switch_triggered` as part of offline verification. 
+Related tutorials: [TEE attestation](./tee-attestation.md): hardware-backing the TRACE claim that carries `kill_switch_triggered`. [Verify a TRACE claim](./verifying-a-trace-claim.md): checking `kill_switch_triggered` as part of offline verification. diff --git a/docs/tutorials/multi-tenant-config.md b/docs/tutorials/multi-tenant-config.md index c1f1e69..e6a3b97 100644 --- a/docs/tutorials/multi-tenant-config.md +++ b/docs/tutorials/multi-tenant-config.md @@ -88,7 +88,7 @@ forbid ( ); ``` -`tenant-a/catalog.json` — includes only the two tools tenant A is allowed to call: +`tenant-a/catalog.json`: includes only the two tools tenant A is allowed to call: ```json [ @@ -150,7 +150,7 @@ forbid ( ); ``` -`tenant-b/catalog.json` — includes only the analytics tools. +`tenant-b/catalog.json`: includes only the analytics tools. --- @@ -268,4 +268,4 @@ When you export the audit bundle for a session (`GET /audit/export`), the bundle Per-tenant isolation in cMCP is one gateway instance per tenant, each with its own config, Cedar bundle, catalog, and listener port. The policy bundle hash and catalog hash differ per tenant and are recorded in TRACE claims, making tenant identity tamper-evident to verifiers. Audit chains are session-scoped and process-isolated. -Related tutorials: [Cedar policy walkthrough](./cedar-policy-walkthrough.md) — writing the per-tenant Cedar policies. [Verify a TRACE claim](./verifying-a-trace-claim.md) — verifying tenant-specific claims with the correct approved hashes. +Related tutorials: [Cedar policy walkthrough](./cedar-policy-walkthrough.md): writing the per-tenant Cedar policies. [Verify a TRACE claim](./verifying-a-trace-claim.md): verifying tenant-specific claims with the correct approved hashes. 
diff --git a/docs/tutorials/response-inspection.md b/docs/tutorials/response-inspection.md index 82af52e..45b9caa 100644 --- a/docs/tutorials/response-inspection.md +++ b/docs/tutorials/response-inspection.md @@ -21,12 +21,12 @@ pip install cmcp-runtime Response inspection runs after the upstream tool server returns a response, before the response is delivered to the agent. Cedar policy evaluation runs before the call; inspection runs after. The two stages are complementary: Cedar controls what calls are allowed, inspection controls what responses reach the agent. -The pipeline has four stages that run in sequence. All stages run even if an earlier stage would deny — this produces a complete audit record rather than stopping at first failure: +The pipeline has four stages that run in sequence. All stages run even if an earlier stage would deny: this produces a complete audit record rather than stopping at first failure: -1. **Size check** — reject responses over `max_response_size_bytes` (default 2 MB) -2. **Schema validation** — check response against the tool's `output_schema` in the catalog; strip or reject surplus fields -3. **Sensitivity classification** — tag the response with sensitivity labels (`pii`, `hipaa_phi`, etc.) from catalog annotations and field-level schema tags -4. **Injection detection** — scan response content for patterns that resemble injected instructions +1. **Size check**: reject responses over `max_response_size_bytes` (default 2 MB) +2. **Schema validation**: check response against the tool's `output_schema` in the catalog; strip or reject surplus fields +3. **Sensitivity classification**: tag the response with sensitivity labels (`pii`, `hipaa_phi`, etc.) from catalog annotations and field-level schema tags +4. **Injection detection**: scan response content for patterns that resemble injected instructions A response denied at any stage is not delivered to the agent. 
Session sensitivity state is updated regardless of whether the response was denied. @@ -53,7 +53,7 @@ These patterns are matched against the full response body as a UTF-8 string. The patterns `xml-context-tag` and `persona-hijack` carry documented false-positive risk. A CRM tool that returns contact roles ("Account Executive") can match `persona-hijack`. A data API that returns XML with `` elements will match `xml-context-tag`. -The pattern list is compiled into the binary from `patterns_v1.json`. There is no runtime config key to disable individual patterns or add custom patterns in the current release — customization requires rebuilding with a modified patterns file. +The pattern list is compiled into the binary from `patterns_v1.json`. There is no runtime config key to disable individual patterns or add custom patterns in the current release: customization requires rebuilding with a modified patterns file. --- @@ -63,7 +63,7 @@ When an injection pattern matches a response: 1. The response is denied. It is not delivered to the agent. 2. The audit entry records `response_inspection_result: "injection_detected"` and `injection_pattern_matched: ""`. -3. A 50-character window centered on the match location is logged for investigation. The full response payload is not logged — it may contain sensitive data. The full response hash is available as `response_payload_hash` in the audit entry. +3. A 50-character window centered on the match location is logged for investigation. The full response payload is not logged: it may contain sensitive data. The full response hash is available as `response_payload_hash` in the audit entry. 4. Session sensitivity state is updated even for the denied response. 5. The gateway returns a structured error to the agent. @@ -135,4 +135,4 @@ if result.failures: The response inspection pipeline runs four stages after every tool call returns. 
Injection detection in Stage 4 matches the full response body against the patterns in `patterns_v1.json`. When a pattern fires, the response is blocked and the audit chain records the pattern name and a location window. Monitor for injection events by filtering exported audit bundles on `response_inspection_result: "deny"`. -Related tutorials: [Cedar policy walkthrough](./cedar-policy-walkthrough.md) — Cedar `advice` blocks in policy rules instruct the inspection pipeline to redact named fields from the response. [Verify a TRACE claim](./verifying-a-trace-claim.md) — the audit chain that inspection writes to is verified as part of TRACE claim verification. +Related tutorials: [Cedar policy walkthrough](./cedar-policy-walkthrough.md): Cedar `advice` blocks in policy rules instruct the inspection pipeline to redact named fields from the response. [Verify a TRACE claim](./verifying-a-trace-claim.md): the audit chain that inspection writes to is verified as part of TRACE claim verification. diff --git a/docs/tutorials/tee-attestation.md b/docs/tutorials/tee-attestation.md index fcfa100..823a8dd 100644 --- a/docs/tutorials/tee-attestation.md +++ b/docs/tutorials/tee-attestation.md @@ -63,7 +63,7 @@ In `software-only` mode: On a real TEE host: - `trace.runtime.platform` reflects the hardware: `"amd-sev-snp"`, `"tpm2"`, `"intel-tdx"`, etc. -- `trace.runtime.measurement` is the real hardware measurement — a non-zero hash specific to the loaded workload +- `trace.runtime.measurement` is the real hardware measurement: a non-zero hash specific to the loaded workload - `verify_trace_claim` returns `status: "verified"` with `hardware_attestation` in `verified_fields` The measurement value is deterministic for a given workload binary and startup config. If the workload binary changes (e.g., an update to `cmcp-runtime`) the measurement changes, and verifiers who pinned the previous measurement will see a mismatch. 
@@ -153,4 +153,4 @@ Software-only mode leaves all four threat classes open. The audit chain and poli You configured cMCP for AMD SEV-SNP, confirmed the `trace.runtime.platform` and `trace.runtime.measurement` fields reflect real hardware values, and pinned the expected measurement in config. On a real TEE host, `verify_trace_claim` returns `status: "verified"` with `hardware_attestation` in `verified_fields`, providing hardware-backed assurance that the workload was not tampered with. -Related tutorials: [Verify a TRACE claim](./verifying-a-trace-claim.md) — hardware attestation is one of the verification steps that determines overall status. [Multi-tenant deployment](./multi-tenant-config.md) — each tenant's policy bundle hash is separate; the hardware measurement is shared across tenants on the same host. +Related tutorials: [Verify a TRACE claim](./verifying-a-trace-claim.md): hardware attestation is one of the verification steps that determines overall status. [Multi-tenant deployment](./multi-tenant-config.md): each tenant's policy bundle hash is separate; the hardware measurement is shared across tenants on the same host. diff --git a/docs/tutorials/tls-pinning.md b/docs/tutorials/tls-pinning.md index 7068534..8907a93 100644 --- a/docs/tutorials/tls-pinning.md +++ b/docs/tutorials/tls-pinning.md @@ -27,7 +27,7 @@ The quickstart `catalog.json` uses this fingerprint value: "tls_fingerprint": "SHA256:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=" ``` -This is a sentinel, not a real certificate pin. When the runtime encounters this placeholder it logs a one-time warning and falls back to standard CA verification — the connection proceeds but peer identity is verified by CA trust only, not pinned to the catalog. The audit chain records `evidence_class: "hash-only"` for every call to that server. +This is a sentinel, not a real certificate pin. 
When the runtime encounters this placeholder it logs a one-time warning and falls back to standard CA verification: the connection proceeds but peer identity is verified by CA trust only, not pinned to the catalog. The audit chain records `evidence_class: "hash-only"` for every call to that server. Replace it with the real SHA-256 fingerprint of your upstream server's certificate before setting `enforcement_mode: enforcing`. @@ -89,7 +89,7 @@ The runtime verifies this fingerprint on every outbound connection to the tool s After updating `catalog.json`, recompute the catalog hash and set `CMCP_CATALOG_HASH`. -The runtime computes the hash over the canonical JSON of the catalog entries sorted by `tool_name` — not over the raw file bytes. The snippet below replicates that computation exactly: +The runtime computes the hash over the canonical JSON of the catalog entries sorted by `tool_name`: not over the raw file bytes. The snippet below replicates that computation exactly: ```bash python3 -c " @@ -150,4 +150,4 @@ If the upstream server rotates its certificate, you must update `catalog.json`, You replaced the development sentinel fingerprint with a real SHA-256 certificate pin, updated the catalog hash, and confirmed that audit entries record `evidence_class: "tls-pinned"` for verified connections. TLS pinning prevents server substitution attacks but does not extend to individual response non-repudiation. -Related tutorials: [Cedar policy walkthrough](./cedar-policy-walkthrough.md) — the catalog is also covered by the attestation measurement. [Verify a TRACE claim](./verifying-a-trace-claim.md) — the catalog hash in the TRACE claim must match the catalog you pinned. +Related tutorials: [Cedar policy walkthrough](./cedar-policy-walkthrough.md): the catalog is also covered by the attestation measurement. [Verify a TRACE claim](./verifying-a-trace-claim.md): the catalog hash in the TRACE claim must match the catalog you pinned. 
diff --git a/docs/tutorials/tool-catalog-authoring.md b/docs/tutorials/tool-catalog-authoring.md index 17ff67f..f6e894b 100644 --- a/docs/tutorials/tool-catalog-authoring.md +++ b/docs/tutorials/tool-catalog-authoring.md @@ -111,7 +111,7 @@ The gateway recomputes this at load time and rejects the catalog if any entry's ### `compliance_domain` -A string label grouping tools by their compliance context. Used by Cedar policies to write rules like "tools in domain `pii` require a PII handler principal attribute." Common values: `"pii"`, `"financial"`, `"phi"`, `"internal"`, `"external"`. The runtime does not validate the value — it is a policy input. +A string label grouping tools by their compliance context. Used by Cedar policies to write rules like "tools in domain `pii` require a PII handler principal attribute." Common values: `"pii"`, `"financial"`, `"phi"`, `"internal"`, `"external"`. The runtime does not validate the value: it is a policy input. ### `requires_baa` @@ -143,7 +143,7 @@ Controls what the gateway does when a tool call argument fails schema validation | `"strict"` | Reject the call with HTTP 422 if any argument fails validation | | `"log"` | Log the violation but pass through unchanged | -Use `"strict"` for tools that handle sensitive data where unexpected fields could indicate an injection attempt. Use `"redact"` (the default) when agents may send extra fields the tool ignores. Use `"log"` only for baselining — it provides no enforcement. +Use `"strict"` for tools that handle sensitive data where unexpected fields could indicate an injection attempt. Use `"redact"` (the default) when agents may send extra fields the tool ignores. Use `"log"` only for baselining: it provides no enforcement. --- @@ -162,7 +162,7 @@ def compute_catalog_hash(entries: list[dict]) -> str: ``` Steps: -1. Sort entries by `tool_name` (ascending, case-sensitive — but tool names are always lowercase) +1. 
Sort entries by `tool_name` (ascending, case-sensitive: but tool names are always lowercase) 2. Canonical JSON: `sort_keys=True`, no spaces (`separators=(",", ":")`) 3. SHA-256 of the UTF-8 bytes @@ -262,7 +262,7 @@ Note that `kyc.verify` uses `"strict"` because unexpected fields in a KYC call c Compute and pin the catalog hash before starting the gateway: ```bash -# Pin the catalog hash (required in production — see CMCP_CATALOG_HASH) +# Pin the catalog hash (required in production: see CMCP_CATALOG_HASH) export CMCP_CATALOG_HASH="$(python -c " import hashlib, json entries = json.load(open('catalog.json')) @@ -290,4 +290,4 @@ Both commands exit non-zero on any validation error without starting the gateway 4. Use `"strict"` schema validation for high-sensitivity tools; `"redact"` is the safe default for others 5. `sensitivity_level` feeds session tracking; `compliance_domain` feeds Cedar policy context -Related tutorials: [Cedar policy walkthrough](./cedar-policy-walkthrough.md) — using `compliance_domain` and `sensitivity_level` in Cedar rules. [TLS pinning](./tls-pinning.md) — computing `tls_fingerprint` values. +Related tutorials: [Cedar policy walkthrough](./cedar-policy-walkthrough.md): using `compliance_domain` and `sensitivity_level` in Cedar rules. [TLS pinning](./tls-pinning.md): computing `tls_fingerprint` values. diff --git a/docs/tutorials/verifying-a-trace-claim.md b/docs/tutorials/verifying-a-trace-claim.md index c442cec..daa33b4 100644 --- a/docs/tutorials/verifying-a-trace-claim.md +++ b/docs/tutorials/verifying-a-trace-claim.md @@ -37,7 +37,7 @@ The approved hashes are the SHA-256 values printed by the gateway at startup: [cmcp] catalog loaded: 3 tools, sha256:def456... ``` -In production, these values come from your deployment pipeline — not from the operator. The point of verification is to confirm the runtime loaded what your organization approved, without trusting the operator's assertion. 
Store the hashes in your CI artifact registry or secrets manager at bundle-build time and retrieve them at verification time. +In production, these values come from your deployment pipeline: not from the operator. The point of verification is to confirm the runtime loaded what your organization approved, without trusting the operator's assertion. Store the hashes in your CI artifact registry or secrets manager at bundle-build time and retrieve them at verification time. --- @@ -114,7 +114,7 @@ Attestation fresh:True Details: {'hardware_attestation': 'software-only mode - not hardware-backed'} ``` -`hardware_attestation` is in `unverified_fields` but no `failure_reason` is set for it in isolation — the status rolls up to `partially_verified` because other fields were verified. On a real TEE host, `hardware_attestation` moves to `verified_fields` and status becomes `verified`. +`hardware_attestation` is in `unverified_fields` but no `failure_reason` is set for it in isolation: the status rolls up to `partially_verified` because other fields were verified. On a real TEE host, `hardware_attestation` moves to `verified_fields` and status becomes `verified`. `unverified` (with no verified fields at all) means the claim is either malformed, signature-invalid, or the hashes do not match. Treat this as a hard rejection. @@ -151,7 +151,7 @@ def verify_session_claim(claim_path: str) -> None: if not result.is_attestation_fresh: print(f"CLAIM STALE: age={result.attestation_age_seconds}s", file=sys.stderr) sys.exit(1) - print(f"WARNING: partially verified — {result.unverified_fields}") + print(f"WARNING: partially verified: {result.unverified_fields}") print(f"Claim verified. 
Tools called: {claim['gateway']['call_summary']['tools_invoked']}") ``` @@ -162,4 +162,4 @@ def verify_session_claim(claim_path: str) -> None: You called `verify_trace_claim` with `ApprovedHashes` sourced from your deployment pipeline (not from the operator), read `VerificationResult` fields to distinguish full verification from partial (dev-mode) verification, and integrated the check at pipeline entry. A claim that returns `unverified` must be rejected before any downstream processing uses the session output. -Related tutorials: [Cedar policy walkthrough](./cedar-policy-walkthrough.md) — the policy bundle hash you verify here is the hash of the Cedar bundle loaded at runtime. [TEE attestation](./tee-attestation.md) — switching from software-only to a real TEE makes `hardware_attestation` move from `unverified_fields` to `verified_fields`. +Related tutorials: [Cedar policy walkthrough](./cedar-policy-walkthrough.md): the policy bundle hash you verify here is the hash of the Cedar bundle loaded at runtime. [TEE attestation](./tee-attestation.md): switching from software-only to a real TEE makes `hardware_attestation` move from `unverified_fields` to `verified_fields`. diff --git a/examples/bfsi-demo/policies/block-phi-to-uncovered.cedar b/examples/bfsi-demo/policies/block-phi-to-uncovered.cedar index ed321dd..793137c 100644 --- a/examples/bfsi-demo/policies/block-phi-to-uncovered.cedar +++ b/examples/bfsi-demo/policies/block-phi-to-uncovered.cedar @@ -1,5 +1,5 @@ // Rule 3: block calls to uncovered (non-BAA) external services when -// session has touched PHI data — prevents HIPAA cross-boundary violations. +// session has touched PHI data, prevents HIPAA cross-boundary violations. forbid ( principal, action == cMCP::Action::"call_tool", diff --git a/experiments/README.md b/experiments/README.md index d11b2da..a8f8430 100644 --- a/experiments/README.md +++ b/experiments/README.md @@ -8,13 +8,13 @@ Each experiment imports directly from `cmcp_runtime`. 
Run from the repo root aft | Dir | Claim | Key result | |-----|-------|-----------| -| [claim1-policy-hash-binding](claim1-policy-hash-binding/) | Claim 1 — TEE-measured policy enforcement | Deterministic hash, 51% avalanche on 1-char change, PolicyHashMismatch, TRACE sig invalidated | -| [claim2-session-vs-call-policy](claim2-session-vs-call-policy/) | Claim 2 — Session sensitivity state | Session policy catches 2/2 PHI cross-boundary violations; per-call catches 0/2 | -| [claim2-false-positive-rate](claim2-false-positive-rate/) | Claim 2 — Session sensitivity state (cost) | Overall FPR 69%; Billing/Batch 100%; Clinical Decision Support 0% | -| [claim3-rug-pull-detection](claim3-rug-pull-detection/) | Claim 3 — Tool catalog drift detection | 48% bit change on one-sentence description tamper; CatalogHashMismatch fail-closed | -| [claim4-trace-claim-nonce](claim4-trace-claim-nonce/) | Claim 4 — TRACE Claim nonce binding | 6 properties: nonce determinism, session/instance binding, replay prevention, sig tamper, selective disclosure | -| [claim5-temporal-adjacency](claim5-temporal-adjacency/) | Claim 5 — Temporal adjacency provenance | Zero false negatives by construction; provenance disclaimer in every summary; denied calls in graph | -| [claim6-cross-org-attestation](claim6-cross-org-attestation/) | Claim 6 — Cross-org attestation chains | Dual-TEE protocol: independent keys, session linkage, independent verify, binary swap detection | +| [claim1-policy-hash-binding](claim1-policy-hash-binding/) | Claim 1: TEE-measured policy enforcement | Deterministic hash, 51% avalanche on 1-char change, PolicyHashMismatch, TRACE sig invalidated | +| [claim2-session-vs-call-policy](claim2-session-vs-call-policy/) | Claim 2: Session sensitivity state | Session policy catches 2/2 PHI cross-boundary violations; per-call catches 0/2 | +| [claim2-false-positive-rate](claim2-false-positive-rate/) | Claim 2: Session sensitivity state (cost) | Overall FPR 69%; Billing/Batch 100%; 
Clinical Decision Support 0% | +| [claim3-rug-pull-detection](claim3-rug-pull-detection/) | Claim 3: Tool catalog drift detection | 48% bit change on one-sentence description tamper; CatalogHashMismatch fail-closed | +| [claim4-trace-claim-nonce](claim4-trace-claim-nonce/) | Claim 4: TRACE Claim nonce binding | 6 properties: nonce determinism, session/instance binding, replay prevention, sig tamper, selective disclosure | +| [claim5-temporal-adjacency](claim5-temporal-adjacency/) | Claim 5: Temporal adjacency provenance | Zero false negatives by construction; provenance disclaimer in every summary; denied calls in graph | +| [claim6-cross-org-attestation](claim6-cross-org-attestation/) | Claim 6: Cross-org attestation chains | Dual-TEE protocol: independent keys, session linkage, independent verify, binary swap detection | | [claim-hw-attestation](claim-hw-attestation/) | Hardware attestation (real TEE) | Requires a confidential VM; SKIPs without one. Real report + nonce binding + end-to-end claim verification | ## Running diff --git a/experiments/claim1-policy-hash-binding/README.md b/experiments/claim1-policy-hash-binding/README.md index 41e2f66..a8bac68 100644 --- a/experiments/claim1-policy-hash-binding/README.md +++ b/experiments/claim1-policy-hash-binding/README.md @@ -4,19 +4,19 @@ **What this experiment proves:** -1. The policy bundle hash is fully determined by the bundle content — same content, same hash, every time. +1. The policy bundle hash is fully determined by the bundle content: same content, same hash, every time. 2. Any change to any byte in any policy file produces a completely different hash (avalanche property). 3. `load_policy_bundle` raises `PolicyHashMismatch` when the hash of the bundle on disk does not match the expected hash, preventing a substituted bundle from being used. 4. The bundle hash appears in the TRACE Claim's `trace.policy.bundle_hash` field and is covered by the claim's Ed25519 signature. 
Tampering with the hash breaks signature verification. **What this means for governance:** -A rogue administrator who modifies the Cedar policy bundle after it was approved cannot silently substitute the new bundle — the computed hash will not match the approved hash, and the gateway will refuse to start. The approved hash recorded in the TRACE Claim can be compared against the policy bundle in version control by any verifier at any time, without trusting the operator. +A rogue administrator who modifies the Cedar policy bundle after it was approved cannot silently substitute the new bundle: the computed hash will not match the approved hash, and the gateway will refuse to start. The approved hash recorded in the TRACE Claim can be compared against the policy bundle in version control by any verifier at any time, without trusting the operator. **Fixtures:** -- `fixtures/bundle-v1/` — original approved policy (permits `ehr.get_patient`) -- `fixtures/bundle-v2/` — identical except one character changed in a comment (`A` → `a` in `allow_ehr_tools.cedar` line 1) +- `fixtures/bundle-v1/`: original approved policy (permits `ehr.get_patient`) +- `fixtures/bundle-v2/`: identical except one character changed in a comment (`A` → `a` in `allow_ehr_tools.cedar` line 1) ## Running diff --git a/experiments/claim1-policy-hash-binding/run.py b/experiments/claim1-policy-hash-binding/run.py index 8254c32..1898a00 100644 --- a/experiments/claim1-policy-hash-binding/run.py +++ b/experiments/claim1-policy-hash-binding/run.py @@ -1,6 +1,6 @@ """ Experiment: Policy Bundle Hash Binding -Claim 1 — Hardware-attested policy enforcement at the AI agent tool boundary +Claim 1: Hardware-attested policy enforcement at the AI agent tool boundary Proves four properties: 1. 
Bundle hash is deterministic (same content → same hash, always) @@ -124,7 +124,7 @@ def result(label: str, value: str, ok: bool | None = None) -> None: def main() -> int: print("=" * 60) print("Experiment: Policy Bundle Hash Binding") - print("Claim 1 — cMCP TEE-measured policy enforcement") + print("Claim 1: cMCP TEE-measured policy enforcement") print("=" * 60) failures = 0 @@ -132,7 +132,7 @@ def main() -> int: # ------------------------------------------------------------------ # Property 1: Determinism # ------------------------------------------------------------------ - section("1. Hash determinism — same bundle, same hash across loads") + section("1. Hash determinism: same bundle, same hash across loads") b1_load1 = load_policy_bundle(str(BUNDLE_V1)) b1_load2 = load_policy_bundle(str(BUNDLE_V1)) @@ -150,7 +150,7 @@ def main() -> int: # ------------------------------------------------------------------ # Property 2: Avalanche effect # ------------------------------------------------------------------ - section("2. Avalanche effect — one character changed in cedar comment") + section("2. Avalanche effect: one character changed in cedar comment") b2_load = load_policy_bundle(str(BUNDLE_V2)) h2 = b2_load.bundle_hash @@ -168,14 +168,14 @@ def main() -> int: result("Change", f"line 1 of cedar file: {repr(cedar_v1)} -> {repr(cedar_v2)}") result("Bits changed (of 256)", f"{bits_diff} ({100 * bits_diff // 256}%)") result("Hex chars changed (of 64)", f"{chars_diff}") - result("Hashes differ", "YES — tamper detectable" if hashes_differ else "NO — NOT detectable", hashes_differ) + result("Hashes differ", "YES: tamper detectable" if hashes_differ else "NO: NOT detectable", hashes_differ) if not hashes_differ: failures += 1 # ------------------------------------------------------------------ # Property 3: PolicyHashMismatch on disk/expected mismatch # ------------------------------------------------------------------ - section("3. 
Tamper detection — load bundle-v2 with expected_hash of bundle-v1") + section("3. Tamper detection: load bundle-v2 with expected_hash of bundle-v1") print(f" (simulates an admin swapping the bundle after approval)") mismatch_raised = False @@ -186,7 +186,7 @@ def main() -> int: result("PolicyHashMismatch raised", "YES", True) result("Error detail", str(exc)[:80] + "...") if not mismatch_raised: - result("PolicyHashMismatch raised", "NO — bundle substitution NOT caught", False) + result("PolicyHashMismatch raised", "NO: bundle substitution NOT caught", False) failures += 1 # Positive control: correct hash passes @@ -239,7 +239,7 @@ def main() -> int: print(" - TRACE Claim signature is invalidated by any hash field change") return 0 else: - print(f"Result: {failures} PROPERTIES FAILED — see output above") + print(f"Result: {failures} PROPERTIES FAILED: see output above") return 1 diff --git a/experiments/claim2-false-positive-rate/README.md b/experiments/claim2-false-positive-rate/README.md index 845e3ec..2f20b07 100644 --- a/experiments/claim2-false-positive-rate/README.md +++ b/experiments/claim2-false-positive-rate/README.md @@ -11,7 +11,7 @@ The monotonic session sensitivity model blocks ALL external non-BAA calls once ` This experiment quantifies that cost across five representative BFSI/healthcare workflow patterns using labeled ground-truth traces. -**False positive (FP):** Session policy blocks an external non-BAA call where `phi_in_agent_context` is `false` — the agent demonstrably would not have transmitted PHI in this call. +**False positive (FP):** Session policy blocks an external non-BAA call where `phi_in_agent_context` is `false`: the agent demonstrably would not have transmitted PHI in this call. 
**False positive rate (FPR) = FP / (FP + TP_blocked)** @@ -84,6 +84,6 @@ If the agent SDK reports which prior call IDs are present in its context window ## Ground truth labeling -`phi_in_agent_context` in `fixtures/trace_corpus.json` is set by the experimenter, not computed by the gateway. It represents whether the agent's reasoning for a specific call references PHI from prior responses. This is the label the gateway *cannot* observe — which is exactly why the monotonic model exists. +`phi_in_agent_context` in `fixtures/trace_corpus.json` is set by the experimenter, not computed by the gateway. It represents whether the agent's reasoning for a specific call references PHI from prior responses. This is the label the gateway *cannot* observe: which is exactly why the monotonic model exists. A value of `false` means: if this call had been allowed, the agent would not have transmitted PHI content. The PHI exists in the session context window (it was retrieved in an earlier call), but the agent's decision and arguments for this specific call are independent of that PHI content. diff --git a/experiments/claim2-session-vs-call-policy/README.md b/experiments/claim2-session-vs-call-policy/README.md index ef71986..e3a4c3d 100644 --- a/experiments/claim2-session-vs-call-policy/README.md +++ b/experiments/claim2-session-vs-call-policy/README.md @@ -1,4 +1,4 @@ -# Experiment: Session-Level vs. Per-Call Policy — The Compliance Gap +# Experiment: Session-Level vs. Per-Call Policy: The Compliance Gap **Claim:** Monotonic session sensitivity state for LLM data governance (cMCP Claim 2) @@ -8,7 +8,7 @@ Individual call authorization is necessary but insufficient for cross-system com **Scenario:** -A clinical decision-support agent retrieves a patient record (PHI), then makes several downstream calls. Each downstream call is individually authorized — the agent has permission to call each of those tools. 
But after the PHI retrieval, the session context is contaminated: any external call the agent makes could carry PHI from its context window. Per-call policy cannot detect this; session-level policy blocks it. +A clinical decision-support agent retrieves a patient record (PHI), then makes several downstream calls. Each downstream call is individually authorized: the agent has permission to call each of those tools. But after the PHI retrieval, the session context is contaminated: any external call the agent makes could carry PHI from its context window. Per-call policy cannot detect this; session-level policy blocks it. **Call trace:** diff --git a/experiments/claim2-session-vs-call-policy/run.py b/experiments/claim2-session-vs-call-policy/run.py index faa6520..122a394 100644 --- a/experiments/claim2-session-vs-call-policy/run.py +++ b/experiments/claim2-session-vs-call-policy/run.py @@ -1,6 +1,6 @@ """ -Experiment: Session-Level vs. Per-Call Policy — The Compliance Gap -Claim 2 — Monotonic session sensitivity state for LLM data governance +Experiment: Session-Level vs. Per-Call Policy: The Compliance Gap +Claim 2: Monotonic session sensitivity state for LLM data governance Constructs a synthetic 5-call agent session with PHI contamination. Shows which cross-boundary violations per-call policy misses that session @@ -79,7 +79,7 @@ class SyntheticCall: tool_name="slack.post_message", args={"channel": "#clinical-alerts", "message": "Patient summary ready for review."}, response={"ok": True, "ts": "1750000000.000001"}, - payload_contains_phi=False, # payload itself is clean — per-call sees nothing + payload_contains_phi=False, # payload itself is clean: per-call sees nothing ), SyntheticCall( tool_name="analytics.run_query", @@ -115,7 +115,7 @@ class SyntheticCall: # - If the outbound arguments contain an explicit PHI pattern, deny. # - Otherwise allow. 
# -# This is intentionally the BEST CASE per-call model — it even inspects +# This is intentionally the BEST CASE per-call model: it even inspects # the outbound arguments for PHI. It still misses cross-boundary violations # because it cannot see what is in the agent's context window. # --------------------------------------------------------------------------- @@ -182,8 +182,8 @@ def session_policy( def main() -> int: print("=" * 72) - print("Experiment: Session-Level vs. Per-Call Policy — The Compliance Gap") - print("Claim 2 — cMCP monotonic session sensitivity state") + print("Experiment: Session-Level vs. Per-Call Policy: The Compliance Gap") + print("Claim 2: cMCP monotonic session sensitivity state") print("=" * 72) catalog = load_catalog(str(FIXTURES / "catalog.json")) @@ -220,7 +220,7 @@ def main() -> int: # Run per-call policy (before session state is updated) pc_verdict, pc_reason = per_call_policy(call.tool_name, entry, call) - # Run session policy (before session state is updated — pre-call check) + # Run session policy (before session state is updated: pre-call check) sess_verdict, sess_reason = session_policy(call.tool_name, entry, session.max_sensitivity) # Determine if this is a true violation diff --git a/experiments/claim3-rug-pull-detection/README.md b/experiments/claim3-rug-pull-detection/README.md index c9c7eeb..86e7720 100644 --- a/experiments/claim3-rug-pull-detection/README.md +++ b/experiments/claim3-rug-pull-detection/README.md @@ -7,16 +7,16 @@ ## What this measures -MCP servers can modify tool descriptions after the enterprise security team completed its review — a rug-pull attack. The gateway pins cryptographic hashes of approved definitions at startup inside the TEE and re-hashes on every `tools/list_changed` notification. Any character-level change to a description changes the hash and blocks the tool. +MCP servers can modify tool descriptions after the enterprise security team completed its review: a rug-pull attack. 
The gateway pins cryptographic hashes of approved definitions at startup inside the TEE and re-hashes on every `tools/list_changed` notification. Any character-level change to a description changes the hash and blocks the tool. This experiment verifies four properties without requiring a live MCP server: | Property | Claim | |---|---| -| P1 — Determinism | Same definition → same hash, always | -| P2 — Avalanche | One sentence added → 48% bit difference (SHA-256 avalanche) | -| P3 — Aggregate binding | Catalog-level hash changes when any single definition changes | -| P4 — Fail-closed | `CatalogHashMismatch` raised; gateway blocks the tool | +| P1: Determinism | Same definition → same hash, always | +| P2: Avalanche | One sentence added → 48% bit difference (SHA-256 avalanche) | +| P3: Aggregate binding | Catalog-level hash changes when any single definition changes | +| P4: Fail-closed | `CatalogHashMismatch` raised; gateway blocks the tool | --- diff --git a/experiments/claim4-trace-claim-nonce/README.md b/experiments/claim4-trace-claim-nonce/README.md index 4ba4511..2f35819 100644 --- a/experiments/claim4-trace-claim-nonce/README.md +++ b/experiments/claim4-trace-claim-nonce/README.md @@ -21,12 +21,12 @@ each enclave instance produces a distinct, fresh nonce. 
The session is bound thr | Property | Claim | |---|---| -| P1 — Thumbprint determinism | Same key → same thumbprint, re-derivable from `cnf.jwk.x` | -| P2 — Key binding | `report_data[:32]` equals the thumbprint | -| P3 — Instance binding | Different TEE key → different thumbprint | -| P4 — Freshness | Different salt → different nonce across startups | -| P5 — Session binding | Replacing `session_id` in a signed claim breaks the Ed25519 signature | -| P6 — Selective disclosure resistance | Removing one audit entry changes `bundle_hash`; export signature fails | +| P1: Thumbprint determinism | Same key → same thumbprint, re-derivable from `cnf.jwk.x` | +| P2: Key binding | `report_data[:32]` equals the thumbprint | +| P3: Instance binding | Different TEE key → different thumbprint | +| P4: Freshness | Different salt → different nonce across startups | +| P5: Session binding | Replacing `session_id` in a signed claim breaks the Ed25519 signature | +| P6: Selective disclosure resistance | Removing one audit entry changes `bundle_hash`; export signature fails | --- diff --git a/experiments/claim5-temporal-adjacency/README.md b/experiments/claim5-temporal-adjacency/README.md index c5afdbc..efcd7b8 100644 --- a/experiments/claim5-temporal-adjacency/README.md +++ b/experiments/claim5-temporal-adjacency/README.md @@ -11,12 +11,12 @@ At the MCP transport boundary, a gateway cannot observe whether an LLM agent inc | Property | What it proves | |---|---| -| P1 — Sequential recording | Calls recorded with monotonic sequence numbers | -| P2 — Cross-boundary detection | Transitions from high-sensitivity domains recorded in graph | -| P3 — Provenance disclaimer | `edges_represent` field explicitly qualifies adjacency vs. 
provenance | -| P4 — No false negatives | Any PHI-relevant subsequent call has seq > PHI call seq; edge implicit | -| P5 — Concurrent calls | Simultaneous calls both adjacent to prior PHI call | -| P6 — Denied calls in graph | Agent's request is evidence of awareness, regardless of response delivery | +| P1: Sequential recording | Calls recorded with monotonic sequence numbers | +| P2: Cross-boundary detection | Transitions from high-sensitivity domains recorded in graph | +| P3: Provenance disclaimer | `edges_represent` field explicitly qualifies adjacency vs. provenance | +| P4: No false negatives | Any PHI-relevant subsequent call has seq > PHI call seq; edge implicit | +| P5: Concurrent calls | Simultaneous calls both adjacent to prior PHI call | +| P6: Denied calls in graph | Agent's request is evidence of awareness, regardless of response delivery | --- @@ -31,7 +31,7 @@ python experiments/claim5-temporal-adjacency/run.py ## Relationship to Claim 2 FPR -The Claim 2 false positive rate experiment (`experiments/claim2-false-positive-rate/`) measures the operational cost of the monotonic model — what fraction of blocked external calls are unnecessary. That experiment and this one are two sides of the same coin: this experiment proves no false negatives; the FPR experiment measures the false positive rate empirically. +The Claim 2 false positive rate experiment (`experiments/claim2-false-positive-rate/`) measures the operational cost of the monotonic model: what fraction of blocked external calls are unnecessary. That experiment and this one are two sides of the same coin: this experiment proves no false negatives; the FPR experiment measures the false positive rate empirically. 
--- diff --git a/experiments/claim6-cross-org-attestation/README.md b/experiments/claim6-cross-org-attestation/README.md index 15a48a5..e5ee7b6 100644 --- a/experiments/claim6-cross-org-attestation/README.md +++ b/experiments/claim6-cross-org-attestation/README.md @@ -12,13 +12,13 @@ In B2B AI tool access, enterprise (party A) uses a Phase 1 cMCP gateway and SaaS | Property | What it proves | |---|---| -| P1 — Independent keys | Gateway and server have different TEE keypairs | -| P2 — Session linkage | Both claims carry the same session_id | -| P3 — Phase 1 nonce | SHA-256(gateway_key ∥ session_id) binds Phase 1 to session | -| P4 — Phase 2 nonce | SHA-256(server_key ∥ session_id) binds Phase 2 to session | -| P5 — Independent verify | Each claim verifiable against its own public key | -| P6 — Tamper independence | Phase 1 tamper invalidates only Phase 1; Phase 2 unaffected | -| P7 — Binary swap detection | Different server binary → different measurement → verifier rejects | +| P1: Independent keys | Gateway and server have different TEE keypairs | +| P2: Session linkage | Both claims carry the same session_id | +| P3: Phase 1 nonce | SHA-256(gateway_key ∥ session_id) binds Phase 1 to session | +| P4: Phase 2 nonce | SHA-256(server_key ∥ session_id) binds Phase 2 to session | +| P5: Independent verify | Each claim verifiable against its own public key | +| P6: Tamper independence | Phase 1 tamper invalidates only Phase 1; Phase 2 unaffected | +| P7: Binary swap detection | Different server binary → different measurement → verifier rejects | --- diff --git a/governance/cmcp-enforcement.yaml b/governance/cmcp-enforcement.yaml index 7197a31..f2ddf4a 100644 --- a/governance/cmcp-enforcement.yaml +++ b/governance/cmcp-enforcement.yaml @@ -9,7 +9,7 @@ version: "1.0" description: > cMCP enforces default-deny Cedar policy on all MCP tool calls at the gateway boundary inside a TEE. 
The Cedar policy bundle is measured into hardware at - startup — any tool call not matching an explicit permit rule is denied without + startup, any tool call not matching an explicit permit rule is denied without any code path the operator can override at runtime. enforcement_model: cedar @@ -20,7 +20,7 @@ hardware_roots: - tpm2 owasp_coverage: - ASI-01: PromptInjectionDetector — blocks injected instructions in tool responses - ASI-02: catalog + Cedar permit rules — enforces approved tool list - ASI-03: Cedar scope boundary — agent cannot call tools outside explicit permit - ASI-06: hash-chained audit inside TEE — tamper-evident call log per session + ASI-01: PromptInjectionDetector, blocks injected instructions in tool responses + ASI-02: catalog + Cedar permit rules, enforces approved tool list + ASI-03: Cedar scope boundary, agent cannot call tools outside explicit permit + ASI-06: hash-chained audit inside TEE, tamper-evident call log per session diff --git a/mkdocs.yml b/mkdocs.yml index a8a75b3..96eaf18 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -1,5 +1,5 @@ site_name: cMCP -site_description: Confidential MCP Runtime — hardware-attested policy enforcement for MCP tool calls +site_description: Confidential MCP Runtime, hardware-attested policy enforcement for MCP tool calls site_url: https://cmcp.agentrust-io.com repo_url: https://github.com/agentrust-io/cmcp repo_name: agentrust-io/cmcp diff --git a/pyproject.toml b/pyproject.toml index ecace47..ab42211 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -5,7 +5,7 @@ build-backend = "hatchling.build" [project] name = "cmcp-runtime" version = "0.2.1" -description = "Hardware-attested MCP runtime — TEE-enforced policy and TRACE Claim generation" +description = "Hardware-attested MCP runtime, TEE-enforced policy and TRACE Claim generation" readme = "README.md" license = { text = "MIT" } authors = [ diff --git a/scripts/deploy-azure.sh b/scripts/deploy-azure.sh index 99c35e2..35f857c 100644 --- 
a/scripts/deploy-azure.sh +++ b/scripts/deploy-azure.sh @@ -1,5 +1,5 @@ #!/usr/bin/env bash -# deploy-azure.sh — provision an Azure Confidential VM and install cMCP +# deploy-azure.sh: provision an Azure Confidential VM and install cMCP # # Usage: # ./scripts/deploy-azure.sh [sev-snp|tdx] (default: sev-snp) @@ -30,7 +30,7 @@ case "$TEE_TYPE" in ;; esac -# Default location — Confidential VM availability varies by region. +# Default location: Confidential VM availability varies by region. # Verify the SKU is available before proceeding: # az vm list-skus --location --size "$VM_SIZE" --output table LOCATION="${AZURE_LOCATION:-eastus}" diff --git a/scripts/deploy-gcp.sh b/scripts/deploy-gcp.sh index 2c71c41..c55510b 100644 --- a/scripts/deploy-gcp.sh +++ b/scripts/deploy-gcp.sh @@ -1,5 +1,5 @@ #!/usr/bin/env bash -# deploy-gcp.sh — provision a GCP Confidential VM and install cMCP +# deploy-gcp.sh: provision a GCP Confidential VM and install cMCP # # Usage: # ./scripts/deploy-gcp.sh [tdx|sev-snp] (default: tdx) diff --git a/src/cmcp_runtime/__init__.py b/src/cmcp_runtime/__init__.py index 91a1a86..2d96b9f 100644 --- a/src/cmcp_runtime/__init__.py +++ b/src/cmcp_runtime/__init__.py @@ -1,3 +1,3 @@ -"""cMCP Runtime — hardware-attested MCP runtime.""" +"""cMCP Runtime: hardware-attested MCP runtime.""" __version__ = "0.1.0" diff --git a/src/cmcp_runtime/audit/__init__.py b/src/cmcp_runtime/audit/__init__.py index 4f61491..4f73d92 100644 --- a/src/cmcp_runtime/audit/__init__.py +++ b/src/cmcp_runtime/audit/__init__.py @@ -1 +1 @@ -"""Audit package — implemented in subsequent issues.""" +"""Audit package: implemented in subsequent issues.""" diff --git a/src/cmcp_runtime/audit/keys.py b/src/cmcp_runtime/audit/keys.py index 191b558..b6c0a8a 100644 --- a/src/cmcp_runtime/audit/keys.py +++ b/src/cmcp_runtime/audit/keys.py @@ -1,4 +1,4 @@ -"""Ed25519 signing key management — implements issue #46.""" +"""Ed25519 signing key management: implements issue #46.""" from __future__ 
import annotations @@ -32,7 +32,7 @@ def __init__(self) -> None: @property def public_key_hex(self) -> str: - """32-byte Ed25519 public key, hex-encoded — included in every TRACE Claim.""" + """32-byte Ed25519 public key, hex-encoded: included in every TRACE Claim.""" return self._public_bytes.hex() @property diff --git a/src/cmcp_runtime/catalog/__init__.py b/src/cmcp_runtime/catalog/__init__.py index e731254..49fe860 100644 --- a/src/cmcp_runtime/catalog/__init__.py +++ b/src/cmcp_runtime/catalog/__init__.py @@ -1 +1 @@ -"""Catalog package — implemented in subsequent issues.""" +"""Catalog package: implemented in subsequent issues.""" diff --git a/src/cmcp_runtime/catalog/loader.py b/src/cmcp_runtime/catalog/loader.py index a633dd3..7ae4baf 100644 --- a/src/cmcp_runtime/catalog/loader.py +++ b/src/cmcp_runtime/catalog/loader.py @@ -1,4 +1,4 @@ -"""Tool catalog loading, hash verification, and identity binding — implements #86, #88.""" +"""Tool catalog loading, hash verification, and identity binding: implements #86, #88.""" from __future__ import annotations @@ -170,7 +170,7 @@ def load_catalog(catalog_path: str, expected_hash: str | None = None) -> ToolCat ) if tool_name in entries: raise CatalogToolNameCollision( - f"Duplicate tool_name '{tool_name}' — gateway will not start", + f"Duplicate tool_name '{tool_name}': gateway will not start", detail="Each tool_name must map to exactly one upstream server", ) @@ -227,7 +227,7 @@ def load_catalog(catalog_path: str, expected_hash: str | None = None) -> ToolCat actual_hex = computed_hash.removeprefix("sha256:") if expected_hex != actual_hex: raise CatalogHashMismatch( - "Tool catalog hash mismatch — gateway will not start", + "Tool catalog hash mismatch: gateway will not start", detail=f"expected=sha256:{expected_hex} actual={computed_hash}", ) diff --git a/src/cmcp_runtime/catalog/scanner.py b/src/cmcp_runtime/catalog/scanner.py index aeeec57..9f482b0 100644 --- a/src/cmcp_runtime/catalog/scanner.py +++ 
b/src/cmcp_runtime/catalog/scanner.py @@ -1,5 +1,5 @@ """ -Tool catalog security scanning via AGT MCPSecurityScanner — implements issue #58. +Tool catalog security scanning via AGT MCPSecurityScanner: implements issue #58. AGT's MCPSecurityScanner provides: - SHA-256 tool fingerprinting (detects definition mutation / rug-pull P4.2) @@ -65,13 +65,13 @@ def __init__(self) -> None: self._available = True logger.info("CatalogScanner: AGT MCPSecurityScanner active") except Exception as exc: - logger.warning("CatalogScanner: AGT init failed (%s) — running without security scan", exc) + logger.warning("CatalogScanner: AGT init failed (%s): running without security scan", exc) self._scanner = None self._available = False else: self._scanner = None self._available = False - logger.info("CatalogScanner: agent-os-kernel not installed — no catalog scanning") + logger.info("CatalogScanner: agent-os-kernel not installed: no catalog scanning") def scan_catalog(self, catalog: ToolCatalog) -> CatalogScanResult: """ diff --git a/src/cmcp_runtime/inspection/__init__.py b/src/cmcp_runtime/inspection/__init__.py index 97fe451..2fcb97a 100644 --- a/src/cmcp_runtime/inspection/__init__.py +++ b/src/cmcp_runtime/inspection/__init__.py @@ -1 +1 @@ -"""Inspection package — implemented in subsequent issues.""" +"""Inspection package: implemented in subsequent issues.""" diff --git a/src/cmcp_runtime/inspection/pipeline.py b/src/cmcp_runtime/inspection/pipeline.py index c588dd9..afe8d9d 100644 --- a/src/cmcp_runtime/inspection/pipeline.py +++ b/src/cmcp_runtime/inspection/pipeline.py @@ -1,5 +1,5 @@ """ -Response inspection pipeline — implements issues #61, #65, #81. +Response inspection pipeline: implements issues #61, #65, #81. 
Stage 4 (injection detection) and Stage 3 (sensitivity classification) now delegate to AGT components where available: @@ -27,7 +27,7 @@ _log = logging.getLogger(__name__) -# ── AGT components (optional — fall back gracefully) ───────────────────────── +# ── AGT components (optional: fall back gracefully) ───────────────────────── try: from agent_os.credential_redactor import CredentialRedactor from agent_os.mcp_response_scanner import MCPResponseScanner as AGTResponseScanner @@ -36,7 +36,7 @@ except ImportError: _AGT_AVAILABLE = False -# Mirrors agent_os._SENSITIVITY_THRESHOLDS — update if the package changes. +# Mirrors agent_os._SENSITIVITY_THRESHOLDS: update if the package changes. _INJECTION_THRESHOLDS: dict[str, float] = {"strict": 0.3, "balanced": 0.5, "permissive": 0.7} # ── Fallback injection patterns (used when AGT not available) ───────────────── @@ -142,7 +142,7 @@ def _stage2_schema_validation( mode = catalog_entry.schema_validation_mode if not surplus: - # No surplus fields — still run jsonschema for type/required violations + # No surplus fields: still run jsonschema for type/required violations try: jsonschema.validate(payload, output_schema) except jsonschema.ValidationError as exc: @@ -156,7 +156,7 @@ def _stage2_schema_validation( ) return StageResult(stage="schema", decision="allow"), response_bytes - # Surplus fields present — mode determines action + # Surplus fields present: mode determines action if mode == "strict": return ( StageResult( @@ -286,7 +286,7 @@ def _classify_sensitivity( """ Stage 3: derive sensitivity tags from three sources (applied in order): - 1. catalog_entry.sensitivity_level — always applied + 1. catalog_entry.sensitivity_level: always applied 2. field-level x-sensitivity annotations in output_schema properties 3. 
pattern matching on response content (AGT CredentialRedactor or regex fallback) """ @@ -334,7 +334,7 @@ def _classify_sensitivity( class SensitivityClassificationStage: """ - Stage 3 of the InspectionPipeline — sensitivity classification. + Stage 3 of the InspectionPipeline: sensitivity classification. Applies three classification sources in order: 1. catalog_entry.sensitivity_level annotation @@ -369,7 +369,7 @@ class InspectionPipeline: """ 4-stage response inspection pipeline. - All stages run even when an earlier stage would deny — this produces a + All stages run even when an earlier stage would deny: this produces a complete audit record. Final decision = deny if ANY stage returns deny. After completing all stages, calls session.update_from_inspection() to @@ -440,7 +440,7 @@ def run( if s2.stripped_fields: stripped_fields = s2.stripped_fields if s2.decision == "allow" and s2.stripped_fields and s2.reason == "surplus fields redacted": - # Redact mode modified the bytes — expose to caller + # Redact mode modified the bytes: expose to caller modified_response = response_bytes # Stage 3: sensitivity classification (AGT CredentialRedactor + catalog) @@ -457,7 +457,7 @@ def run( stage_results["classification"] = "allow" # Stage 4: injection detection (AGT PromptInjectionDetector + MCPResponseScanner) - # INJECT-005: scan bytes decoded strictly — non-UTF-8 is treated as a deny to + # INJECT-005: scan bytes decoded strictly: non-UTF-8 is treated as a deny to # prevent bypass via invalid byte sequences that errors="replace" would corrupt. 
try: response_text = response_bytes.decode("utf-8") @@ -512,7 +512,7 @@ def run( stage_results["injection"] = "deny" agt_mcp_denied = True except concurrent.futures.TimeoutError: - # INJECT-002: scanner timed out — deny to prevent bypass via slow AGT + # INJECT-002: scanner timed out: deny to prevent bypass via slow AGT deny_reasons.append(f"AGT MCPResponseScanner timed out after {self._scanner_timeout}s") injection_pattern = "scanner_timeout" injection_scanner = "timeout" @@ -557,7 +557,7 @@ def _run_s4() -> StageResult: deny_reasons = list(dict.fromkeys(deny_reasons)) final = "deny" if deny_reasons else "allow" - # Handoff to session state — happens even for denied responses + # Handoff to session state: happens even for denied responses # (a denied high-sensitivity response still raises session sensitivity) # INJECT-004: injection_detected must reflect both scanners, not only s4. injection_detected = s4.decision == "deny" or agt_mcp_denied diff --git a/src/cmcp_runtime/kill_switch.py b/src/cmcp_runtime/kill_switch.py index 0f37810..88f7fa2 100644 --- a/src/cmcp_runtime/kill_switch.py +++ b/src/cmcp_runtime/kill_switch.py @@ -1,4 +1,4 @@ -"""AGT SRE kill switch evaluator — implements issue #341.""" +"""AGT SRE kill switch evaluator: implements issue #341.""" from __future__ import annotations @@ -16,7 +16,7 @@ class KillSwitchEvaluator: When a registered agent identity exceeds `deny_rate_threshold` policy denies over the rolling `window_seconds` window (with at least `min_calls` events), the identity is flagged. The TRACE claim for the session that - trips the threshold carries `kill_switch_triggered=true` — hardware-attested + trips the threshold carries `kill_switch_triggered=true`: hardware-attested evidence of automated enforcement. Subsequent `create_session()` calls for the same agent identity raise `KillSwitchTripped`. 
diff --git a/src/cmcp_runtime/mcp/__init__.py b/src/cmcp_runtime/mcp/__init__.py index 98aeb51..1d81d24 100644 --- a/src/cmcp_runtime/mcp/__init__.py +++ b/src/cmcp_runtime/mcp/__init__.py @@ -1 +1 @@ -"""Mcp package — implemented in subsequent issues.""" +"""Mcp package: implemented in subsequent issues.""" diff --git a/src/cmcp_runtime/policy/__init__.py b/src/cmcp_runtime/policy/__init__.py index dc40733..06a97c6 100644 --- a/src/cmcp_runtime/policy/__init__.py +++ b/src/cmcp_runtime/policy/__init__.py @@ -1 +1 @@ -"""Policy package — implemented in subsequent issues.""" +"""Policy package: implemented in subsequent issues.""" diff --git a/src/cmcp_runtime/policy/bundle.py b/src/cmcp_runtime/policy/bundle.py index e0def1c..8231406 100644 --- a/src/cmcp_runtime/policy/bundle.py +++ b/src/cmcp_runtime/policy/bundle.py @@ -1,4 +1,4 @@ -"""Cedar policy bundle loading and hash verification — implements issue #63.""" +"""Cedar policy bundle loading and hash verification: implements issue #63.""" from __future__ import annotations @@ -43,7 +43,7 @@ class PolicyBundle: manifest: PolicyManifest policy_files: dict[str, str] # filename → file content schema_content: str - bundle_hash: str # sha256: — what gets measured into the TEE report + bundle_hash: str # sha256:: what gets measured into the TEE report def _sha256_hex(data: bytes) -> str: @@ -90,7 +90,7 @@ def load_policy_bundle(bundle_path: str, expected_hash: str | None = None) -> Po - *.cedar (Cedar policy files) - schema.cedarschema (Cedar schema) - expected_hash is "sha256:" — must match the computed bundle hash. + expected_hash is "sha256:": must match the computed bundle hash. If expected_hash is None, the hash is computed but not verified (dev convenience). Raises PolicyHashMismatch if hashes do not match. @@ -120,7 +120,7 @@ def load_policy_bundle(bundle_path: str, expected_hash: str | None = None) -> Po # gateway cannot know in advance whether a newer agent_os is semantically # compatible. 
Operators must review changelogs and re-pin after upgrade. logger.warning( - "POLICY-007: agent_os_version mismatch — bundle pinned %s, installed %s. " + "POLICY-007: agent_os_version mismatch: bundle pinned %s, installed %s. " "Cedar policy semantics may have changed; review the agent-os-kernel changelog.", pinned_agent_os, _AGENT_OS_VERSION, @@ -164,7 +164,7 @@ def load_policy_bundle(bundle_path: str, expected_hash: str | None = None) -> Po expected_hex = expected_hash.removeprefix("sha256:") if computed != expected_hex: raise PolicyHashMismatch( - "Policy bundle hash mismatch — gateway will not start", + "Policy bundle hash mismatch: gateway will not start", detail=f"expected=sha256:{expected_hex} actual=sha256:{computed}", ) diff --git a/src/cmcp_runtime/session/__init__.py b/src/cmcp_runtime/session/__init__.py index a4fed1a..b304d30 100644 --- a/src/cmcp_runtime/session/__init__.py +++ b/src/cmcp_runtime/session/__init__.py @@ -1 +1 @@ -"""Session package — implemented in subsequent issues.""" +"""Session package: implemented in subsequent issues.""" diff --git a/src/cmcp_runtime/session/call_log.py b/src/cmcp_runtime/session/call_log.py index 4342252..b3b296f 100644 --- a/src/cmcp_runtime/session/call_log.py +++ b/src/cmcp_runtime/session/call_log.py @@ -1,4 +1,4 @@ -"""Per-session call log and temporal adjacency tracking — implements issue #94.""" +"""Per-session call log and temporal adjacency tracking: implements issue #94.""" from __future__ import annotations @@ -47,7 +47,7 @@ def adjacent_pairs(self) -> list[tuple[str, str]]: def suspicious_sequence(self, threshold: int = 3) -> bool: """ Return True if the same tool was called more than `threshold` times - consecutively — a potential injection/replay pattern. + consecutively: a potential injection/replay pattern. 
""" if not self.records: return False @@ -73,7 +73,7 @@ def consecutive_count(self, tool_name: str) -> int: # --------------------------------------------------------------------------- -# SessionCallLog and CallLogEntry — richer per-call tracking for TRACE Claims +# SessionCallLog and CallLogEntry: richer per-call tracking for TRACE Claims # --------------------------------------------------------------------------- #: Sentinel for the compliance_domain of calls where no catalog entry exists. @@ -88,7 +88,7 @@ def consecutive_count(self, tool_name: str) -> int: class CallLogEntry: """One entry in the per-session call log (issue #94 spec). - Tracks temporal adjacency only — not data provenance. Edges in the + Tracks temporal adjacency only: not data provenance. Edges in the call graph represent "B was called immediately after A", not "B consumed A's output". """ diff --git a/src/cmcp_runtime/session/state.py b/src/cmcp_runtime/session/state.py index 01537b2..3032a87 100644 --- a/src/cmcp_runtime/session/state.py +++ b/src/cmcp_runtime/session/state.py @@ -1,4 +1,4 @@ -"""Session sensitivity state machine — implements issue #84.""" +"""Session sensitivity state machine: implements issue #84.""" from __future__ import annotations @@ -7,7 +7,7 @@ from datetime import UTC, datetime from uuid import uuid4 -# Sensitivity level ordering — monotonically increasing only. +# Sensitivity level ordering: monotonically increasing only. # hipaa_phi, mnpi, trade_secret are all at level 3 (equal highest). SENSITIVITY_ORDER: dict[str, int] = { "public": 0, @@ -37,7 +37,7 @@ class SessionState: """ Per-session sensitivity state machine. - State transitions are monotonically increasing — sensitivity can only rise, + State transitions are monotonically increasing: sensitivity can only rise, never fall automatically. A session reset (operator-only, issue #92) is the only way to lower sensitivity. 
@@ -65,7 +65,7 @@ def update_from_inspection( call_id: str, sensitivity_tags: list[str], injection_detected: bool, - response_allowed: bool, # noqa: ARG002 — logged for future use + response_allowed: bool, # noqa: ARG002 (logged for future use) ) -> None: """ Update session state from an inspection result. @@ -110,7 +110,7 @@ def upgrade_attestation(self) -> tuple[str, str]: """ Rotate the session token when attestation upgrades (e.g. software-only → hardware TEE). - Unlike reset(), session sensitivity state is preserved — the ongoing session + Unlike reset(), session sensitivity state is preserved: the ongoing session continues at its current sensitivity level. Only the session_id is rotated so that any trust assertions cached against the old ID are invalidated. diff --git a/src/cmcp_runtime/startup.py b/src/cmcp_runtime/startup.py index 7003ae0..af72b55 100644 --- a/src/cmcp_runtime/startup.py +++ b/src/cmcp_runtime/startup.py @@ -66,7 +66,7 @@ class RuntimeContext: def _jwk_thumbprint_sha256(x_b64url: str) -> bytes: - """RFC 7638 §3 JWK Thumbprint — SHA-256(UTF-8(JSON of sorted required OKP members)).""" + """RFC 7638 §3 JWK Thumbprint: SHA-256(UTF-8(JSON of sorted required OKP members)).""" canonical = json.dumps( {"crv": "Ed25519", "kty": "OKP", "x": x_b64url}, separators=(",", ":"), diff --git a/src/cmcp_runtime/tee/__init__.py b/src/cmcp_runtime/tee/__init__.py index 8305a99..cecd59e 100644 --- a/src/cmcp_runtime/tee/__init__.py +++ b/src/cmcp_runtime/tee/__init__.py @@ -1 +1 @@ -"""TEE provider abstraction — implemented in subsequent issues.""" +"""TEE provider abstraction: implemented in subsequent issues.""" diff --git a/src/cmcp_runtime/tee/base.py b/src/cmcp_runtime/tee/base.py index 0f68619..e6e10e1 100644 --- a/src/cmcp_runtime/tee/base.py +++ b/src/cmcp_runtime/tee/base.py @@ -1,4 +1,4 @@ -"""TEE provider abstraction — implements issue #77.""" +"""TEE provider abstraction: implements issue #77.""" from __future__ import annotations @@ -116,5 +116,5 
@@ def get_attestation_report(self, nonce: bytes) -> AttestationReport: raw_evidence=None, attestation_generated_at=datetime.now(tz=UTC), attestation_validity_seconds=86400, - measurement_note="software-only mode — not hardware-backed", + measurement_note="software-only mode: not hardware-backed", ) diff --git a/src/cmcp_runtime/tee/detect.py b/src/cmcp_runtime/tee/detect.py index fa266fc..d0cb626 100644 --- a/src/cmcp_runtime/tee/detect.py +++ b/src/cmcp_runtime/tee/detect.py @@ -1,4 +1,4 @@ -"""TEE provider detection loop — implements issue #72 (dev mode) and #77 (abstraction).""" +"""TEE provider detection loop: implements issue #72 (dev mode) and #77 (abstraction).""" from __future__ import annotations @@ -69,7 +69,7 @@ def detect_provider(config: Config) -> TEEProvider: "provider=software-only requires CMCP_DEV_MODE=1" ) logger.warning( - "Running in development mode — attestation is not hardware-backed. " + "Running in development mode: attestation is not hardware-backed. " "TRACE Claims produced here must not be used for compliance purposes." ) return SoftwareOnlyProvider() @@ -92,7 +92,7 @@ def detect_provider(config: Config) -> TEEProvider: # No hardware provider found if dev_mode: logger.warning( - "No hardware TEE detected. Running in development mode — " + "No hardware TEE detected. Running in development mode: " "attestation is not hardware-backed. " "TRACE Claims produced here must not be used for compliance purposes." 
) diff --git a/src/cmcp_runtime/tee/opaque.py b/src/cmcp_runtime/tee/opaque.py index 2da5eed..2dd3534 100644 --- a/src/cmcp_runtime/tee/opaque.py +++ b/src/cmcp_runtime/tee/opaque.py @@ -1,4 +1,4 @@ -"""Opaque Systems TEE provider stub — not yet implemented.""" +"""Opaque Systems TEE provider stub: not yet implemented.""" from __future__ import annotations diff --git a/src/cmcp_runtime/tee/spiffe.py b/src/cmcp_runtime/tee/spiffe.py index 6bd47de..03ff680 100644 --- a/src/cmcp_runtime/tee/spiffe.py +++ b/src/cmcp_runtime/tee/spiffe.py @@ -1,5 +1,5 @@ """ -SPIFFE/SPIRE Workload API client — implements issue #96. +SPIFFE/SPIRE Workload API client: implements issue #96. Fetches X.509 SVIDs from a local SPIRE agent after TEE attestation succeeds. If SPIRE is not present or pyspiffe is not installed, falls back to @@ -23,7 +23,7 @@ logger = logging.getLogger(__name__) -_DEFAULT_SOCKET = "/tmp/spire-agent/public/api.sock" # nosec B108 — SPIRE Workload API standard socket path, not a temp file +_DEFAULT_SOCKET = "/tmp/spire-agent/public/api.sock" # nosec B108: SPIRE Workload API standard socket path, not a temp file _SPIRE_SOCKET_ENV = "CMCP_SPIRE_SOCKET" # Maximum time to wait for SPIRE agent to respond (seconds) @@ -152,12 +152,12 @@ def fetch_svid(socket_path: str | None = None) -> SpiffeClientResult: ) elif result.available: logger.warning( - "SPIRE agent reachable but SVID fetch failed: %s — falling back to self-signed TLS", + "SPIRE agent reachable but SVID fetch failed: %s: falling back to self-signed TLS", result.failure_reason, ) else: logger.warning( - "SPIRE not available (%s) — falling back to self-signed TLS", + "SPIRE not available (%s): falling back to self-signed TLS", result.failure_reason, ) diff --git a/src/cmcp_runtime/tee/tpm.py b/src/cmcp_runtime/tee/tpm.py index 871ed14..a63a054 100644 --- a/src/cmcp_runtime/tee/tpm.py +++ b/src/cmcp_runtime/tee/tpm.py @@ -1,4 +1,4 @@ -"""TPM 2.0 TEE provider — implements issue #83.""" +"""TPM 2.0 TEE provider: 
implements issue #83.""" from __future__ import annotations diff --git a/src/cmcp_verify/verify.py b/src/cmcp_verify/verify.py index 1b422c7..3048ea1 100644 --- a/src/cmcp_verify/verify.py +++ b/src/cmcp_verify/verify.py @@ -30,7 +30,7 @@ def _jwk_thumbprint_sha256(x_b64url: str) -> bytes: - """RFC 7638 §3 JWK Thumbprint — SHA-256(UTF-8(JSON of sorted required OKP members)).""" + """RFC 7638 §3 JWK Thumbprint: SHA-256(UTF-8(JSON of sorted required OKP members)).""" canonical = json.dumps( {"crv": "Ed25519", "kty": "OKP", "x": x_b64url}, separators=(",", ":"), diff --git a/tests/conformance/test_gateway_conformance.py b/tests/conformance/test_gateway_conformance.py index 35c1075..e445722 100644 --- a/tests/conformance/test_gateway_conformance.py +++ b/tests/conformance/test_gateway_conformance.py @@ -1,5 +1,5 @@ """ -Conformance test suite for the cMCP runtime — 22-case GTC Berlin demo spec. +Conformance test suite for the cMCP runtime: 22-case GTC Berlin demo spec. Uses starlette.testclient.TestClient (synchronous) against an in-process MCPServer so no asyncio/anyio machinery is needed in test code. @@ -110,7 +110,7 @@ def _call_side_effect(call_id: str, tool_name: str, arguments: dict, **kwargs) - @pytest.fixture() def client() -> TestClient: - """Minimal MCPServer with a single mock_tool — used by most conformance tests.""" + """Minimal MCPServer with a single mock_tool: used by most conformance tests.""" proxy = _make_proxy(allowed=True) server = MCPServer(proxy=proxy) return TestClient(server.app) @@ -249,7 +249,7 @@ def test_tools_list_via_mcp(client: TestClient) -> None: def test_trace_claim_not_found_returns_404(client: TestClient) -> None: """GET /sessions/nonexistent/trace-claim returns 404 when session_manager configured.""" - # client fixture has no session_manager — returns 501 + # client fixture has no session_manager: returns 501 # We need a client with a session_manager that returns None for unknown sessions. 
session_mgr = MagicMock() session_mgr.get_trace_claim.return_value = None diff --git a/tests/soak/reference_server.py b/tests/soak/reference_server.py index 722ea8b..09bfc24 100644 --- a/tests/soak/reference_server.py +++ b/tests/soak/reference_server.py @@ -1,5 +1,5 @@ """ -Reference MCP server for soak testing — exposes echo, get_data, and delay tools. +Reference MCP server for soak testing: exposes echo, get_data, and delay tools. Runs as a standalone Starlette/uvicorn process or in-process via TestClient. """ diff --git a/tests/unit/test_audit_chain_anchor.py b/tests/unit/test_audit_chain_anchor.py index 6c6e6a8..64d2549 100644 --- a/tests/unit/test_audit_chain_anchor.py +++ b/tests/unit/test_audit_chain_anchor.py @@ -199,7 +199,7 @@ def test_verify_chain_fails_after_chain_substitution_via_session_manager(): replacement = AuditChain(real_chain._session_id) # type: ignore[attr-defined] replacement.append("tool_call", call_id="c1", tool_name="evil_tool", policy_decision="allow") - # Graft replacement entries — anchor is still the original root. + # Graft replacement entries: anchor is still the original root. 
real_chain._entries = replacement._entries # type: ignore[attr-defined] assert real_chain.verify_chain() is False diff --git a/tests/unit/test_call_log.py b/tests/unit/test_call_log.py index a1518a4..93728e0 100644 --- a/tests/unit/test_call_log.py +++ b/tests/unit/test_call_log.py @@ -124,5 +124,5 @@ def test_suspicious_reset_mid_sequence(): assert log.suspicious_sequence(threshold=3) is True log.record(_rec("y")) # breaks run log.record(_rec("x")) - # now x has run of 1 — still overall suspicious because earlier 4-run is in history + # now x has run of 1: still overall suspicious because earlier 4-run is in history assert log.suspicious_sequence(threshold=3) is True diff --git a/tests/unit/test_catalog.py b/tests/unit/test_catalog.py index baa0b80..cddcde8 100644 --- a/tests/unit/test_catalog.py +++ b/tests/unit/test_catalog.py @@ -155,7 +155,7 @@ def test_catalog_hash_changes_when_entry_changes(catalog_file): # ── POLICY-002: tool name must be lowercase ─────────────────────────────────── def test_uppercase_tool_name_is_rejected(catalog_file): - """POLICY-002 — mixed-case tool names must be rejected at load time.""" + """POLICY-002: mixed-case tool names must be rejected at load time.""" entry = dict(ENTRY_1) entry["tool_name"] = "CRM.Query" with pytest.raises(ConfigError, match="lowercase"): diff --git a/tests/unit/test_config.py b/tests/unit/test_config.py index 5796998..1c44b07 100644 --- a/tests/unit/test_config.py +++ b/tests/unit/test_config.py @@ -85,7 +85,7 @@ def test_invalid_validity_seconds(config_file): def test_unknown_key_raises(config_file): - """CONF-001 — unknown config keys must fail closed, not silently ignore.""" + """CONF-001: unknown config keys must fail closed, not silently ignore.""" path = config_file("unknown_key: value\n") with pytest.raises(ConfigError, match="unknown_key"): load_config(path) @@ -122,7 +122,7 @@ def test_empty_config_uses_defaults(config_file): def test_default_enforcement_mode_is_enforcing(config_file): - 
"""POLICY-003 — omitting enforcement_mode must default to enforcing, not advisory.""" + """POLICY-003: omitting enforcement_mode must default to enforcing, not advisory.""" path = config_file("") cfg = load_config(path) assert cfg.attestation.enforcement_mode == EnforcementMode.ENFORCING diff --git a/tests/unit/test_kill_switch.py b/tests/unit/test_kill_switch.py index 212eea5..f3a6a05 100644 --- a/tests/unit/test_kill_switch.py +++ b/tests/unit/test_kill_switch.py @@ -128,7 +128,7 @@ def test_unblock_clears_flag_and_events(self) -> None: assert ev.is_blocked(_AGENT_ID) is True ev.unblock(_AGENT_ID) assert ev.is_blocked(_AGENT_ID) is False - # Events cleared — below min_calls after unblock + # Events cleared: below min_calls after unblock assert ev.evaluate(_AGENT_ID) is False def test_separate_agent_ids_are_independent(self) -> None: @@ -179,7 +179,7 @@ def test_close_session_below_threshold_not_triggered(self) -> None: ) mgr = SessionManager(ctx) state, chain = mgr.create_session() - # 3 allows, 2 denies = 40% deny rate — below 90% + # 3 allows, 2 denies = 40% deny rate: below 90% for i in range(3): chain.append("tool_call", call_id=f"a{i}", tool_name="t", policy_decision="allow") for i in range(2): diff --git a/tests/unit/test_policy_bundle.py b/tests/unit/test_policy_bundle.py index db33d67..c700afe 100644 --- a/tests/unit/test_policy_bundle.py +++ b/tests/unit/test_policy_bundle.py @@ -210,7 +210,7 @@ def test_policy_store_keeps_current_on_reload_failure(): # ── POLICY-007: agent_os_version pinning ───────────────────────────────────── def test_load_bundle_accepts_manifest_without_agent_os_version(bundle_dir): - """POLICY-007: agent_os_version is optional — bundles without it still load.""" + """POLICY-007: agent_os_version is optional: bundles without it still load.""" bundle = load_policy_bundle(str(bundle_dir)) assert bundle.manifest.agent_os_version is None diff --git a/tests/unit/test_session.py b/tests/unit/test_session.py index 46b8260..1b850d8 100644 
--- a/tests/unit/test_session.py +++ b/tests/unit/test_session.py @@ -116,7 +116,7 @@ def test_update_highest_tag_wins_per_update(): # ── AUTH-002: asyncio.Lock guards concurrent mutations ──────────────────────── def test_session_state_has_mutation_lock(): - """AUTH-002 — SessionState must expose an asyncio.Lock for concurrent-mutation protection.""" + """AUTH-002: SessionState must expose an asyncio.Lock for concurrent-mutation protection.""" import asyncio state = SessionState(session_id="s-lock") assert isinstance(state.mutation_lock, asyncio.Lock) @@ -124,7 +124,7 @@ def test_session_state_has_mutation_lock(): @pytest.mark.asyncio async def test_concurrent_update_and_reset_do_not_corrupt_state(): - """AUTH-002 — concurrent update_from_inspection and reset must not leave state inconsistent.""" + """AUTH-002: concurrent update_from_inspection and reset must not leave state inconsistent.""" state = SessionState(session_id="s-concurrent") async def _update(): diff --git a/tests/unit/test_session_call_log.py b/tests/unit/test_session_call_log.py index 70981a3..724ed86 100644 --- a/tests/unit/test_session_call_log.py +++ b/tests/unit/test_session_call_log.py @@ -109,7 +109,7 @@ def test_record_call_empty_response_sensitivity_tags_by_default(): # --------------------------------------------------------------------------- -# get_call_graph_summary — compliance_domains_touched +# get_call_graph_summary: compliance_domains_touched # --------------------------------------------------------------------------- @@ -145,7 +145,7 @@ def test_summary_domains_deduplicated(): # --------------------------------------------------------------------------- -# get_call_graph_summary — cross_boundary_events +# get_call_graph_summary: cross_boundary_events # --------------------------------------------------------------------------- diff --git a/tests/unit/test_session_manager.py b/tests/unit/test_session_manager.py index 9c008ba..08480e0 100644 --- a/tests/unit/test_session_manager.py 
+++ b/tests/unit/test_session_manager.py @@ -26,7 +26,7 @@ def _make_attestation_report(*, stale: bool = False) -> MagicMock: report.measurement = "DEVELOPMENT_ONLY_NOT_FOR_PRODUCTION" report.report_data = "aa" * 32 report.raw_evidence = None - report.measurement_note = "software-only mode — not hardware-backed" + report.measurement_note = "software-only mode: not hardware-backed" report.attestation_validity_seconds = 86400 if stale: # Set generated_at far in the past so the report is expired. diff --git a/tests/unit/test_soak.py b/tests/unit/test_soak.py index 8ca594c..86d3c0c 100644 --- a/tests/unit/test_soak.py +++ b/tests/unit/test_soak.py @@ -90,7 +90,7 @@ def test_soak_state_initial(): def test_soak_memory_growth_check_bounded(): from tests.soak.run_soak import SoakState, _check_memory_growth state = SoakState() - state.memory_samples = [1000, 2000] # 2x growth — just within threshold + state.memory_samples = [1000, 2000] # 2x growth: just within threshold state.total_calls = 1 # Should pass: tn=2000, threshold=(2*1000) + (1*512) = 2512 assert _check_memory_growth(state) is True @@ -136,7 +136,7 @@ def test_check_signing_key_detects_change(): state = SoakState() _check_signing_key(state, chain1) _check_signing_key(state, chain2) # different chain root - # Different sessions produce different chain roots — simulates key change + # Different sessions produce different chain roots: simulates key change if chain1.chain_root[:16] != chain2.chain_root[:16]: assert state.signing_key_stable is False assert len(state.signing_key_restart_timestamps) == 1 diff --git a/tests/unit/test_startup.py b/tests/unit/test_startup.py index f05469c..e817f1a 100644 --- a/tests/unit/test_startup.py +++ b/tests/unit/test_startup.py @@ -195,7 +195,7 @@ def test_startup_fails_on_missing_config(tmp_path): def test_startup_fails_on_no_tee_no_dev_mode(tmp_path): - """Conformance: ATTEST-001 — no hardware TEE + no dev mode → exit 1.""" + """Conformance: ATTEST-001: no hardware TEE + no dev 
mode → exit 1.""" config_path = tmp_path / "cmcp-config.yaml" policy_dir = tmp_path / "policy" policy_dir.mkdir() diff --git a/tests/unit/test_tee.py b/tests/unit/test_tee.py index 3c34371..6071a69 100644 --- a/tests/unit/test_tee.py +++ b/tests/unit/test_tee.py @@ -104,7 +104,7 @@ def test_detect_raises_when_no_hardware_and_no_dev_mode(no_dev_config): def test_detect_env_var_alone_does_not_bypass_config(no_dev_config, monkeypatch): - """CONF-002 — CMCP_DEV_MODE in env after config load must not enable software-only.""" + """CONF-002: CMCP_DEV_MODE in env after config load must not enable software-only.""" monkeypatch.setenv("CMCP_DEV_MODE", "1") with patch("cmcp_runtime.tee.detect._get_provider_impl", return_value=None), \ pytest.raises(AttestationProviderUnsupported): diff --git a/tests/unit/test_tee_dev_mode_freeze.py b/tests/unit/test_tee_dev_mode_freeze.py index 87f700f..2851bdb 100644 --- a/tests/unit/test_tee_dev_mode_freeze.py +++ b/tests/unit/test_tee_dev_mode_freeze.py @@ -45,7 +45,7 @@ def test_dev_mode_constant_not_changed_by_later_env_mutation(monkeypatch): mod = _reload_config_with_env(monkeypatch, "0") assert mod.DEV_MODE is False - # Now set the env var — simulates an attacker injecting it at runtime. + # Now set the env var: simulates an attacker injecting it at runtime. monkeypatch.setenv("CMCP_DEV_MODE", "1") # The constant on the already-imported module must remain False. @@ -58,7 +58,7 @@ def test_dev_mode_constant_not_cleared_by_later_env_removal(monkeypatch): mod = _reload_config_with_env(monkeypatch, "1") assert mod.DEV_MODE is True - # Remove the env var — the constant must stay True. + # Remove the env var: the constant must stay True. monkeypatch.delenv("CMCP_DEV_MODE", raising=False) assert mod.DEV_MODE is True