diff --git a/CHANGELOG.md b/CHANGELOG.md index 448cad7..3069ab7 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -18,6 +18,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - Peer-call enforcement decision core (Tier 2): `ca2a_runtime.policy.LocalPolicy` and `ca2a_runtime.peer` (`effective_scope`, `enforce_peer_call`). Effective permission is the delegated leaf scope intersected with the callee's local policy; a granted call emits a linked provenance record. New error `SCOPE_NOT_PERMITTED`. Claim C3 (scope-policy intersection) is now a validated experiment. Cedar-engine binding of the local policy and live A2A transport wiring remain open. - Sealed peer channel (Tier 2): `ca2a_runtime.channel` (`SealedChannel`, `generate_channel_keypair`, `open_sealed`). HPKE-style X25519 -> HKDF-SHA256 -> ChaCha20-Poly1305 sealing a payload to the peer's attested key; only the peer's private key opens it, and a wrong key or tampered ciphertext fails closed. Claim C4 (sealed-payload confidentiality) is now a validated experiment at the cryptographic layer. The enclave-binding of the private key (a hardware property) and live-path wiring remain open. - Cross-operator attestation (Claim C6) validated in software: a two-operator harness composing the SEV-SNP verifier, measurement pinning, and the sealed channel demonstrates independent keys, mutual attestation, confidential cross-operator delegation, and binary-swap detection. Synthetic report vectors (a genuine report needs SEV-SNP hardware); real hardware end to end remains open. **All six claims (C1-C6) are now validated experiments.** +- RFC 8785 (JSON Canonicalization Scheme) canonicalization: `ca2a_runtime.canonical.canonicalize`. Credential and provenance bodies are now signed over the JCS encoding (UTF-16 key ordering, JCS string escaping, literal non-ASCII, shortest-decimal integers), so cA2A signatures are cross-verifiable with agent-manifest. 
ASCII credentials are byte-identical to the previous encoding, so existing signatures still verify. - Repository scaffold: governance, CI/CD, docs framework, and packaging at parity with the agentrust-io house standard ### Not yet implemented diff --git a/ROADMAP.md b/ROADMAP.md index a1e43f3..151c6d2 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -11,7 +11,7 @@ Already implemented and tested elsewhere; cA2A depends on it rather than reimple - Attestation-gated SPIFFE mTLS (cmcp) - Audit chain with external signed evidence references (cmcp) - Cedar policy engine (cmcp) -- Ed25519 + RFC 8785 canonicalization (all three repos) +- Ed25519 + RFC 8785 canonicalization (all three repos; cA2A now ships a JCS canonicalizer in `ca2a_runtime.canonical`) ## v0.1: Profile and offline verifier diff --git a/docs/spec/delegation-chain.md b/docs/spec/delegation-chain.md index 0ca2965..310cf4d 100644 --- a/docs/spec/delegation-chain.md +++ b/docs/spec/delegation-chain.md @@ -18,7 +18,7 @@ A `DelegationCredential` has the following signed body plus a detached signature ## Canonicalization -The signed bytes are a deterministic JSON encoding of the body: sorted keys, compact separators, UTF-8, `scope` as a sorted array. This is the byte string signed and verified. It is a practical subset of RFC 8785 sufficient for the ASCII fields used here; full RFC 8785 alignment with agent-manifest is tracked on the roadmap. +The signed bytes are the RFC 8785 (JSON Canonicalization Scheme) encoding of the body: keys sorted by UTF-16 code units, JCS minimal string escaping, non-ASCII emitted literally as UTF-8, integers in shortest decimal form, `scope` as a sorted array. This is the byte string signed and verified. Using JCS makes cA2A signatures cross-verifiable with agent-manifest and any other conforming implementation. See `ca2a_runtime.canonical`. 
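The rules listed above can be illustrated with a minimal sketch. This is not the code in `ca2a_runtime.canonical`; the names `canonicalize_sketch`, `_ordered`, and `MAX_SAFE_INT` are hypothetical, and the sketch assumes bodies that contain only strings, integers, booleans, nulls, arrays, and objects (floats are rejected rather than serialized). It leans on Python's `json.dumps` with `ensure_ascii=False`, which emits non-ASCII literally and escapes only quotes, backslashes, and control characters.

```python
import json
from typing import Any

MAX_SAFE_INT = 2**53 - 1  # largest integer an IEEE 754 double holds exactly


def _ordered(value: Any) -> Any:
    """Return a copy of value with every object's keys in UTF-16 code-unit order."""
    if isinstance(value, dict):
        for key in value:
            if not isinstance(key, str):
                raise TypeError("JSON object keys must be strings")
        # Big-endian UTF-16 bytes compare in the same order as UTF-16 code units.
        keys = sorted(value, key=lambda k: k.encode("utf-16-be"))
        return {k: _ordered(value[k]) for k in keys}
    if isinstance(value, (list, tuple)):
        return [_ordered(item) for item in value]
    if isinstance(value, bool) or value is None or isinstance(value, str):
        return value
    if isinstance(value, int):
        if abs(value) > MAX_SAFE_INT:
            raise ValueError("integer is outside the exactly-representable range")
        return value
    # Floats and anything else are out of scope for this sketch.
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def canonicalize_sketch(value: Any) -> bytes:
    """Serialize with compact separators and literal non-ASCII, then encode as UTF-8."""
    text = json.dumps(_ordered(value), ensure_ascii=False, separators=(",", ":"))
    return text.encode("utf-8")


body = {"scope": sorted({"cap:b", "cap:a"}), "depth": 0, "credential_id": "cred-0"}
print(canonicalize_sketch(body).decode("utf-8"))
# {"credential_id":"cred-0","depth":0,"scope":["cap:a","cap:b"]}
```

The UTF-16 sort key only matters for keys outside the Basic Multilingual Plane: a key made of surrogate pairs sorts before a key in U+E000..U+FFFF under JCS, which is the opposite of code-point order. For ASCII-only keys the two orders agree, consistent with the note that ASCII credentials keep their previous bytes.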
## Verification invariants diff --git a/docs/spec/provenance-dag.md b/docs/spec/provenance-dag.md index 3c6a981..7f936b0 100644 --- a/docs/spec/provenance-dag.md +++ b/docs/spec/provenance-dag.md @@ -1,112 +1,112 @@ -# Provenance DAG - -A delegation chain proves who was *allowed* to act. The provenance DAG records what *actually happened*: one signed-in-substance record per delegation hop, each linked to its parent by hash. A chain of records forms a tamper-evident, hash-linked structure that a verifier reconstructs and checks offline, without trusting the operators that produced the records. - -This module (`ca2a_runtime.provenance`) is implemented and validated. Claim C5 exercises all of its properties (see [reproducing the claims](../tutorials/reproducing-the-claims.md) and the experiment at `experiments/claim5-provenance-dag-integrity/`). It is the runtime-evidence side of the [TRACE A2A profile](trace-a2a-profile.md); the full TRACE record binding lands with the Tier 2 provenance work. - -## DelegationRecord - -A `DelegationRecord` is the provenance record a hop emits for the delegation credential it acted under. It is a frozen dataclass: - -| Field | Type | Meaning | -|---|---|---| -| `record_id` | string | Unique id of this record | -| `credential_id` | string | The `credential_id` of the delegation credential this hop acted under | -| `subject` | hex | The delegate for this hop, copied from the credential's `subject` | -| `scope` | set of strings | Capabilities exercised at this hop, copied from the credential's `scope` | -| `parent_record_hash` | string or null | `record_hash()` of the parent record; null at the root | - -The hashed portion of the record is its `body()`: `record_id`, `credential_id`, `subject`, `scope` as a sorted array, and `parent_record_hash`. There is no separate signature field on the record. Integrity comes from the hash link, and authority comes from binding the record back to its signed credential with `cross_check_chain()`. 
- -## record_hash() - -`record_hash()` is the SHA-256 hex digest over the canonical bytes of `body()`: - -```python -def record_hash(self) -> str: - return hashlib.sha256(canonical_bytes(self.body())).hexdigest() -``` - -`canonical_bytes` is the same deterministic JSON encoding used to sign delegation credentials (sorted keys, compact separators, UTF-8; see [delegation chain](delegation-chain.md)). Because the parent link is the hash of the parent's canonical body, and every other field of a record feeds that hash, any change to a record changes its `record_hash()` and therefore breaks the link in its child. - -## record_for() - -`record_for()` builds the record a hop emits for a given credential. It copies `credential_id`, `subject`, and `scope` straight off the `DelegationCredential`, so a record cannot silently claim a credential id, subject, or scope different from the one it was minted from: - -```python -from ca2a_runtime.delegation.credential import DelegationCredential, new_keypair -from ca2a_runtime.provenance import record_for - -# `cred` is a signed DelegationCredential for this hop. -record = record_for(cred, record_id="rec-0", parent_record_hash=None) -``` - -The caller supplies the `record_id` and the `parent_record_hash` (the previous record's `record_hash()`, or `None` for the root). Chaining a workflow is a fold over the hops: - -```python -from ca2a_runtime.provenance import record_for - -records = [] -parent_hash = None -for i, cred in enumerate(chain): - rec = record_for(cred, record_id=f"rec-{i}", parent_record_hash=parent_hash) - records.append(rec) - parent_hash = rec.record_hash() -``` - -## verify_dag() - -`verify_dag()` takes a root-to-leaf list of records, checks the linking invariants, and returns the list unchanged on success. 
It raises `ProvenanceLinkBroken` (error code `PROVENANCE_LINK_BROKEN`, HTTP 409) on the first violation: - -```python -from ca2a_runtime.provenance import verify_dag -from ca2a_runtime.errors import ProvenanceLinkBroken - -try: - verify_dag(records) -except ProvenanceLinkBroken as exc: - ... # reject the workflow -``` - -The invariants are: - -| Invariant | Violation | -|---|---| -| The list is non-empty | `PROVENANCE_LINK_BROKEN` (`empty provenance chain`) | -| The first record is a root: `parent_record_hash` is `None` | `PROVENANCE_LINK_BROKEN` (`root record must not reference a parent`) | -| Every later record's `parent_record_hash` equals the recomputed `record_hash()` of the immediately preceding record | `PROVENANCE_LINK_BROKEN` (`record i parent link does not match the previous record's hash`) | -| No `record_id` repeats | `PROVENANCE_LINK_BROKEN` (`duplicate record_id at position i`) | - -The parent hash is *recomputed* from the previous record on every check rather than trusted. That is what makes the structure tamper-evident: the stored link and the recomputed hash have to agree. - -## cross_check_chain() - -`verify_dag()` proves the records are internally consistent, but on its own it says nothing about *authority*. `cross_check_chain()` ties provenance to the verified [delegation chain](delegation-chain.md): record `i` must reference credential `i` and carry the same subject. - -```python -from ca2a_runtime.provenance import cross_check_chain - -# `chain` has passed verify_chain; `records` has passed verify_dag. -cross_check_chain(records, chain) # raises ProvenanceLinkBroken on any mismatch -``` - -It raises `ProvenanceLinkBroken` if the two lists differ in length, if any `record.credential_id` does not equal the corresponding `credential.credential_id`, or if any `record.subject` does not equal the corresponding `credential.subject`. Run `verify_chain` on the credentials and `verify_dag` on the records first, then `cross_check_chain` to bind the two. 
A forged `credential_id` on a record, for example, is caught here even though the record chain itself hashes cleanly. - -## Tamper-evidence - -SHA-256 exhibits the avalanche property: a one-field change to a record produces a digest that differs from the original in roughly half of its 256 bits. The C5 experiment measures this directly by adding a capability to a record's scope and counting differing bits between the old and new `record_hash()`. It observes about 128 of 256 bits flipped. - -Because the child record stores the parent's *old* hash in `parent_record_hash`, a tampered record no longer hashes to that stored value, and `verify_dag()` raises `ProvenanceLinkBroken` at the child. An attacker cannot fix this by editing only one record: repairing the child's `parent_record_hash` to match the tampered parent changes the child's own hash, which breaks *its* child, and so on to the leaf. Correcting the whole tail requires recomputing every downstream link, which `cross_check_chain()` then rejects if the underlying credentials no longer line up. - -## Reparenting - -Reparenting is the attack of pointing a record at a different, legitimately-hashed parent to hide a hop or re-order the chain. It is caught by the same recompute-and-compare check. `verify_dag()` walks the list in order and requires each record's `parent_record_hash` to equal the hash of the record *immediately before it in the list*. A record whose `parent_record_hash` points at some other record's hash (for example the root's, skipping an intermediate hop) fails that equality and raises `ProvenanceLinkBroken`. The C5 experiment demonstrates this by repointing the leaf's `parent_record_hash` at the root's hash instead of its true parent and confirming detection. - -## Status and scope - -The provenance DAG in this module is implemented and reproducible under claim C5. What it does *not* yet do: - -- The records are hash-linked, not independently signed. 
Authority binding is via `cross_check_chain()` against signed credentials, not a signature on each record. -- The full TRACE record binding described in the [TRACE A2A profile](trace-a2a-profile.md), emitting these links as `delegation.parent_record_hash` / `delegation.credential_id` fields inside a TRACE record, lands with the Tier 2 provenance work. See [ROADMAP.md](../../ROADMAP.md) and [LIMITATIONS.md](../../LIMITATIONS.md). - -For the offline chain verifier that this module pairs with, see the [verification library](verification-library.md). +# Provenance DAG + +A delegation chain proves who was *allowed* to act. The provenance DAG records what *actually happened*: one signed-in-substance record per delegation hop, each linked to its parent by hash. A chain of records forms a tamper-evident, hash-linked structure that a verifier reconstructs and checks offline, without trusting the operators that produced the records. + +This module (`ca2a_runtime.provenance`) is implemented and validated. Claim C5 exercises all of its properties (see [reproducing the claims](../tutorials/reproducing-the-claims.md) and the experiment at `experiments/claim5-provenance-dag-integrity/`). It is the runtime-evidence side of the [TRACE A2A profile](trace-a2a-profile.md); the full TRACE record binding lands with the Tier 2 provenance work. + +## DelegationRecord + +A `DelegationRecord` is the provenance record a hop emits for the delegation credential it acted under. 
It is a frozen dataclass: + +| Field | Type | Meaning | +|---|---|---| +| `record_id` | string | Unique id of this record | +| `credential_id` | string | The `credential_id` of the delegation credential this hop acted under | +| `subject` | hex | The delegate for this hop, copied from the credential's `subject` | +| `scope` | set of strings | Capabilities exercised at this hop, copied from the credential's `scope` | +| `parent_record_hash` | string or null | `record_hash()` of the parent record; null at the root | + +The hashed portion of the record is its `body()`: `record_id`, `credential_id`, `subject`, `scope` as a sorted array, and `parent_record_hash`. There is no separate signature field on the record. Integrity comes from the hash link, and authority comes from binding the record back to its signed credential with `cross_check_chain()`. + +## record_hash() + +`record_hash()` is the SHA-256 hex digest over the canonical bytes of `body()`: + +```python +def record_hash(self) -> str: + return hashlib.sha256(canonical_bytes(self.body())).hexdigest() +``` + +`canonical_bytes` is the same RFC 8785 (JCS) encoding used to sign delegation credentials (see [delegation chain](delegation-chain.md)). Because the parent link is the hash of the parent's canonical body, and every other field of a record feeds that hash, any change to a record changes its `record_hash()` and therefore breaks the link in its child. + +## record_for() + +`record_for()` builds the record a hop emits for a given credential. It copies `credential_id`, `subject`, and `scope` straight off the `DelegationCredential`, so a record cannot silently claim a credential id, subject, or scope different from the one it was minted from: + +```python +from ca2a_runtime.delegation.credential import DelegationCredential, new_keypair +from ca2a_runtime.provenance import record_for + +# `cred` is a signed DelegationCredential for this hop. 
+record = record_for(cred, record_id="rec-0", parent_record_hash=None) +``` + +The caller supplies the `record_id` and the `parent_record_hash` (the previous record's `record_hash()`, or `None` for the root). Chaining a workflow is a fold over the hops: + +```python +from ca2a_runtime.provenance import record_for + +records = [] +parent_hash = None +for i, cred in enumerate(chain): + rec = record_for(cred, record_id=f"rec-{i}", parent_record_hash=parent_hash) + records.append(rec) + parent_hash = rec.record_hash() +``` + +## verify_dag() + +`verify_dag()` takes a root-to-leaf list of records, checks the linking invariants, and returns the list unchanged on success. It raises `ProvenanceLinkBroken` (error code `PROVENANCE_LINK_BROKEN`, HTTP 409) on the first violation: + +```python +from ca2a_runtime.provenance import verify_dag +from ca2a_runtime.errors import ProvenanceLinkBroken + +try: + verify_dag(records) +except ProvenanceLinkBroken as exc: + ... # reject the workflow +``` + +The invariants are: + +| Invariant | Violation | +|---|---| +| The list is non-empty | `PROVENANCE_LINK_BROKEN` (`empty provenance chain`) | +| The first record is a root: `parent_record_hash` is `None` | `PROVENANCE_LINK_BROKEN` (`root record must not reference a parent`) | +| Every later record's `parent_record_hash` equals the recomputed `record_hash()` of the immediately preceding record | `PROVENANCE_LINK_BROKEN` (`record i parent link does not match the previous record's hash`) | +| No `record_id` repeats | `PROVENANCE_LINK_BROKEN` (`duplicate record_id at position i`) | + +The parent hash is *recomputed* from the previous record on every check rather than trusted. That is what makes the structure tamper-evident: the stored link and the recomputed hash have to agree. + +## cross_check_chain() + +`verify_dag()` proves the records are internally consistent, but on its own it says nothing about *authority*. 
`cross_check_chain()` ties provenance to the verified [delegation chain](delegation-chain.md): record `i` must reference credential `i` and carry the same subject. + +```python +from ca2a_runtime.provenance import cross_check_chain + +# `chain` has passed verify_chain; `records` has passed verify_dag. +cross_check_chain(records, chain) # raises ProvenanceLinkBroken on any mismatch +``` + +It raises `ProvenanceLinkBroken` if the two lists differ in length, if any `record.credential_id` does not equal the corresponding `credential.credential_id`, or if any `record.subject` does not equal the corresponding `credential.subject`. Run `verify_chain` on the credentials and `verify_dag` on the records first, then `cross_check_chain` to bind the two. A forged `credential_id` on a record, for example, is caught here even though the record chain itself hashes cleanly. + +## Tamper-evidence + +SHA-256 exhibits the avalanche property: a one-field change to a record produces a digest that differs from the original in roughly half of its 256 bits. The C5 experiment measures this directly by adding a capability to a record's scope and counting differing bits between the old and new `record_hash()`. It observes about 128 of 256 bits flipped. + +Because the child record stores the parent's *old* hash in `parent_record_hash`, a tampered record no longer hashes to that stored value, and `verify_dag()` raises `ProvenanceLinkBroken` at the child. An attacker cannot fix this by editing only one record: repairing the child's `parent_record_hash` to match the tampered parent changes the child's own hash, which breaks *its* child, and so on to the leaf. Correcting the whole tail requires recomputing every downstream link, which `cross_check_chain()` then rejects if the underlying credentials no longer line up. + +## Reparenting + +Reparenting is the attack of pointing a record at a different, legitimately-hashed parent to hide a hop or re-order the chain. 
It is caught by the same recompute-and-compare check. `verify_dag()` walks the list in order and requires each record's `parent_record_hash` to equal the hash of the record *immediately before it in the list*. A record whose `parent_record_hash` points at some other record's hash (for example the root's, skipping an intermediate hop) fails that equality and raises `ProvenanceLinkBroken`. The C5 experiment demonstrates this by repointing the leaf's `parent_record_hash` at the root's hash instead of its true parent and confirming detection. + +## Status and scope + +The provenance DAG in this module is implemented and reproducible under claim C5. What it does *not* yet do: + +- The records are hash-linked, not independently signed. Authority binding is via `cross_check_chain()` against signed credentials, not a signature on each record. +- The full TRACE record binding described in the [TRACE A2A profile](trace-a2a-profile.md), emitting these links as `delegation.parent_record_hash` / `delegation.credential_id` fields inside a TRACE record, lands with the Tier 2 provenance work. See [ROADMAP.md](../../ROADMAP.md) and [LIMITATIONS.md](../../LIMITATIONS.md). + +For the offline chain verifier that this module pairs with, see the [verification library](verification-library.md). diff --git a/docs/tutorials/emit-and-verify-provenance.md b/docs/tutorials/emit-and-verify-provenance.md index 2b14f2a..d998dd8 100644 --- a/docs/tutorials/emit-and-verify-provenance.md +++ b/docs/tutorials/emit-and-verify-provenance.md @@ -1,203 +1,203 @@ -# Emit and Verify Provenance - -A verified delegation chain tells you who was allowed to act. A provenance DAG is the runtime evidence that the delegation actually happened, in order, and was not edited after the fact. 
This tutorial takes a signed chain, emits one `DelegationRecord` per hop, verifies the linked records offline, tampers with one record to watch the link break, and binds the provenance back to the delegation credentials it claims to act under. - -Everything here runs offline with no hardware. It mirrors `experiments/claim5-provenance-dag-integrity`. For the model behind these records, see [provenance-dag.md](../spec/provenance-dag.md); for the credential model see [delegation-chain.md](../spec/delegation-chain.md). - -## What a record is - -Each delegation hop emits a `DelegationRecord`. The record names the credential it acted under, repeats that credential's `subject` and `scope`, and carries `parent_record_hash`: the SHA-256 of the previous record's canonical body. The hash link is what makes the DAG tamper-evident. Change any field of a record and its `record_hash()` changes, so the child that pointed at the old hash no longer lines up. - -```python -from dataclasses import dataclass - -@dataclass(frozen=True) -class DelegationRecord: - record_id: str - credential_id: str - subject: str - scope: frozenset[str] - parent_record_hash: str | None = None -``` - -`record_hash()` is SHA-256 over the canonical body (`record_id`, `credential_id`, `subject`, sorted `scope`, `parent_record_hash`). The canonicalization is the same deterministic JSON encoding used to sign credentials, so an auditor recomputes the exact bytes. - -## 1. Build a signed chain - -Start from a correctly signed root-to-leaf delegation chain. This is the same setup used in [verify-a-delegation-chain.md](verify-a-delegation-chain.md); here we build it in code so we have the credentials in hand to emit records from. 
- -```python -from ca2a_runtime.delegation.credential import DelegationCredential, new_keypair - - -def build_chain(scopes: list[frozenset[str]]) -> list[DelegationCredential]: - chain: list[DelegationCredential] = [] - priv, pub = new_keypair() - parent_id: str | None = None - for depth, scope in enumerate(scopes): - next_priv, next_pub = new_keypair() - cred = DelegationCredential( - credential_id=f"cred-{depth}", - issuer=pub, - subject=next_pub, - scope=scope, - depth=depth, - parent_id=parent_id, - ).sign(priv) - chain.append(cred) - parent_id = cred.credential_id - priv, pub = next_priv, next_pub - return chain - - -chain = build_chain( - [ - frozenset({"cap:a", "cap:b", "cap:c"}), - frozenset({"cap:a", "cap:b"}), - frozenset({"cap:a"}), - ] -) -``` - -Each hop's `issuer` is the previous hop's `subject`, and scope narrows at every step. `new_keypair()` returns an `Ed25519PrivateKey` and its public key as raw hex. - -## 2. Emit one record per hop - -Walk the chain and call `record_for()` for each credential, threading the running `parent_record_hash`. The root record has no parent, so it starts at `None`. - -```python -from ca2a_runtime.provenance import DelegationRecord, record_for - - -def records_from_chain(chain: list[DelegationCredential]) -> list[DelegationRecord]: - records: list[DelegationRecord] = [] - parent_hash: str | None = None - for i, cred in enumerate(chain): - rec = record_for(cred, record_id=f"rec-{i}", parent_record_hash=parent_hash) - records.append(rec) - parent_hash = rec.record_hash() - return records - - -records = records_from_chain(chain) -``` - -`record_for(credential, record_id, parent_record_hash)` copies `credential_id`, `subject`, and `scope` off the credential and stamps in the parent link you pass. After appending a record you recompute `record_hash()` and feed it forward as the next record's parent. - -## 3. Verify the DAG - -`verify_dag()` walks the records root to leaf and returns them in order on success. 
It enforces three things: the first record must be a root (no parent link), every later record's `parent_record_hash` must equal the recomputed hash of the immediately preceding record, and no `record_id` may repeat. - -```python -from ca2a_runtime.provenance import verify_dag - -verified = verify_dag(records) -print(f"verified {len(verified)} records") -# verified 3 records -``` - -If it returns without raising, the linked hash chain is intact. - -## 4. Tamper with a record - -Now edit one field of a record without touching anything else. Because `record_hash()` covers `scope`, adding a capability flips roughly half of the 256 hash bits (the SHA-256 avalanche), so record 1's new hash no longer matches the `parent_record_hash` that record 2 still stores. - -```python -from ca2a_runtime.errors import ProvenanceLinkBroken - -original = records[1] -tampered = DelegationRecord( - record_id=original.record_id, - credential_id=original.credential_id, - subject=original.subject, - scope=frozenset(original.scope | {"cap:injected"}), - parent_record_hash=original.parent_record_hash, -) - -tampered_records = list(records) -tampered_records[1] = tampered - -try: - verify_dag(tampered_records) -except ProvenanceLinkBroken as exc: - print(f"{exc.code}: {exc}") -# PROVENANCE_LINK_BROKEN: record 2 parent link does not match the previous record's hash -``` - -`ProvenanceLinkBroken` carries code `PROVENANCE_LINK_BROKEN` and HTTP status 409. The message names the position where the link failed, and `exc.detail` reads `a tampered or reparented record was detected`. Note that the tampered record is at position 1, but the break is detected at position 2: the verifier catches the edit at the first child whose stored link no longer matches. - -## 5. Reparent a record - -The same mechanism catches a record repointed at a different parent, even when the record's own fields are untouched. Here the leaf is made to claim the root's hash as its parent instead of record 1's. 
- -```python -leaf = records[2] -reparented = DelegationRecord( - record_id=leaf.record_id, - credential_id=leaf.credential_id, - subject=leaf.subject, - scope=leaf.scope, - parent_record_hash=records[0].record_hash(), # should be records[1]'s hash -) - -reparented_records = list(records) -reparented_records[2] = reparented - -try: - verify_dag(reparented_records) -except ProvenanceLinkBroken as exc: - print(f"{exc.code}: {exc}") -# PROVENANCE_LINK_BROKEN: record 2 parent link does not match the previous record's hash -``` - -You cannot splice a record into a different position in the DAG without breaking the link, because the stored `parent_record_hash` must equal the hash of the record that actually precedes it. - -## 6. Bind provenance to authority - -`verify_dag()` proves the records are internally consistent, but on its own it does not prove they describe the delegation you think they do. Records could be internally valid yet name credentials that never existed. `cross_check_chain()` closes that gap: record `i` must reference credential `i` and carry the same subject. - -```python -from ca2a_runtime.provenance import cross_check_chain - -cross_check_chain(records, chain) # returns None on success -print("provenance bound to the delegation chain") -``` - -Forge a `credential_id` on any record and the cross-check rejects it: - -```python -mismatch = list(records) -mismatch[0] = DelegationRecord( - record_id=records[0].record_id, - credential_id="FORGED-CRED-ID", - subject=records[0].subject, - scope=records[0].scope, - parent_record_hash=None, -) - -try: - cross_check_chain(mismatch, chain) -except ProvenanceLinkBroken as exc: - print(f"{exc.code}: {exc}") -# PROVENANCE_LINK_BROKEN: record 0 credential_id does not match the chain -``` - -`cross_check_chain()` also raises `ProvenanceLinkBroken` if the record list and the chain are different lengths, or if any record's `subject` does not match its credential's `subject`. 
Run both checks together and a valid provenance DAG cannot be fabricated independently of the signed authority it claims. - -## What you proved - -You emitted a linked provenance record per delegation hop, verified the DAG offline, and watched a single-field edit and a reparent both surface as `ProvenanceLinkBroken`. `cross_check_chain()` ties every record back to the credential it acted under, so the evidence trail is bound to the signed delegation chain, not free-floating. An auditor replays `verify_dag()` and `cross_check_chain()` against the recorded records and credentials without trusting the runtime that emitted them. - -## Scope and limits - -This is the runtime-evidence side of the cA2A profile and it works today. What it is not: - -- These records are a hash-linked evidence trail. They are not yet the full TRACE binding; that lands with the Tier 2 provenance work. See [trace-a2a-profile.md](../spec/trace-a2a-profile.md). -- The DAG is verified after the fact from recorded records. cA2A does not yet enforce peer behavior at runtime (Tier 2), so a peer must still emit honest records for the trail to mean anything. See [threat-model.md](../spec/threat-model.md) and [LIMITATIONS.md](../../LIMITATIONS.md). -- The verifier walks a single root-to-leaf sequence. Branching DAGs and the wire format for transmitting records are on the [roadmap](../../ROADMAP.md). - -## Next steps - -- Reproduce the numbers behind this page: [reproducing-the-claims.md](reproducing-the-claims.md). -- Author the credentials the records point at: [authoring-a-delegation-credential.md](authoring-a-delegation-credential.md). -- The full provenance model and record schema: [provenance-dag.md](../spec/provenance-dag.md). +# Emit and Verify Provenance + +A verified delegation chain tells you who was allowed to act. A provenance DAG is the runtime evidence that the delegation actually happened, in order, and was not edited after the fact. 
This tutorial takes a signed chain, emits one `DelegationRecord` per hop, verifies the linked records offline, tampers with one record to watch the link break, and binds the provenance back to the delegation credentials it claims to act under. + +Everything here runs offline with no hardware. It mirrors `experiments/claim5-provenance-dag-integrity`. For the model behind these records, see [provenance-dag.md](../spec/provenance-dag.md); for the credential model see [delegation-chain.md](../spec/delegation-chain.md). + +## What a record is + +Each delegation hop emits a `DelegationRecord`. The record names the credential it acted under, repeats that credential's `subject` and `scope`, and carries `parent_record_hash`: the SHA-256 of the previous record's canonical body. The hash link is what makes the DAG tamper-evident. Change any field of a record and its `record_hash()` changes, so the child that pointed at the old hash no longer lines up. + +```python +from dataclasses import dataclass + +@dataclass(frozen=True) +class DelegationRecord: + record_id: str + credential_id: str + subject: str + scope: frozenset[str] + parent_record_hash: str | None = None +``` + +`record_hash()` is SHA-256 over the canonical body (`record_id`, `credential_id`, `subject`, sorted `scope`, `parent_record_hash`). The canonicalization is the same RFC 8785 (JCS) encoding used to sign credentials, so an auditor recomputes the exact bytes. + +## 1. Build a signed chain + +Start from a correctly signed root-to-leaf delegation chain. This is the same setup used in [verify-a-delegation-chain.md](verify-a-delegation-chain.md); here we build it in code so we have the credentials in hand to emit records from. 
+ +```python +from ca2a_runtime.delegation.credential import DelegationCredential, new_keypair + + +def build_chain(scopes: list[frozenset[str]]) -> list[DelegationCredential]: + chain: list[DelegationCredential] = [] + priv, pub = new_keypair() + parent_id: str | None = None + for depth, scope in enumerate(scopes): + next_priv, next_pub = new_keypair() + cred = DelegationCredential( + credential_id=f"cred-{depth}", + issuer=pub, + subject=next_pub, + scope=scope, + depth=depth, + parent_id=parent_id, + ).sign(priv) + chain.append(cred) + parent_id = cred.credential_id + priv, pub = next_priv, next_pub + return chain + + +chain = build_chain( + [ + frozenset({"cap:a", "cap:b", "cap:c"}), + frozenset({"cap:a", "cap:b"}), + frozenset({"cap:a"}), + ] +) +``` + +Each hop's `issuer` is the previous hop's `subject`, and scope narrows at every step. `new_keypair()` returns an `Ed25519PrivateKey` and its public key as raw hex. + +## 2. Emit one record per hop + +Walk the chain and call `record_for()` for each credential, threading the running `parent_record_hash`. The root record has no parent, so it starts at `None`. + +```python +from ca2a_runtime.provenance import DelegationRecord, record_for + + +def records_from_chain(chain: list[DelegationCredential]) -> list[DelegationRecord]: + records: list[DelegationRecord] = [] + parent_hash: str | None = None + for i, cred in enumerate(chain): + rec = record_for(cred, record_id=f"rec-{i}", parent_record_hash=parent_hash) + records.append(rec) + parent_hash = rec.record_hash() + return records + + +records = records_from_chain(chain) +``` + +`record_for(credential, record_id, parent_record_hash)` copies `credential_id`, `subject`, and `scope` off the credential and stamps in the parent link you pass. After appending a record you recompute `record_hash()` and feed it forward as the next record's parent. + +## 3. Verify the DAG + +`verify_dag()` walks the records root to leaf and returns them in order on success. 
It enforces three things: the first record must be a root (no parent link), every later record's `parent_record_hash` must equal the recomputed hash of the immediately preceding record, and no `record_id` may repeat. + +```python +from ca2a_runtime.provenance import verify_dag + +verified = verify_dag(records) +print(f"verified {len(verified)} records") +# verified 3 records +``` + +If it returns without raising, the linked hash chain is intact. + +## 4. Tamper with a record + +Now edit one field of a record without touching anything else. Because `record_hash()` covers `scope`, adding a capability flips roughly half of the 256 hash bits (the SHA-256 avalanche), so record 1's new hash no longer matches the `parent_record_hash` that record 2 still stores. + +```python +from ca2a_runtime.errors import ProvenanceLinkBroken + +original = records[1] +tampered = DelegationRecord( + record_id=original.record_id, + credential_id=original.credential_id, + subject=original.subject, + scope=frozenset(original.scope | {"cap:injected"}), + parent_record_hash=original.parent_record_hash, +) + +tampered_records = list(records) +tampered_records[1] = tampered + +try: + verify_dag(tampered_records) +except ProvenanceLinkBroken as exc: + print(f"{exc.code}: {exc}") +# PROVENANCE_LINK_BROKEN: record 2 parent link does not match the previous record's hash +``` + +`ProvenanceLinkBroken` carries code `PROVENANCE_LINK_BROKEN` and HTTP status 409. The message names the position where the link failed, and `exc.detail` reads `a tampered or reparented record was detected`. Note that the tampered record is at position 1, but the break is detected at position 2: the verifier catches the edit at the first child whose stored link no longer matches. + +## 5. Reparent a record + +The same mechanism catches a record repointed at a different parent, even when the record's own fields are untouched. Here the leaf is made to claim the root's hash as its parent instead of record 1's. 
+ +```python +leaf = records[2] +reparented = DelegationRecord( + record_id=leaf.record_id, + credential_id=leaf.credential_id, + subject=leaf.subject, + scope=leaf.scope, + parent_record_hash=records[0].record_hash(), # should be records[1]'s hash +) + +reparented_records = list(records) +reparented_records[2] = reparented + +try: + verify_dag(reparented_records) +except ProvenanceLinkBroken as exc: + print(f"{exc.code}: {exc}") +# PROVENANCE_LINK_BROKEN: record 2 parent link does not match the previous record's hash +``` + +You cannot splice a record into a different position in the DAG without breaking the link, because the stored `parent_record_hash` must equal the hash of the record that actually precedes it. + +## 6. Bind provenance to authority + +`verify_dag()` proves the records are internally consistent, but on its own it does not prove they describe the delegation you think they do. Records could be internally valid yet name credentials that never existed. `cross_check_chain()` closes that gap: record `i` must reference credential `i` and carry the same subject. + +```python +from ca2a_runtime.provenance import cross_check_chain + +cross_check_chain(records, chain) # returns None on success +print("provenance bound to the delegation chain") +``` + +Forge a `credential_id` on any record and the cross-check rejects it: + +```python +mismatch = list(records) +mismatch[0] = DelegationRecord( + record_id=records[0].record_id, + credential_id="FORGED-CRED-ID", + subject=records[0].subject, + scope=records[0].scope, + parent_record_hash=None, +) + +try: + cross_check_chain(mismatch, chain) +except ProvenanceLinkBroken as exc: + print(f"{exc.code}: {exc}") +# PROVENANCE_LINK_BROKEN: record 0 credential_id does not match the chain +``` + +`cross_check_chain()` also raises `ProvenanceLinkBroken` if the record list and the chain are different lengths, or if any record's `subject` does not match its credential's `subject`. 
Run both checks together and a valid provenance DAG cannot be fabricated independently of the signed authority it claims. + +## What you proved + +You emitted a linked provenance record per delegation hop, verified the DAG offline, and watched a single-field edit and a reparent both surface as `ProvenanceLinkBroken`. `cross_check_chain()` ties every record back to the credential it acted under, so the evidence trail is bound to the signed delegation chain, not free-floating. An auditor replays `verify_dag()` and `cross_check_chain()` against the recorded records and credentials without trusting the runtime that emitted them. + +## Scope and limits + +This is the runtime-evidence side of the cA2A profile and it works today. What it is not: + +- These records are a hash-linked evidence trail. They are not yet the full TRACE binding; that lands with the Tier 2 provenance work. See [trace-a2a-profile.md](../spec/trace-a2a-profile.md). +- The DAG is verified after the fact from recorded records. cA2A does not yet enforce peer behavior at runtime (Tier 2), so a peer must still emit honest records for the trail to mean anything. See [threat-model.md](../spec/threat-model.md) and [LIMITATIONS.md](../../LIMITATIONS.md). +- The verifier walks a single root-to-leaf sequence. Branching DAGs and the wire format for transmitting records are on the [roadmap](../../ROADMAP.md). + +## Next steps + +- Reproduce the numbers behind this page: [reproducing-the-claims.md](reproducing-the-claims.md). +- Author the credentials the records point at: [authoring-a-delegation-credential.md](authoring-a-delegation-credential.md). +- The full provenance model and record schema: [provenance-dag.md](../spec/provenance-dag.md). diff --git a/src/ca2a_runtime/canonical.py b/src/ca2a_runtime/canonical.py new file mode 100644 index 0000000..2408315 --- /dev/null +++ b/src/ca2a_runtime/canonical.py @@ -0,0 +1,69 @@ +"""RFC 8785 JSON Canonicalization Scheme (JCS) for the value types cA2A signs. 
+ +Credentials and provenance records are signed over the canonical byte encoding +of a JSON object. RFC 8785 fixes that encoding so any conforming implementation +(here and in agent-manifest) produces identical bytes and therefore +cross-verifiable signatures. + +This implements JCS for the JSON value types cA2A uses: objects, arrays, +strings, integers, booleans, and null. Object keys are sorted by their UTF-16 +code units, strings use JCS minimal escaping (quote, backslash, and control +characters only; non-ASCII is emitted literally as UTF-8), and integers +serialize as their shortest decimal form. Floating-point numbers are not part +of the cA2A data model and are rejected rather than serialized approximately. +""" + +from __future__ import annotations + +from typing import Any + +# JCS short escapes (RFC 8785 section 3.2.2.2): control chars, quote, backslash. +_SHORT_ESCAPES = { + 0x08: "\\b", + 0x09: "\\t", + 0x0A: "\\n", + 0x0C: "\\f", + 0x0D: "\\r", + 0x22: '\\"', + 0x5C: "\\\\", +} + + +def _escape_string(s: str) -> str: + out: list[str] = ['"'] + for ch in s: + code = ord(ch) + if code in _SHORT_ESCAPES: + out.append(_SHORT_ESCAPES[code]) + elif code < 0x20: + out.append(f"\\u{code:04x}") + else: + out.append(ch) + out.append('"') + return "".join(out) + + +def _serialize(value: Any) -> str: + if value is None: + return "null" + if value is True: + return "true" + if value is False: + return "false" + if isinstance(value, str): + return _escape_string(value) + if isinstance(value, int): # bool already handled above + return str(value) + if isinstance(value, float): + raise TypeError("RFC 8785 canonicalization of floats is not supported in cA2A") + if isinstance(value, list): + return "[" + ",".join(_serialize(v) for v in value) + "]" + if isinstance(value, dict): + items = sorted(value.items(), key=lambda kv: str(kv[0]).encode("utf-16-be")) + return "{" + ",".join(f"{_escape_string(str(k))}:{_serialize(v)}" for k, v in items) + "}" + raise TypeError(f"unsupported type for
canonicalization: {type(value).__name__}") + + +def canonicalize(value: Any) -> bytes: + """Return the RFC 8785 canonical UTF-8 encoding of ``value``.""" + return _serialize(value).encode("utf-8") diff --git a/src/ca2a_runtime/delegation/credential.py b/src/ca2a_runtime/delegation/credential.py index c188bb7..aade5bc 100644 --- a/src/ca2a_runtime/delegation/credential.py +++ b/src/ca2a_runtime/delegation/credential.py @@ -11,15 +11,13 @@ 4. Anti-replay: parent_id links to the previous credential_id and every credential_id in the chain is unique. -Canonicalization uses a deterministic JSON encoding (sorted keys, compact -separators, UTF-8). This is the stable byte string signed and verified; it is a -practical subset of RFC 8785 sufficient for the ASCII credential fields used -here. Full RFC 8785 alignment with agent-manifest is tracked on the roadmap. +Canonicalization uses RFC 8785 (JSON Canonicalization Scheme), so the signed +byte string is identical across conforming implementations and cA2A signatures +are cross-verifiable with agent-manifest. See ca2a_runtime.canonical. """ from __future__ import annotations -import json from dataclasses import dataclass from typing import Any @@ -29,6 +27,7 @@ Ed25519PublicKey, ) +from ca2a_runtime.canonical import canonicalize from ca2a_runtime.errors import ( BrokenDelegationLink, CredentialReplay, @@ -46,10 +45,13 @@ def new_keypair() -> tuple[Ed25519PrivateKey, str]: def canonical_bytes(payload: dict[str, Any]) -> bytes: - """Deterministic byte encoding of a credential body (signature excluded).""" - return json.dumps( - payload, sort_keys=True, separators=(",", ":"), ensure_ascii=True - ).encode("utf-8") + """RFC 8785 (JCS) canonical byte encoding of a credential body. + + This is the stable byte string signed and verified; using JCS makes cA2A + signatures cross-verifiable with agent-manifest and any other conforming + implementation. See ca2a_runtime.canonical. 
+ """ + return canonicalize(payload) @dataclass(frozen=True) diff --git a/tests/unit/test_canonical.py b/tests/unit/test_canonical.py new file mode 100644 index 0000000..fd92b62 --- /dev/null +++ b/tests/unit/test_canonical.py @@ -0,0 +1,47 @@ +"""Tests for the RFC 8785 (JCS) canonicalizer.""" + +from __future__ import annotations + +import pytest + +from ca2a_runtime.canonical import canonicalize + + +def test_key_order_is_deterministic() -> None: + assert canonicalize({"b": 1, "a": 2}) == canonicalize({"a": 2, "b": 1}) + assert canonicalize({"b": 1, "a": 2}) == b'{"a":2,"b":1}' + + +def test_primitives() -> None: + assert canonicalize(None) == b"null" + assert canonicalize(True) == b"true" + assert canonicalize(False) == b"false" + assert canonicalize(42) == b"42" + assert canonicalize("x") == b'"x"' + + +def test_nested_structures() -> None: + assert canonicalize({"scope": ["b", "a"], "n": 0}) == b'{"n":0,"scope":["b","a"]}' + + +def test_control_character_escaping() -> None: + # Newline and tab use short escapes; other controls use \\u00xx. + assert canonicalize("a\nb\tc") == b'"a\\nb\\tc"' + assert canonicalize("\x00\x1f") == b'"\\u0000\\u001f"' + assert canonicalize('a"b\\c') == b'"a\\"b\\\\c"' + + +def test_non_ascii_is_literal_utf8() -> None: + # JCS does not escape non-ASCII; it is emitted as UTF-8 bytes. + assert canonicalize({"k": "é"}) == '{"k":"é"}'.encode() + + +def test_utf16_key_ordering() -> None: + # Keys sort by UTF-16 code units; ASCII keys sort as expected. + out = canonicalize({"z": 1, "a": 1, "m": 1}) + assert out == b'{"a":1,"m":1,"z":1}' + + +def test_float_rejected() -> None: + with pytest.raises(TypeError): + canonicalize({"x": 1.5})
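
Note on `test_utf16_key_ordering`: ASCII-only keys cannot tell UTF-16 code-unit ordering apart from code-point ordering, because the two agree below U+10000. The following is a minimal standalone sketch of where they diverge. It uses the same `encode("utf-16-be")` sort key as `_serialize` and the key set from the property-sorting example in RFC 8785. `utf16_key_order` is a hypothetical helper written for illustration and is not part of `ca2a_runtime`.

```python
def utf16_key_order(keys: list[str]) -> list[str]:
    """Sort keys by UTF-16 code units (hypothetical helper, not in ca2a_runtime).

    Big-endian UTF-16 bytes compare in the same order as the 16-bit code units.
    """
    return sorted(keys, key=lambda k: k.encode("utf-16-be"))


# Keys taken from the property-sorting example in RFC 8785.
keys = ["\u20ac", "\r", "\ufb33", "1", "\U0001f600", "\u0080", "\u00f6"]

by_code_unit = utf16_key_order(keys)
by_code_point = sorted(keys)  # Python's default str ordering is by code point

# U+1F600 is encoded as the surrogate pair D83D DE00, and 0xD83D < 0xFB33,
# so it sorts before U+FB33 under JCS even though its code point is larger.
print([f"U+{ord(k):04X}" for k in by_code_unit])
# ['U+000D', 'U+0031', 'U+0080', 'U+00F6', 'U+20AC', 'U+1F600', 'U+FB33']
print(by_code_unit == by_code_point)
# False
```

Assuming `canonicalize` keeps the sort key shown in the diff, a regression test built from these keys would pin the non-ASCII case that the ASCII-only test does not reach.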