Proposal: Data attribution and compensation layer for AAIF

## Missing AAIF layer: Data attribution and compensation for AI training

AAIF's founding projects cover agent communication (MCP), project-specific instructions (AGENTS.md), and agent frameworks (Goose). What's missing: a protocol for tracking which data trained which model, enforcing data access terms, and computing fair compensation.

**Why this matters now:**
- **EU AI Act Article 10** (August 2026): requires documented data provenance AND governance for high-risk AI
- **Anthropic's $1.5B settlement**: set a high-profile benchmark for data compensation (a settlement is not binding precedent, but it shapes what rights holders now expect)
- **AAIF members that train models** (OpenAI, Anthropic, Google, Microsoft, AWS) all train on data they need to attribute

**What exists in the ecosystem:**
- **Data Provenance Initiative** (MIT, published in Nature): audits and documents training data lineage
- **OpenLineage** (LF AI & Data, 2368★): tracks pipeline-level transformations
- **C2PA** (Adobe/Microsoft/Intel): content provenance for media
- **Spawning**: opt-out/opt-in consent layer

**What none of them have:** A cryptographic enforcement layer that tracks WHO accessed WHAT data, UNDER WHAT TERMS, enforces compliance at runtime, generates signed receipts, and computes compensation — all with Merkle proofs for settlement.
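To make the "signed receipt" idea concrete, here is a minimal sketch using Node's built-in `crypto` module. The `AccessReceipt` shape, its field names, and the helper functions are assumptions made for illustration; they are not the Agent Passport System's actual schema or API.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical receipt shape (illustrative only, not the SDK's schema).
interface AccessReceipt {
  agentId: string;    // WHO accessed
  sourceId: string;   // WHAT data
  termsHash: string;  // UNDER WHAT TERMS (hash of the terms version in force)
  purpose: string;
  accessedAt: string; // ISO 8601 timestamp
}

// Serialize fields in a fixed order so signer and verifier see identical bytes.
function canonicalize(r: AccessReceipt): Buffer {
  return Buffer.from(
    JSON.stringify([r.agentId, r.sourceId, r.termsHash, r.purpose, r.accessedAt]),
    "utf8",
  );
}

function signReceipt(r: AccessReceipt, privateKey: KeyObject): string {
  // Ed25519 does not take a separate digest algorithm, hence the null argument.
  return sign(null, canonicalize(r), privateKey).toString("base64");
}

function verifyReceipt(r: AccessReceipt, signature: string, publicKey: KeyObject): boolean {
  return verify(null, canonicalize(r), publicKey, Buffer.from(signature, "base64"));
}

// Example values are made up.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const receipt: AccessReceipt = {
  agentId: "agent-1",
  sourceId: "source-a",
  termsHash: "sha256:example-terms-v1",
  purpose: "evaluation",
  accessedAt: "2026-01-01T00:00:00Z",
};
const signature = signReceipt(receipt, privateKey);
```

Any change to a receipt field after signing (for example, swapping `purpose`) makes `verifyReceipt` return `false`, which is what lets a rights holder trust the receipt without trusting the agent that produced it.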

**We've built this.** The Agent Passport System includes 5 data governance modules (shipped, tested, open source):

1. **Data Source Registration** — rights holders register sources with terms (compensation rates, allowedPurposes, derivative policies). Ed25519-signed, immutable receipts.
2. **Data Enforcement Gate** — runtime access control (enforce/audit/off modes). Blocks non-compliant access.
3. **Data Contribution Ledger** — per-source, per-agent tracking with source metrics.
4. **Training Attribution** — multi-hop derivation chains from source → training, fractional contribution weights.
5. **Merkle-committed Settlement** — signed period records. GDPR Article 30 compliance reports auto-generated.
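As a rough illustration of what a Merkle-committed settlement period could look like, the sketch below hashes a set of settlement records into a single root and produces an inclusion proof for one record. The record strings, function names, and tree layout (odd nodes promoted unchanged, one-byte prefixes separating leaves from inner nodes) are assumptions for this example, not the project's actual format.

```typescript
import { createHash } from "node:crypto";

const sha256 = (data: Buffer): Buffer => createHash("sha256").update(data).digest();

// Prefix leaves and inner nodes differently so a leaf cannot pose as a node.
const leafHash = (record: string): Buffer =>
  sha256(Buffer.concat([Buffer.from([0x00]), Buffer.from(record, "utf8")]));
const nodeHash = (left: Buffer, right: Buffer): Buffer =>
  sha256(Buffer.concat([Buffer.from([0x01]), left, right]));

// Build every tree level bottom-up; an unpaired node is promoted unchanged.
function buildLevels(records: string[]): Buffer[][] {
  if (records.length === 0) throw new Error("no records to commit");
  const levels: Buffer[][] = [records.map(leafHash)];
  while (levels[levels.length - 1].length > 1) {
    const prev = levels[levels.length - 1];
    const next: Buffer[] = [];
    for (let i = 0; i < prev.length; i += 2) {
      next.push(i + 1 < prev.length ? nodeHash(prev[i], prev[i + 1]) : prev[i]);
    }
    levels.push(next);
  }
  return levels;
}

interface ProofStep { sibling: Buffer; siblingOnLeft: boolean; }

// Collect the sibling hash at each level on the path from leaf to root.
function prove(levels: Buffer[][], leafIndex: number): ProofStep[] {
  const proof: ProofStep[] = [];
  let index = leafIndex;
  for (let depth = 0; depth < levels.length - 1; depth++) {
    const siblingIndex = index ^ 1;
    if (siblingIndex < levels[depth].length) {
      proof.push({ sibling: levels[depth][siblingIndex], siblingOnLeft: siblingIndex < index });
    }
    index = Math.floor(index / 2);
  }
  return proof;
}

function verifyInclusion(record: string, proof: ProofStep[], root: Buffer): boolean {
  let hash = leafHash(record);
  for (const step of proof) {
    hash = step.siblingOnLeft ? nodeHash(step.sibling, hash) : nodeHash(hash, step.sibling);
  }
  return hash.equals(root);
}

// Made-up settlement lines: source | agent | amount in cents.
const records = ["source-a|agent-1|1200", "source-b|agent-1|300", "source-c|agent-2|4500"];
const levels = buildLevels(records);
const root = levels[levels.length - 1][0];
const proofForB = prove(levels, 1);
```

In a design like this, only `root` needs to be signed and published for the period; each rights holder can then check their own line with `verifyInclusion` without seeing anyone else's records.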

**120+ dedicated tests** covering adversarial scenarios (terms expiry, agent exclusion, access count exceeded, terms change revocation, derivation chain depth limits).
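The scenarios listed above map naturally onto a small decision function. The following is a hedged sketch of how an enforcement gate with enforce/audit/off modes might evaluate a request; the `SourceTerms` fields, violation codes, and `checkAccess` signature are hypothetical and are not taken from the SDK.

```typescript
type GateMode = "enforce" | "audit" | "off";

// Hypothetical terms shape (illustrative field names).
interface SourceTerms {
  allowedPurposes: string[];
  excludedAgents: string[];
  maxAccessCount: number;
  expiresAt: number; // epoch milliseconds
}

interface AccessRequest {
  agentId: string;
  purpose: string;
  at: number; // epoch milliseconds
}

interface GateDecision {
  allowed: boolean;
  violations: string[];
}

function checkAccess(
  terms: SourceTerms,
  request: AccessRequest,
  priorAccessCount: number,
  mode: GateMode,
): GateDecision {
  // "off" skips evaluation entirely.
  if (mode === "off") return { allowed: true, violations: [] };

  const violations: string[] = [];
  if (request.at >= terms.expiresAt) violations.push("terms_expired");
  if (terms.excludedAgents.includes(request.agentId)) violations.push("agent_excluded");
  if (priorAccessCount >= terms.maxAccessCount) violations.push("access_count_exceeded");
  if (!terms.allowedPurposes.includes(request.purpose)) violations.push("purpose_not_allowed");

  // "audit" records violations but lets the access through; "enforce" blocks it.
  const allowed = mode === "audit" || violations.length === 0;
  return { allowed, violations };
}

// Made-up example terms.
const terms: SourceTerms = {
  allowedPurposes: ["training"],
  excludedAgents: ["agent-x"],
  maxAccessCount: 2,
  expiresAt: 2000,
};
```

Collecting every violation, rather than returning at the first one, keeps audit-mode logs complete and makes each adversarial test assert on an explicit violation code.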

**How this fits AAIF:** MCP handles how agents talk to tools. AGENTS.md handles how agents understand projects. The data governance layer handles how agents access, attribute, and compensate for data. It's the missing vertical in the foundation's stack.
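For readers wondering how "attribute" turns into "compensate", one possible reading of multi-hop fractional attribution is to multiply contribution weights along each derivation chain and sum them per root source. The sketch below assumes that interpretation; the `Derivation` shape, the `attribute` function, and the depth limit of 8 are hypothetical.

```typescript
// Hypothetical edge: `child` was derived from `parent`,
// and `parent` accounts for `weight` (0..1) of `child`.
interface Derivation {
  child: string;
  parent: string;
  weight: number;
}

// Walk from the trained artifact back to root sources, multiplying weights
// along each chain and summing the shares that land on the same source.
function attribute(target: string, edges: Derivation[], maxDepth = 8): Map<string, number> {
  const parentsOf = new Map<string, Derivation[]>();
  for (const edge of edges) {
    parentsOf.set(edge.child, [...(parentsOf.get(edge.child) ?? []), edge]);
  }

  const shares = new Map<string, number>();
  const walk = (node: string, share: number, depth: number): void => {
    const parents = parentsOf.get(node);
    if (!parents) {
      // No recorded parents: treat this node as an original source.
      shares.set(node, (shares.get(node) ?? 0) + share);
      return;
    }
    // The depth limit also stops accidental cycles in the derivation graph.
    if (depth >= maxDepth) throw new Error(`derivation chain deeper than ${maxDepth}`);
    for (const edge of parents) walk(edge.parent, share * edge.weight, depth + 1);
  };
  walk(target, 1, 0);
  return shares;
}

// Made-up graph: model-m <- {dataset-1, dataset-2} <- {source-a, source-b}.
const edges: Derivation[] = [
  { child: "model-m", parent: "dataset-1", weight: 0.5 },
  { child: "model-m", parent: "dataset-2", weight: 0.5 },
  { child: "dataset-1", parent: "source-a", weight: 0.5 },
  { child: "dataset-1", parent: "source-b", weight: 0.5 },
  { child: "dataset-2", parent: "source-b", weight: 1 },
];
const shares = attribute("model-m", edges);
```

With these example weights, `source-a` ends up with a 0.25 share and `source-b` with 0.75; multiplying each share by a period's compensation pool would give the per-source payout.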

SDK: `npm install agent-passport-system` (v1.21.4, 1183 tests, Apache-2.0). MCP server: `npx agent-passport-system-mcp` (83 tools — any MCP client gets full protocol access).

Paper: "Monotonic Narrowing for Agent Authority" — https://doi.org/10.5281/zenodo.18749779
Website: https://aeoess.com

Happy to discuss at the TC level or contribute directly to an AAIF data governance workstream.
