Skip to content

Proposal: Data attribution and compensation layer for AAIF #2

@aeoess

Description

@aeoess

Missing AAIF layer: Data attribution and compensation for AI training

AAIF's founding projects cover agent communication (MCP), project-specific instructions (AGENTS.md), and agent frameworks (Goose). What's missing: a protocol for tracking which data trained which model, enforcing data access terms, and computing fair compensation.

Why this matters now:

  • EU AI Act Article 10 (August 2026): requires documented data provenance AND governance for high-risk AI
  • Anthropic's $1.5B settlement: created legal precedent for data compensation
  • Every AAIF member (OpenAI, Anthropic, Google, Microsoft, AWS) trains on data they need to attribute

What exists in the ecosystem:

  • Data Provenance Initiative (MIT, published in Nature): audits and documents training data lineage
  • OpenLineage (LF AI & Data, 2368★): tracks pipeline-level transformations
  • C2PA (Adobe/Microsoft/Intel): content provenance for media
  • Spawning: opt-out/opt-in consent layer

What none of them have: A cryptographic enforcement layer that tracks WHO accessed WHAT data, UNDER WHAT TERMS, enforces compliance at runtime, generates signed receipts, and computes compensation — all with Merkle proofs for settlement.

We've built this. The Agent Passport System includes 5 data governance modules (shipped, tested, open source):

  1. Data Source Registration — rights holders register sources with terms (compensation rates, allowedPurposes, derivative policies). Ed25519-signed, immutable receipts.
  2. Data Enforcement Gate — runtime access control (enforce/audit/off modes). Blocks non-compliant access.
  3. Data Contribution Ledger — per-source, per-agent tracking with source metrics.
  4. Training Attribution — multi-hop derivation chains from source → training, fractional contribution weights.
  5. Merkle-committed Settlement — signed period records. GDPR Article 30 compliance reports auto-generated.

120+ dedicated tests covering adversarial scenarios (terms expiry, agent exclusion, access count exceeded, terms change revocation, derivation chain depth limits).

How this fits AAIF: MCP handles how agents talk to tools. AGENTS.md handles how agents understand projects. The data governance layer handles how agents access, attribute, and compensate for data. It's the missing vertical in the foundation's stack.

SDK: npm install agent-passport-system (v1.21.4, 1183 tests, Apache-2.0). MCP server: npx agent-passport-system-mcp (83 tools — any MCP client gets full protocol access).

Paper: "Monotonic Narrowing for Agent Authority" — https://doi.org/10.5281/zenodo.18749779
Website: https://aeoess.com

Happy to discuss at the TC level or contribute directly to an AAIF data governance workstream.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions