ClaimBound Evidence

ClaimBound turns a narrow public AI, ML, data or software-development claim into a small evidence card: a checkable record with the protocol, source boundary, hashes, exact result status, claim boundary and reproduction level.

It is not a model leaderboard, hosted scoring service or certification authority. It is an open-source toolkit for asking one plain question:

Where is the evidence?

If there is no evidence card, the statement is still only a claim.

Related Work And Independence

ClaimBound Evidence is an independent open-source evidence-card toolkit for narrow public AI, ML, data and software-development claims. It is not affiliated with EviBound and is not a continuation, fork or implementation of the EviBound paper or any EviBound code/research package.

The maintainer acknowledges EviBound, "Evidence-Bound Autonomous Research (EviBound): A Governance Framework for Eliminating False Claims" (arXiv:2511.05524), submitted on 2025-10-28, as related prior art discovered after ClaimBound development work was already underway. The similar naming and shared concern with evidence-bound claims are acknowledged explicitly.

The projects occupy different scopes: EviBound focuses on autonomous research agent execution gates and MLflow-backed machine-checkable artifacts; ClaimBound Evidence focuses on public evidence cards, source boundaries, claim boundaries, negative/blocked/drift outcomes, registry records and R&D family discipline across public AI, data, reproducibility and software-development workflows.

See Related Work And Independence for the detailed scope split.

Primary Public Workflows

ClaimBound is intentionally narrow. Start with these three foreground workflows:

Workflow	Question it helps answer	Start here
Public AI transparency	Did a public system-card or model-card source boundary pass a frozen audit?	Current evidence tracks
European and public open data	Did an official public-data source pass a source-boundary or narrow empirical gate?	EEA and NASA examples
Software development evidence	Did a narrow build, validator or regression claim pass under fixed commands?	Software development workflow

Additional domains such as robotics, procurement, civic tech and education remain documented as future applications. Blocked cards in those areas are first-class evidence, not hidden failures.

Artifact-only records (for example NYC TLC Phase 4 and CDC mirror summaries) are listed separately in docs/artifacts/README.md and are not presented as completed evidence cards.

Software R&D methodology case study (external, public): docs/case_studies/SOFTWARE_RND_EVIDENCE_CASE_STUDY.md → software-rnd-evidence-case-study.

ClaimBound In 30 Seconds

ClaimBound turns a public statement like "we checked this", "this source exists", "this benchmark reproduced", "this model is better" or "this risky change passed" into a small evidence card: what exactly was checked, under which frozen protocol, against which source, with which status, hashes, limitations and reproduction level.

Its main job is anti-overclaiming. Green means one narrow claim passed under the stated protocol, not "trust everything". Negative, blocked, insufficient and drift results are first-class evidence too, because they stop weak or incomplete claims from being silently upgraded into stronger ones.

Open the standalone document: docs/CLAIMBOUND_IN_30_SECONDS.md.

For Reviewers And External Operators

Start here if you are reviewing the project, trying it from outside the maintainer's machine, or checking what the open-source foreground is meant to deliver:

Reviewer summary gives the problem, strongest cards, public-interest dimension and planned public deliverables in one page.
European Dimension explains the European open-data and digital-commons angle with explicit limits.
Public roadmap 2026 maps the current public work package to concrete software, workflow and documentation outputs.
External operator starter pack explains how to read cards, request a card, rerun an existing card, report source drift or ask a boundary question.

What A Card Shows

An evidence card keeps the useful claim small enough to inspect:

Card field	Plain meaning
Claim	The exact public statement being checked.
Source	The public source or source documentation used for the check.
Protocol	The rules fixed before the result was accepted.
Status	Passed, negative, blocked, insufficient or reproduced.
Boundary	What the card proves and what it must not be used to claim.
Reproduction	Whether another run reproduced the outcome, and with what limits.

Raw payloads, prompt text, transcripts and restricted source files stay outside the public repository unless redistribution is clearly allowed. The public record stores hashes, summaries and links so a local operator or organization can keep private evidence reproducible without publishing sensitive material.
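
To make the hash-instead-of-payload idea concrete, here is a minimal Python sketch. The `EvidenceCard` class, its field names and the `build_card` helper are illustrative assumptions modelled on the table above; the real JSON fields are defined by the evidence card protocol and may differ.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


# Hypothetical field names mirroring the "Card field" table; the real
# schema is defined by the evidence card protocol and may differ.
@dataclass(frozen=True)
class EvidenceCard:
    claim: str
    source: str
    protocol: str
    status: str
    boundary: str
    reproduction: str
    source_sha256: str  # hash of the payload, never the payload itself


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 digest of a payload as lowercase hex."""
    return hashlib.sha256(payload).hexdigest()


def build_card(payload: bytes, **fields: str) -> EvidenceCard:
    """Build a card that records only the hash of the raw payload."""
    return EvidenceCard(source_sha256=sha256_hex(payload), **fields)


card = build_card(
    b"raw source bytes kept on the operator's machine",
    claim="The official page was reachable at access time.",
    source="https://example.org/source-docs",
    protocol="EXAMPLE_D001",
    status="PASSED_UNDER_PROTOCOL",
    boundary="Source audit only; no data-quality claim.",
    reproduction="Not yet rerun.",
)

# The public record contains the digest and summaries, not the raw bytes.
public_record = json.dumps(asdict(card), indent=2, sort_keys=True)
print(public_record)
```

Anyone holding the original bytes can recompute the digest and compare it with the card, which is what keeps private evidence checkable without publishing it.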

Core Public Workflows

The evidence card is the public unit of work. Start with the smallest workflow that keeps the claim honest.

Workflow	Use it for	Read next
Public AI documentation	Source-audit a public system-card, model-card or policy-document boundary without turning it into a runtime claim.	Current evidence tracks
Public and European open data	Check source access, source drift, official pages and fair result boundaries for public datasets.	Evidence card examples
Reproduction and reruns	Rerun a completed card and report whether the narrow outcome, source bytes and limitations still match.	External operator starter pack
Software and review evidence	Attach a bounded evidence record to a risky change, review gate, CI result or AI-assisted development workflow.	Software development workflow
AI control and agent review	Bind AI-assisted claims and actions to evidence cards, blocked states, human gates, audit trails and tombstones.	12 AI Life Rules
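
The reproduction and rerun row above can be sketched as a small comparison. This is an assumption-laden illustration, not the toolkit's code: `compare_rerun` is a hypothetical helper, and only `REPRODUCED_OUTCOME_WITH_SOURCE_BYTE_DRIFT` is a label taken from this README; the other two labels are placeholders.

```python
def compare_rerun(
    original_outcome: str,
    rerun_outcome: str,
    original_sha256: str,
    rerun_sha256: str,
) -> str:
    """Classify a rerun by gate-level outcome first, then by source bytes."""
    if original_outcome != rerun_outcome:
        # Placeholder label: the narrow outcome itself did not reproduce.
        return "OUTCOME_NOT_REPRODUCED"
    if original_sha256 != rerun_sha256:
        # Label used by the NASA POWER example card in this README.
        return "REPRODUCED_OUTCOME_WITH_SOURCE_BYTE_DRIFT"
    # Placeholder label: same outcome and byte-identical source.
    return "REPRODUCED_OUTCOME"


print(compare_rerun("PASS", "PASS", "aa11", "bb22"))
```

Checking the outcome before the hashes keeps the two questions separate: a drifted source does not by itself invalidate a reproduced outcome, but it has to be reported.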

Advanced protocol layers for related work families, frontier ledgers and tree overlays are documented separately. Use the smallest layer stack that prevents overclaiming; do not add layers to make a weak result look stronger.

12 AI Life Rules

One practical use of ClaimBound is a portable evidence-bound control layer for AI claims and actions. The same rules can support internal operator workflows, external review, or public evidence records.

AI may assist, but evidence must bind.

These rules do not make an AI system safe by slogan or intention. They make unsupported claims and risky actions easier to block, review, reproduce or tombstone.

Before an AI-assisted claim or action is trusted, ask:

What exactly is the AI claiming or trying to do?
Which source, file, prompt, dataset, fixture, command or evidence card supports it?
Were the rules and gates fixed before the result?
Is the action reversible, or does it need a human gate?
What risk or unsupported inference is being blocked?
Where is the audit trail or evidence card?
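
As a rough illustration, the questions above can be turned into a pre-action check. Everything in this sketch is hypothetical: `ProposedAction`, its fields, the `gate` function and the returned strings are not part of ClaimBound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    description: str      # what the AI is claiming or trying to do
    evidence_ref: str     # supporting card, file or command; "" if none
    rules_frozen: bool    # were rules and gates fixed before the result?
    reversible: bool      # can the action be undone?
    human_approved: bool = False


def gate(action: ProposedAction) -> str:
    """Return a decision string; block first, allow last."""
    if not action.evidence_ref:
        return "BLOCKED: no supporting evidence"
    if not action.rules_frozen:
        return "BLOCKED: rules were not frozen before the result"
    if not action.reversible and not action.human_approved:
        return "NEEDS_HUMAN_GATE"
    return "ALLOWED"


draft = ProposedAction("Open a draft PR", "docs/evidence_cards/README.md", True, True)
deploy = ProposedAction("Delete production data", "docs/evidence_cards/README.md", True, False)
print(gate(draft))
print(gate(deploy))
```

The order of the checks matters: missing evidence blocks the action before reversibility is even considered.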

Read the full rulebook: docs/TWELVE_AI_LIFE_CONTROLS.md.


Rule	ClaimBound check
Bounded Claim	Keep every AI claim inside a written boundary.
Evidence Before Authority	Do not trust model confidence, brand or tone without evidence.
Frozen Rules Before Run	Fix source, scorer, controls and gates before outcomes.
Source Lineage	Record where data, prompts, files or claims came from.
Honest Blocked State	Block claims when evidence is missing or unusable.
Reversible Action First	Prefer read-only, draft or reversible actions before hard-to-reverse ones.
Human Gate For High-Impact Actions	Require human approval for high-impact or irreversible steps.
No Hidden Goal Substitution	Keep AI work tied to the user's stated intent.
Minimal Necessary Data	Publish summaries and hashes, not raw private payloads.
Counterclaim Search	Look for contradicting, missing and negative evidence.
Audit Trail	Record what happened, why, by which tool and with what result.
Tombstone And Supersession	Preserve failed, stale or overbroad branches instead of hiding them.

Example: AI System-Card Claim

Public claim:

Anthropic publishes a public system-card index for its AI models.

ClaimBound narrows it:

Can the official Anthropic system-card page be source-audited by URL, access
date, content type, expected markers and SHA-256 without making any model
safety, model quality or runtime-behavior claim?

Current card status:

PASSED_UNDER_PROTOCOL / GREEN_VALIDATED

What this proves: the public source boundary passed the documented source-audit gate at access time.

What it does not prove: that Claude or any Anthropic runtime is safer, better, unchanged, deployment-ready or benchmark-superior.

Read the JSON or open the visual SVG card.
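
A source audit of this kind can be sketched in a few lines of Python. The sketch assumes the page bytes and content type have already been fetched by the operator; `audit_source`, `SourceAuditResult` and the sample values are hypothetical and do not reproduce the project's actual audit code or the real page content.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import Sequence, Tuple


@dataclass(frozen=True)
class SourceAuditResult:
    url: str
    access_date: str
    content_type_ok: bool
    missing_markers: Tuple[str, ...]
    sha256: str

    @property
    def passed(self) -> bool:
        # The gate is deliberately narrow: type matches, markers present.
        return self.content_type_ok and not self.missing_markers


def audit_source(
    url: str,
    body: bytes,
    content_type: str,
    expected_type: str,
    expected_markers: Sequence[str],
    access_date: date,
) -> SourceAuditResult:
    """Audit already-fetched bytes; makes no claim about the content's truth."""
    text = body.decode("utf-8", errors="replace")
    media_type = content_type.split(";")[0].strip().lower()
    return SourceAuditResult(
        url=url,
        access_date=access_date.isoformat(),
        content_type_ok=media_type == expected_type,
        missing_markers=tuple(m for m in expected_markers if m not in text),
        sha256=hashlib.sha256(body).hexdigest(),
    )


page = b"<html><title>System cards</title></html>"  # made-up sample bytes
ok = audit_source("https://example.org/system-cards", page,
                  "text/html; charset=utf-8", "text/html",
                  ["System cards"], date(2026, 1, 1))
bad = audit_source("https://example.org/system-cards", page,
                   "text/html", "text/html",
                   ["Not present"], date(2026, 1, 1))
print(ok.passed, bad.passed, bad.missing_markers)
```

Nothing in the result speaks to model safety, quality or runtime behavior, which is exactly the boundary the card states.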

Example Cards

The examples below deliberately show different outcomes: green means a narrow claim passed, yellow means reproduction is useful but limited, amber means the source boundary blocked a fair result, and red means the protocol ran but the claim did not pass.

Software development validator gate (in-repo example paired with the external software R&D case study):

Example	Status	What the card proves	Links
Anthropic system-card source audit	`PASSED_UNDER_PROTOCOL`	The official system-card index passed a narrow public-document source audit.	JSON / SVG
EEA download-page source audit	`PASSED_UNDER_PROTOCOL`	The official EEA download page passed a narrow source-audit boundary before any larger data run was claimed.	JSON / SVG
EEA AQ manual track	`BLOCKED_SOURCE`	The larger PM10 manual track could not fairly run from an incomplete public URL manifest.	JSON / SVG
NASA POWER D-103	`PASSED_UNDER_PROTOCOL` with `REPRODUCED_OUTCOME_WITH_SOURCE_BYTE_DRIFT`	The frozen gate-level outcome reproduced, but fresh source bytes differed.	JSON / SVG
NOAA CO-OPS D-131	`NEGATIVE_RESULT_UNDER_PROTOCOL`	The official-source run completed and honestly did not pass the frozen gate.	JSON / SVG
Software dev validator gate D-001	`PASSED_UNDER_PROTOCOL`	The evidence-card validator rejected a card missing `execution_mode` under frozen pytest gate SOFTWARE_DEV_D001 only.	JSON / SVG
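
The color semantics described above can be sketched as a lookup. The mapping here is inferred from the prose and the table in this section and is an assumption; the authoritative definitions live in the result status protocol. `card_color` is a hypothetical helper.

```python
from typing import Optional

# Assumed mapping, inferred from this README's example table.
STATUS_COLORS = {
    "PASSED_UNDER_PROTOCOL": "green",
    "BLOCKED_SOURCE": "amber",
    "NEGATIVE_RESULT_UNDER_PROTOCOL": "red",
}

DRIFT = "REPRODUCED_OUTCOME_WITH_SOURCE_BYTE_DRIFT"


def card_color(status: str, reproduction: Optional[str] = None) -> str:
    """Map a status (and optional reproduction label) to a card color."""
    if status not in STATUS_COLORS:
        # Refuse to guess: an unknown status must never default to green.
        raise ValueError(f"unknown status: {status}")
    if status == "PASSED_UNDER_PROTOCOL" and reproduction == DRIFT:
        return "yellow"  # reproduced, but with a stated limitation
    return STATUS_COLORS[status]


print(card_color("PASSED_UNDER_PROTOCOL"))
print(card_color("PASSED_UNDER_PROTOCOL", DRIFT))
print(card_color("BLOCKED_SOURCE"))
```

Raising on unknown statuses rather than falling back to a default keeps an unrecognized result from being silently displayed as a pass.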

Other registry examples include public-interest self-checks and blocked-source records that are deliberately kept out of this first-screen table. For the full card list, see docs/evidence_cards/README.md. The registry index is docs/registry/evidence_index.json.

Start with ClaimBound in 30 seconds, then read ClaimBound in 5 minutes for the plain-language version.

Install

uv sync --extra dev
uv run --extra dev python -m pytest -n auto

Quick Start

Create a draft scaffold:

uv run claimbound new

Create the same scaffold non-interactively:

uv run claimbound new \
  --source-url "https://example.org/source-docs" \
  --protocol-id "EXAMPLE_D001" \
  --domain "public-data" \
  --track-type "source_audit" \
  --execution-mode "MANUAL_NO_AI" \
  --out "docs/manual_audit/EXAMPLE_D001"

Run local demo helpers:

uv run claimbound demo eea-source-audit
uv run claimbound demo grok-source-audit
uv run claimbound validate-all

`validate-all` checks committed evidence cards, the registry and any optional `docs/track_families/*_FAMILY_LEDGER.json`, `docs/track_families/*_FRONTIER.json` or `docs/track_families/*_TREE.json` files. Historical cards created before the R&D family protocol do not need retroactive ledgers.
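
For orientation, the optional family files named above could be discovered with a few glob patterns. This is only a sketch of the file-matching step under the paths quoted in this README; `find_family_files` is a hypothetical helper and says nothing about how `validate-all` is actually implemented.

```python
import tempfile
from pathlib import Path
from typing import List

# Filename patterns quoted from the paragraph above.
FAMILY_PATTERNS = ("*_FAMILY_LEDGER.json", "*_FRONTIER.json", "*_TREE.json")


def find_family_files(repo_root: Path) -> List[Path]:
    """Return optional family files, or an empty list if none exist."""
    family_dir = repo_root / "docs" / "track_families"
    if not family_dir.is_dir():
        return []  # the files are optional, so a missing directory is fine
    found = set()
    for pattern in FAMILY_PATTERNS:
        found.update(family_dir.glob(pattern))
    return sorted(found)


# Demonstration against a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    demo_dir = root / "docs" / "track_families"
    demo_dir.mkdir(parents=True)
    for name in ("DEMO_FAMILY_LEDGER.json", "DEMO_TREE.json", "notes.md"):
        (demo_dir / name).write_text("{}", encoding="utf-8")
    names = [p.name for p in find_family_files(root)]

print(names)
```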

Prepare a local-only run root:

uv run claimbound run-root \
  --protocol-id EXAMPLE_D001 \
  --source-url https://example.org/source \
  --operator your-name-or-handle

`claimbound new` creates a request, protocol draft, playbook, checklist, operator declaration, draft card, R&D family ledger and source-probe summary. The scaffold itself is not evidence. Evidence begins only after an operator freezes the protocol, runs the check, publishes a sanitized report, validates the card and updates the registry.
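
The path from scaffold to evidence can be pictured as an ordered checklist. The step names and the strict ordering below are assumptions drawn from the sentence above, not identifiers from the toolkit.

```python
from typing import Optional, Sequence

# Hypothetical step names, in the order the paragraph above lists them.
EVIDENCE_STEPS = (
    "freeze_protocol",
    "run_check",
    "publish_sanitized_report",
    "validate_card",
    "update_registry",
)


def next_step(completed: Sequence[str]) -> Optional[str]:
    """Return the next required step, or None when all are done."""
    done = tuple(completed)
    if done != EVIDENCE_STEPS[: len(done)]:
        raise ValueError("steps were skipped or completed out of order")
    if len(done) == len(EVIDENCE_STEPS):
        return None
    return EVIDENCE_STEPS[len(done)]


def is_evidence(completed: Sequence[str]) -> bool:
    """A scaffold counts as evidence only after every step, in order."""
    return tuple(completed) == EVIDENCE_STEPS


print(next_step([]))                         # a fresh scaffold
print(is_evidence(["freeze_protocol"]))      # still a draft
print(is_evidence(list(EVIDENCE_STEPS)))     # all steps completed
```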

Next Steps: Simple To Technical

Step	Document	Why read it
1	ClaimBound in 30 seconds	The one-screen explanation.
2	ClaimBound in 5 minutes	The shortest plain-language walkthrough.
3	Evidence card examples	Green, yellow, red and blocked examples in one place.
4	Getting started	Installation, local run roots and scaffold commands.
5	Result status protocol v0.1	Exact statuses and the color semantics used by cards.
6	Evidence card protocol v0.1	Required JSON fields and validation rules.
7	Current evidence tracks	What the committed results prove and do not prove.
8	Project next steps	What is next and what is intentionally out of scope.

Individual pre-registration charters live in docs/protocols/. They are protocol-bound examples, not broad claims.

Deeper Guides

Boundary

This repository is independently usable as an open evidence foreground. It does not include, import or require private background technology.

This is a single-maintainer public repository. Evidence cards are reusable examples and validation records, not a support queue, review service, legal advice, or commitment that the maintainer will run third-party checks on demand.

For public review and sustainability boundaries, see governance, maintainer boundary and release process.

The registry stores validated card metadata and sanitized report references, not raw payloads. Distributed-ledger and chain timestamp features are outside the current roadmap.

For the AI provenance log, use public PRs, commits, releases, checks, evidence cards and registry entries first. GitHub organization audit logs are governance support, not AI provenance by themselves. See AI provenance log and audit logs.

Community

License

Apache-2.0. See LICENSE.
