Open-source reliability and governance checks for AI agents.
Ten opinionated checks. One checklist per check. One verification pass that your agent passes, partially passes, or fails. No theory, no tooling lock-in, no vendor dependency. Built from running agents in production every day.
→ Full writeups: goclawproof.com/checks

→ Interactive assessment: goclawproof.com/assessment
| # | Check | Category |
|---|---|---|
| 01 | Tool Permissions & Least Privilege | Security |
| 02 | Logging & Audit Trails | Quality |
| 03 | Prompt Injection & Data Exfiltration | Security |
| 04 | Human-in-the-Loop & Escalation | Governance |
| 05 | Rollback & Kill Switches | Operations |
| 06 | Secrets Management | Security |
| 07 | Evaluation & Regression Testing | Quality |
| 08 | Data Boundaries & RAG Governance | Governance |
| 09 | Cost Controls & Rate Limiting | Operations |
| 10 | Multi-Agent Coordination | Quality |
Each check is a single YAML file with the failure mode, two verification questions, a production checklist, common pitfalls, and a link to the full writeup.
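To make that shape concrete, here is a minimal Python sketch of one check as an in-memory record with a completeness test. The class name, field names, and the `problems` helper are assumptions made for illustration; the authoritative field names live in the repository's schema, not here.

```python
from dataclasses import dataclass


@dataclass
class Check:
    """Hypothetical in-memory shape of one check (field names are assumptions)."""

    failure_mode: str
    verification_questions: list[str]
    production_checklist: list[str]
    common_pitfalls: list[str]
    writeup_url: str

    def problems(self) -> list[str]:
        """Return readable problems; an empty list means the record looks complete."""
        found = []
        if not self.failure_mode.strip():
            found.append("failure mode is empty")
        if len(self.verification_questions) != 2:
            count = len(self.verification_questions)
            found.append(f"expected 2 verification questions, got {count}")
        if not self.production_checklist:
            found.append("production checklist is empty")
        if not self.common_pitfalls:
            found.append("common pitfalls are empty")
        if not self.writeup_url.strip():
            found.append("writeup link is empty")
        return found


# Placeholder content, not taken from any real check file.
example = Check(
    failure_mode="placeholder failure mode",
    verification_questions=["placeholder question 1", "placeholder question 2"],
    production_checklist=["placeholder checklist item"],
    common_pitfalls=["placeholder pitfall"],
    writeup_url="goclawproof.com/checks",
)
print(example.problems())
```

A record like this could be filled from a parsed YAML file once the real key names are known; the sketch deliberately skips YAML parsing to stay within the standard library.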
A packaged skill is included at skills/clawproof-audit/. Drop it into any Claude Code or Claude Agent SDK project and an agent can audit a codebase, runbook, or configuration against all 10 checks on demand.

    # Clone into your project's skills directory
    git clone https://github.com/lexbeam-software/clawproof-checks.git /tmp/clawproof
    cp -r /tmp/clawproof/skills/clawproof-audit ./.claude/skills/

    # In Claude Code:
    > /skills
    # Select: clawproof-audit
    > audit this agent for production readiness

The skill produces a scored audit report (0-100) with prioritized remediation, linked back to the full check writeups.
Every check is also bundled into a single JSON file for programmatic consumption:

    curl -O https://raw.githubusercontent.com/lexbeam-software/clawproof-checks/main/skills/clawproof-audit/checks.json

Schema: see checks/SCHEMA.md.
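Before writing code against the bundle, it can help to look at its structure. The sketch below outlines any parsed JSON document without assuming the schema; `outline` and `outline_file` are hypothetical helper names, and nothing here reflects the actual layout of `checks.json`.

```python
import json
from pathlib import Path


def outline(value, max_depth=3, depth=0):
    """Summarize parsed JSON: object keys, array length, and scalar type names."""
    if isinstance(value, dict):
        if depth >= max_depth:
            return f"object({len(value)} keys)"
        return {key: outline(child, max_depth, depth + 1) for key, child in value.items()}
    if isinstance(value, list):
        if not value or depth >= max_depth:
            return f"array({len(value)} items)"
        # Only the first item is inspected; later items may differ in shape.
        return {
            "array_length": len(value),
            "first_item": outline(value[0], max_depth, depth + 1),
        }
    return type(value).__name__


def outline_file(path):
    """Load a JSON file from disk and return its outline."""
    return outline(json.loads(Path(path).read_text(encoding="utf-8")))
```

After downloading the file, `print(json.dumps(outline_file("checks.json"), indent=2))` would show the top-level keys to compare against the schema document.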
Teams also treat the checks as a pre-production gate. Add them to your agent release checklist; fail a launch if any check scores below 7/10. The YAML is stable enough to diff in CI.
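One way to wire that gate into CI is a small script that exits non-zero when any check falls below the threshold. This is a sketch under assumptions: it expects a hypothetical JSON file mapping check IDs to scores out of 10, a format invented here for illustration and not something the repository or the audit skill is known to produce.

```python
import json
import sys

THRESHOLD = 7  # from the text: fail a launch if any check scores below 7/10


def failing_checks(scores, threshold=THRESHOLD):
    """Return the IDs of checks scoring below the threshold, sorted for stable output."""
    return sorted(check_id for check_id, score in scores.items() if score < threshold)


def main(path):
    """Read a hypothetical {check_id: score} JSON file and return a process exit code."""
    with open(path, encoding="utf-8") as handle:
        scores = json.load(handle)
    failed = failing_checks(scores)
    for check_id in failed:
        print(f"FAIL check {check_id}: {scores[check_id]}/10 is below {THRESHOLD}/10")
    return 1 if failed else 0


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

Saved under an assumed name such as `gate.py`, it could run as `python gate.py scores.json` in a release pipeline; the non-zero exit code is what blocks the launch.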
The checks are open source because agent governance that lives only inside one consultancy's deck is not governance; it is slideware. These checks are the inputs I wish I had when I started running agents in production. If they save someone a Monday morning incident, the repo has paid for itself.
Contributions are welcome: new failure modes, additional checklist items, and language-specific playbooks. See CONTRIBUTING.md.
Licensed under MIT. Use these checks in commercial products, internal tooling, client deliverables, or anywhere else. Attribution is appreciated but not required.
Maintained by Werner Plutat / Lexbeam Software.
For enterprise rollout support in DACH: agentklar.de.