Verification and debugging framework for AI agents.
Stop agents from acting on stale assumptions, making unsupported claims, or silently switching hypotheses mid-investigation. A workspace of TypeScript packages (evidence ledger, claim gate, hypothesis tracker, runtime reality checker, debug playbook engine, domain router, MCP server) that an agent harness wires into its session and tool-call lifecycle.
Most agent tooling helps a model talk about a problem.
agent-grounding makes it prove what it has actually checked, what it has only assumed, and what it has ruled out, before the next destructive command runs.
```bash
git clone https://github.com/LanNguyenSi/agent-grounding && cd agent-grounding
npm install && npm run build
```

```bash
# Run the demo against a scratch session so it doesn't pollute the default ledger
LEDGER="node packages/evidence-ledger/dist/cli.js"
$LEDGER clear --session readme-demo   # no-op on first run
$LEDGER fact "process is not running" \
  --source "ps aux | grep clawd-monitor" \
  --confidence high \
  --session readme-demo
$LEDGER hypothesis "OOM killer terminated the process" \
  --source "dmesg output" \
  --confidence medium \
  --session readme-demo
$LEDGER show --session readme-demo
```

evidence-ledger is the headline package: every fact carries a source, every hypothesis lives separately from facts, rejected hypotheses stay visible, and unknowns are acknowledged. The CLI is one of three surfaces; there's also a typed library API (@lannguyensi/evidence-ledger) and a JSON-RPC server (grounding-mcp) that any MCP client can call. Entries land in ~/.evidence-ledger/ledger.db; per-session isolation keeps demo data out of your real debugging sessions.
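The ledger's core data model is small. Here is a minimal self-contained sketch of the fact/hypothesis distinction it enforces; all type and method names below are illustrative assumptions, not the published @lannguyensi/evidence-ledger API:

```typescript
// Illustrative sketch only; the real @lannguyensi/evidence-ledger API may differ.
type Confidence = "low" | "medium" | "high";

interface Entry {
  id: number;
  kind: "fact" | "hypothesis" | "rejected" | "unknown";
  text: string;
  source: string; // every entry must say where it came from
  confidence: Confidence;
  session: string;
}

class Ledger {
  private entries: Entry[] = [];
  private nextId = 1;

  add(kind: Entry["kind"], text: string, source: string, confidence: Confidence, session: string): Entry {
    const entry: Entry = { id: this.nextId++, kind, text, source, confidence, session };
    this.entries.push(entry);
    return entry;
  }

  // Rejected hypotheses are re-tagged, never deleted, so they stay visible.
  reject(id: number): void {
    const found = this.entries.find((e) => e.id === id);
    if (found?.kind === "hypothesis") found.kind = "rejected";
  }

  show(session: string): Entry[] {
    return this.entries.filter((e) => e.session === session);
  }
}

const ledger = new Ledger();
ledger.add("fact", "process is not running", "ps aux | grep clawd-monitor", "high", "readme-demo");
const h = ledger.add("hypothesis", "OOM killer terminated the process", "dmesg output", "medium", "readme-demo");
ledger.reject(h.id);
console.log(ledger.show("readme-demo").map((e) => e.kind).join(","));
// → fact,rejected
```

The point of the shape, not the names: a rejected hypothesis changes kind rather than disappearing, which is what keeps an agent from silently re-proposing an idea it already ruled out.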
```
✓ Fact recorded:
✓ [#26] process is not running (ps aux | grep clawd-monitor) HIGH

? Hypothesis added:
? [#27] OOM killer terminated the process (dmesg output) MED

Evidence Ledger - session: readme-demo
2 entries total

✓ FACTS (1)
✓ [#26] process is not running (ps aux | grep clawd-monitor) HIGH

? HYPOTHESES (1)
? [#27] OOM killer terminated the process (dmesg output) MED
```
(Entry IDs autoincrement globally across sessions, so your numbers will differ.)
The same data has two other outlets: ledger export --session readme-demo produces structured JSON for hand-off to another agent or a human, and grounding-mcp's ledger_summary verb is what harness explain --trace and harness audit consume to replay policy decisions; see the harness integration for the wiring.
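For orientation, an exported session might look like the following; the exact field names are an assumption for illustration, not the package's actual export schema:

```json
{
  "session": "readme-demo",
  "entries": [
    {
      "id": 26,
      "kind": "fact",
      "text": "process is not running",
      "source": "ps aux | grep clawd-monitor",
      "confidence": "high"
    },
    {
      "id": 27,
      "kind": "hypothesis",
      "text": "OOM killer terminated the process",
      "source": "dmesg output",
      "confidence": "medium"
    }
  ]
}
```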
Every package is published under the @lannguyensi/ scope and installable directly:
```bash
# Library APIs
npm install @lannguyensi/evidence-ledger
npm install @lannguyensi/claim-gate
npm install @lannguyensi/hypothesis-tracker
npm install @lannguyensi/runtime-reality-checker
npm install @lannguyensi/grounding-wrapper
npm install @lannguyensi/grounding-sdk
npm install @lannguyensi/review-claim-gate

# CLIs (install globally to expose the bin)
npm install -g @lannguyensi/debug-playbook-engine   # → debug-playbook
npm install -g @lannguyensi/domain-router           # → domain-router
npm install -g @lannguyensi/readme-first-resolver   # → readme-first
npm install -g @lannguyensi/understanding-gate      # → understanding-gate

# MCP server (install globally or invoke via npx)
npm install -g @lannguyensi/grounding-mcp           # → grounding-mcp
```

The git clone workflow above is for hacking on the monorepo itself; downstream consumers install only what they need from npm.
| If you want to... | Read |
|---|---|
| Track facts / hypotheses / rejected ideas / unknowns during a debugging session | packages/evidence-ledger |
| Block strong claims until evidence backs them | packages/claim-gate |
| Manage competing hypotheses and require evidence to switch between them | packages/hypothesis-tracker |
| Compare actual runtime state against documentation | packages/runtime-reality-checker |
| Guide an agent through a domain-specific diagnostic sequence | packages/debug-playbook-engine |
| Force an agent to read primary docs before any analysis | packages/readme-first-resolver |
| Route a keyword to the right repos / components / docs scope | packages/domain-router |
| Use a single ergonomic facade (verify / track / validate) over the stack | packages/grounding-sdk |
| Gate merge_approval on tests + checklist + evidence-ledger entry | packages/review-claim-gate |
| Ask agents to produce an Understanding Report before acting | packages/understanding-gate |
| Wire the stack into an MCP-speaking client (Claude Code, Codex, OpenCode) | packages/grounding-mcp |
| Orchestrate the stack and enforce correct tool order | packages/grounding-wrapper |
| Package | Description |
|---|---|
| understanding-gate | Asks agents to produce an Understanding Report before acting. Phase 1 shipped (parser, persistence, claude-code Stop hook + opencode plugin, hypothesis-tracker bridge); published as @lannguyensi/understanding-gate |
| runtime-reality-checker | Compares actual runtime state against documentation |
| claim-gate | Blocks strong claims without verified evidence |
| hypothesis-tracker | Tracks competing hypotheses, requires evidence to switch |
| debug-playbook-engine | Guides agents through domain-specific diagnostic sequences |
| evidence-ledger | Structured evidence tracking during debugging |
| grounding-wrapper | Orchestrates the grounding stack and enforces correct tool order |
| readme-first-resolver | Forces agents to read primary docs before any analysis |
| domain-router | Routes keywords to correct repos, components, and docs scope |
| grounding-sdk | verify/track/validate: an ergonomic in-process facade over the stack |
| review-claim-gate | merge_approval gate for PR-review subagents; fails closed unless tests pass, the checklist is complete, and at least one evidence-ledger entry exists |
| grounding-mcp | JSON-RPC MCP server that exposes ledger_add / ledger_summary / claim_evaluate_from_session to any MCP-speaking client |
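Registering the server in an MCP-speaking client usually means adding an entry to the client's server config. A sketch in the common mcpServers JSON convention; the server name "grounding" and the config location are assumptions, so check your client's own docs:

```json
{
  "mcpServers": {
    "grounding": {
      "command": "npx",
      "args": ["-y", "@lannguyensi/grounding-mcp"]
    }
  }
}
```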
AI agents are good at generating plausible explanations. They're bad at verifying them. This framework enforces discipline:
- Don't assume: check runtime state before diagnosing.
- Don't claim: gate strong assertions behind evidence.
- Don't forget: track all hypotheses; don't silently drop them.
- Don't skip steps: follow diagnostic playbooks in order.
- Don't guess scope: route to the correct domain first.
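The "don't claim" rule fits in a few lines: a gate that refuses a strong assertion unless some high-confidence fact supports it. This is a self-contained concept sketch, not the @lannguyensi/claim-gate API; gateClaim and its matching rule are made up for illustration:

```typescript
// Concept sketch only; not the real @lannguyensi/claim-gate API.
interface Fact {
  text: string;
  source: string;
  confidence: "low" | "medium" | "high";
}

type Verdict = { allowed: true } | { allowed: false; reason: string };

// A strong claim passes only if a high-confidence fact mentions its subject.
function gateClaim(subject: string, facts: Fact[]): Verdict {
  const support = facts.filter(
    (f) => f.confidence === "high" && f.text.includes(subject)
  );
  if (support.length === 0) {
    return { allowed: false, reason: `no high-confidence evidence for "${subject}"` };
  }
  return { allowed: true };
}

const facts: Fact[] = [
  { text: "process is not running", source: "ps aux | grep clawd-monitor", confidence: "high" },
];

console.log(gateClaim("process", facts).allowed);    // true: backed by a fact
console.log(gateClaim("OOM killer", facts).allowed); // false: only a hypothesis exists
```

The fail-closed default is the design point: absence of evidence blocks the claim rather than letting a plausible story through.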
The motivating incident lives in an internal logbook: an agent investigated two agent-grounding tasks against a checkout that was 16 commits behind origin, declared both "stale" because the relevant directories didn't exist locally, and only caught the drift hours later when a third task forced a fresh git pull. Two corrections had to be walked back. The check that would have caught it (git fetch && git status before any structural claim) is exactly what runtime-reality-checker + claim-gate enforce, given a runtime that consults them.
Experimental: functional tools with tests; APIs may evolve. Each package has its own README with install + usage; this top-level README is a routing index. Build/contribution notes live in CONTRIBUTING.md.
agent-grounding is the Validate stage of the Project OS Human-Agent Dev Lifecycle:
- agent-planforge plans
- agent-tasks coordinates
- agent-grounding verifies
- agent-preflight gates pushes
- harness declares + enforces the policy boundary that calls into all of the above