Declarative control plane for agent harnesses.
One zod-validated YAML manifest for grounding, tools, memory, hooks, policies, and workflows, plus a CLI that describes, validates, diffs, applies, audits, and enforces.
Most config tools tell you what an agent is configured to use.
harness tells you what an agent is allowed to do, under this exact context, and why.
harness collapses the six-to-eight surfaces a working agent harness
leaks across (settings.json, CLAUDE.md, memory frontmatter, MCP
registrations, per-project overrides, hook scripts) into a single
source of truth. Today (v0.7.0) policies fire end-to-end: a
mcp__agent-tasks__pull_requests_merge call against a session
without a review:${PR_NUMBER} ledger entry refuses; harness explain review-before-merge --trace shows exactly why.
flowchart LR
declare["1. Declare<br/><code>harness.yaml</code>"]
apply["2. Apply<br/><code>harness apply</code>"]
enforce["3. Enforce<br/>hooks + policies<br/>at runtime"]
record[("4. Record<br/>evidence ledger")]
observe["5. Observe<br/><code>audit</code> / <code>explain</code> /<br/><code>session-export</code>"]
declare --> apply
apply --> enforce
enforce --> record
record --> observe
observe -. refine .-> declare
One manifest declares grounding, tools, memory, hooks, policies, and
workflows. apply materialises that into the files Claude Code
actually reads. At runtime, hooks and policies enforce the contract
and write decision rows to the evidence ledger. The read-side
surfaces (audit, explain --trace, session-export) replay those
rows so you can see what fired, why, and across which session.
Whatever you learn from observing flows back into the manifest. That
loop is the whole product.
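The enforce, record, and observe steps can be sketched in a few lines. This is a minimal in-memory illustration, not harness's implementation: the `LedgerEntry` and `DecisionRow` shapes, the `reviewBeforeMerge` and `replay` functions, and the session id `s1` are all assumptions made for the example. Only the tool name, the policy name, and the `review:${PR_NUMBER}` tag come from the text above.

```typescript
// Hypothetical shapes for illustration; not harness's real schema or API.
type LedgerEntry = { sessionId: string; tag: string };
type DecisionRow = {
  sessionId: string;
  policy: string;
  tool: string;
  decision: "allow" | "block";
  reason: string;
};

const ledger: LedgerEntry[] = [];      // evidence entries written during a session
const decisions: DecisionRow[] = [];   // decision rows written by policies

// Enforce + record: block the merge tool unless this session holds a
// review:<PR number> entry, and write a decision row either way.
function reviewBeforeMerge(sessionId: string, tool: string, prNumber: number): DecisionRow {
  const requiredTag = `review:${prNumber}`;
  const found = ledger.some((e) => e.sessionId === sessionId && e.tag === requiredTag);
  const row: DecisionRow = {
    sessionId,
    policy: "review-before-merge",
    tool,
    decision: found ? "allow" : "block",
    reason: found ? `found ${requiredTag}` : `missing ${requiredTag}`,
  };
  decisions.push(row);
  return row;
}

// Observe: replay the decision rows recorded for one session.
function replay(sessionId: string): string[] {
  return decisions
    .filter((d) => d.sessionId === sessionId)
    .map((d) => `${d.policy} -> ${d.decision} (${d.reason})`);
}

const mergeTool = "mcp__agent-tasks__pull_requests_merge";
const firstAttempt = reviewBeforeMerge("s1", mergeTool, 42);   // no review entry yet
ledger.push({ sessionId: "s1", tag: "review:42" });            // a review gets recorded
const secondAttempt = reviewBeforeMerge("s1", mergeTool, 42);  // entry now present
console.log(replay("s1"));
```

The first attempt is blocked and the second is allowed, and both outcomes remain visible when the rows are replayed. That replay is the part of the loop the read-side surfaces cover.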
- Operator? Read docs/for-humans.md. It walks from npm i -g @lannguyensi/harness through your first apply, your first real policy, and the diagnostics cheat sheet.
- Agent (or onboarding one)? Read docs/for-agents.md. It defines the workflow lifecycle, the policy / ledger sequence, the CLI cheat sheet split by side-effect class, and the audit triumvirate (audit vs explain --trace vs session-export).
npm i -g @lannguyensi/harness

The CLI binary is harness. Node 20 or newer required.
git clone https://github.com/LanNguyenSi/harness && cd harness
npm install && npm run build
node dist/cli/main.js dry-run "merge PR 42" \
--tool mcp__agent-tasks__pull_requests_merge \
--tool-args '{"prNumber":42}' \
--config docs/examples/full-manifest.yaml

dry-run reads the reference manifest, runs the trigger matcher,
substitutes ${PR_NUMBER}=42 through the JSONPath-restricted extract
DSL, and tells you exactly which hooks would fire and which policies
would match, before any ledger I/O.
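To make the substitution step concrete, here is a minimal sketch of a path-restricted extractor and a template substituter. It assumes the extract DSL accepts only simple dotted paths such as `$.prNumber`. The real grammar is not described here, and `extractPath` and `substitute` are hypothetical names rather than harness functions.

```typescript
// Hypothetical restricted extractor: only `$.a.b` style dotted paths are accepted.
// Anything else (wildcards, filters, indexes) is rejected up front.
function extractPath(args: unknown, path: string): unknown {
  if (!/^\$(\.[A-Za-z_][A-Za-z0-9_]*)+$/.test(path)) {
    throw new Error(`unsupported path: ${path}`);
  }
  let current: unknown = args;
  for (const key of path.slice(2).split(".")) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Replace ${NAME} placeholders with extracted values; unknown names are left as-is.
function substitute(template: string, vars: Record<string, string>): string {
  return template.replace(
    /\$\{([A-Z_][A-Z0-9_]*)\}/g,
    (whole: string, name: string) => vars[name] ?? whole,
  );
}

// Same tool arguments as the dry-run invocation above.
const toolArgs: unknown = JSON.parse('{"prNumber":42}');
const prNumber = extractPath(toolArgs, "$.prNumber");
const requiredTag = substitute("review:${PR_NUMBER}", { PR_NUMBER: String(prNumber) });
console.log(requiredTag); // review:42
```

Rejecting everything except plain dotted paths keeps extraction predictable: a manifest author cannot accidentally match more of the tool input than intended.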
- Phase 1, read-only inventory (describe, validate, doctor, list, explain, diff), released as v0.1.0.
- Phase 2, managed edits (init, add, remove, adopt, export), released as v0.2.0.
- Phase 3, declarative truth (apply, diff --since-apply, harness.lock), released as v0.3.0.
- Phase 4, policy layer (policy intercept, explain --trace, audit, dry-run, requires-evaluator + extract DSL + grounding-mcp adapter), released as v0.4.0.
- Phase 5, polish + dogfood lessons (--verbose policy diagnostics, $CLAUDE_SESSION_ID env fallback, server-side audit filter pushdown, policy_decision first-class entry type, npm distribution as @lannguyensi/harness), released as v0.5.0.
- Apply-into-settings cycle, harness adopt, apply --target / --merge, harness.lock target tracking, released as v0.6.0.
- Workflows-as-data + full-session audit forensics: additive workflows: / review_templates: / audit.redact[] manifest blocks, harness session-export, explain --last, audience-specific docs surfaces, released as v0.7.0.
- Phase 6, Understanding Gate Policy Pack: agents must expose and confirm task understanding before write-capable tools fire.
- Phase 7, Risk Gate: Action Envelope + Risk Classifier + allow / warn / require_approval / deny for destructive-action prevention.
Two structurally larger themes are queued after Phase 5's polish.
Phase 6, Understanding Gate. Before an agent edits files, runs
shell, commits, or opens a PR, it must produce an Understanding
Report (its interpretation of the task: derived todos, acceptance
criteria, assumptions, out-of-scope, risks). The user confirms,
corrects, or "grills me until precise enough". Only after explicit
approval is recorded in the evidence ledger may write-capable tools
fire. Ships as the first harness Policy Pack: a reusable bundle
of instruction template + hooks + policies + permission profiles.
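Since Phase 6 is still on the roadmap, the following is only a sketch of the gating idea under stated assumptions. The `UnderstandingReport` and `GateEntry` shapes, the `mayFire` function, and the set of write-capable tool names are all hypothetical and may not match what eventually ships.

```typescript
// Hypothetical report shape mirroring the fields listed above.
type UnderstandingReport = {
  todos: string[];
  acceptanceCriteria: string[];
  assumptions: string[];
  outOfScope: string[];
  risks: string[];
};

// Hypothetical ledger entries: the report itself, then the user's explicit approval.
type GateEntry =
  | { sessionId: string; kind: "understanding_report"; report: UnderstandingReport }
  | { sessionId: string; kind: "understanding_approved" };

// Assumed set of write-capable tools; a real pack would declare its own.
const WRITE_CAPABLE = new Set(["Edit", "Write", "Bash"]);

// Read-only tools always pass; write-capable tools need a recorded approval
// for the same session.
function mayFire(sessionId: string, tool: string, entries: GateEntry[]): boolean {
  if (!WRITE_CAPABLE.has(tool)) return true;
  return entries.some((e) => e.sessionId === sessionId && e.kind === "understanding_approved");
}

const gateLedger: GateEntry[] = [
  {
    sessionId: "s1",
    kind: "understanding_report",
    report: { todos: ["rename flag"], acceptanceCriteria: [], assumptions: [], outOfScope: [], risks: [] },
  },
];
const beforeApproval = mayFire("s1", "Edit", gateLedger);   // report exists, no approval yet
gateLedger.push({ sessionId: "s1", kind: "understanding_approved" });
const afterApproval = mayFire("s1", "Edit", gateLedger);    // approval recorded
console.log(beforeApproval, afterApproval);
```

Note that producing a report alone does not open the gate in this sketch. Only the recorded approval does, which matches the "only after explicit approval" wording.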
Phase 7, Risk Gate. Today's policy model evaluates a rule per
matching trigger and returns a binary block/allow. Phase 7 makes
harness reason about the action itself: an Action Envelope (tool +
raw input + session + runtime context) is enriched by a Context
Resolver (production / staging / dev / unknown), classified by a Risk
Classifier (severity + categories + reversibility), then matched
against policies whose when: clauses can reference
risk.severity_at_least, environment.name, and similar. The
decision space extends to allow / warn / require_approval / deny.
Motivating use case: prevent DROP TABLE users, kubectl delete namespace prod, terraform destroy against an unverified production
target, even if the model would have happily run them.
Both build on Phase 4's policy intercept runtime backbone; neither
replaces it.
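The Phase 7 pipeline (envelope, classification, decision) might look roughly like the sketch below. Every name here is hypothetical, the regex-based classifier is a stand-in for whatever the real Risk Classifier does, and the thresholds are invented for illustration. In particular, treating an unknown environment like production is an assumption of this sketch.

```typescript
// All names below are hypothetical; Phase 7 is on the roadmap, not released.
type Environment = "production" | "staging" | "dev" | "unknown";
type Severity = "low" | "medium" | "high" | "critical";
type Decision = "allow" | "warn" | "require_approval" | "deny";

interface ActionEnvelope {
  tool: string;
  rawInput: string;
  sessionId: string;
  environment: Environment; // as filled in by a Context Resolver
}

interface Risk {
  severity: Severity;
  categories: string[];
  reversible: boolean;
}

const severityRank: Record<Severity, number> = { low: 0, medium: 1, high: 2, critical: 3 };

// Toy classifier: flags the three motivating commands and scales severity by
// environment. Unknown targets are treated like production (an assumption).
function classify(envelope: ActionEnvelope): Risk {
  const destructive =
    /\bDROP\s+TABLE\b|\bkubectl\s+delete\s+namespace\b|\bterraform\s+destroy\b/i.test(envelope.rawInput);
  if (!destructive) return { severity: "low", categories: [], reversible: true };
  const severity: Severity =
    envelope.environment === "dev" ? "medium" :
    envelope.environment === "staging" ? "high" : "critical";
  return { severity, categories: ["destructive"], reversible: false };
}

// Toy policy: maps severity onto the four-valued decision space.
function decide(risk: Risk): Decision {
  const rank = severityRank[risk.severity];
  if (rank >= severityRank.critical) return "deny";
  if (rank >= severityRank.high) return "require_approval";
  if (rank >= severityRank.medium) return "warn";
  return "allow";
}

function evaluate(rawInput: string, environment: Environment): Decision {
  return decide(classify({ tool: "Bash", rawInput, sessionId: "s1", environment }));
}

console.log(evaluate("psql -c 'DROP TABLE users'", "production")); // deny
console.log(evaluate("terraform destroy", "staging"));             // require_approval
console.log(evaluate("ls -la", "production"));                     // allow
```

The point of the envelope is that the same command yields different decisions depending on resolved context, which a binary per-trigger rule cannot express.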
Bring your favorite agent harness. Add governance.
A working agent harness today has six to eight configuration
surfaces, each with its own schema and lifecycle: ~/.claude/settings.json,
CLAUDE.md (per repo + root), ~/.claude/projects/*/memory/*.md
with frontmatter, ~/.claude/keybindings.json, MCP server
registrations in ~/.claude.json, skill directories, per-project
overrides, and external CLIs that behave differently per project.
There is no single place that answers "what can this agent do right now, and why is that configured that way?". Drift between sessions is invisible until it breaks something. Humans editing one surface do not know which other surfaces they need to touch. A fresh agent instance has no way to audit its own setup.
Our entry point into this problem: on 2026-04-23, an
agent-grounding checkout that was 16 commits behind origin led two
tasks to be incorrectly called "stale". The check that would have
caught it already exists: agent-preflight runs git fetch + git status
(alongside lint, typecheck, test, audit) and emits a structured
ready + confidence-score result. The missing piece was not the check
itself but the deterministic trigger: a SessionStart hook that
invokes preflight run and a
policy that gates further work on the result. Building that wiring
needs an agreed-upon place for harness config to live first. That
conversation is the origin of this repo.
- agent-grounding: grounding primitives (evidence-ledger, claim-gate, review-claim-gate); grounding-mcp is the canonical client surface harness queries through queryLedgerByTag.
- agent-memory: memory surfaces the control plane inventories.
- agent-tasks: the MCP-registered task platform whose registration + health appear in harness describe.
- agent-preflight: local preflight validator; the canonical implementation of preflight-hook content harness wires.
- codebase-oracle: one of the MCP surfaces being registered.
- agent-dx: ships git-batch-cli, a day-to-day tool whose inventory appears in harness describe.
MIT, see LICENSE.