LanNguyenSi/harness

harness

Declarative control plane for agent harnesses.

One zod-validated YAML manifest for grounding, tools, memory, hooks, policies, and workflows, plus a CLI that describes, validates, diffs, applies, audits, and enforces.

Most config tools tell you what an agent is configured to use. harness tells you what an agent is allowed to do, under this exact context, and why.

harness collapses the six to eight configuration surfaces that a working agent harness is spread across (settings.json, CLAUDE.md, memory frontmatter, MCP registrations, per-project overrides, hook scripts) into a single source of truth. Today (v0.7.0) policies fire end-to-end: an mcp__agent-tasks__pull_requests_merge call in a session that has no review:${PR_NUMBER} ledger entry is refused, and harness explain review-before-merge --trace shows exactly why.
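To make the review-before-merge behaviour concrete, here is a minimal TypeScript sketch of the idea. It is not harness's actual implementation: the `LedgerEntry` and `GateDecision` shapes, the `requiresEvidence` helper, and the in-memory ledger are all hypothetical. Only the `review:${PR_NUMBER}` tag pattern comes from the text above.

```typescript
// Hypothetical shapes for illustration; harness's real types may differ.
interface LedgerEntry {
  sessionId: string;
  tag: string; // e.g. "review:42"
}

interface GateDecision {
  allowed: boolean;
  reason: string;
}

// Allow the call only if this session has a ledger entry with the required tag.
function requiresEvidence(
  ledger: readonly LedgerEntry[],
  sessionId: string,
  requiredTag: string,
): GateDecision {
  const found = ledger.some(
    (entry) => entry.sessionId === sessionId && entry.tag === requiredTag,
  );
  return found
    ? { allowed: true, reason: `found ledger entry "${requiredTag}"` }
    : { allowed: false, reason: `missing ledger entry "${requiredTag}"` };
}

// The session reviewed PR 41 but is trying to merge PR 42.
const sessionLedger: LedgerEntry[] = [{ sessionId: "s1", tag: "review:41" }];
const mergeDecision = requiresEvidence(sessionLedger, "s1", "review:42");
console.log(mergeDecision.allowed, mergeDecision.reason);
```

The decision carries a reason string alongside the boolean because the point of a trace is to explain a refusal, not only to report it.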

What harness does

flowchart LR
    declare["1. Declare<br/><code>harness.yaml</code>"]
    apply["2. Apply<br/><code>harness apply</code>"]
    enforce["3. Enforce<br/>hooks + policies<br/>at runtime"]
    record[("4. Record<br/>evidence ledger")]
    observe["5. Observe<br/><code>audit</code> / <code>explain</code> /<br/><code>session-export</code>"]

    declare --> apply
    apply --> enforce
    enforce --> record
    record --> observe
    observe -. refine .-> declare

One manifest declares grounding, tools, memory, hooks, policies, and workflows. apply materialises that into the files Claude Code actually reads. At runtime, hooks and policies enforce the contract and write decision rows to the evidence ledger. The read-side surfaces (audit, explain --trace, session-export) replay those rows so you can see what fired, why, and across which session. Whatever you learn from observing flows back into the manifest. That loop is the whole product.

Pick your audience

  • Operator? Read docs/for-humans.md. It walks from npm i -g @lannguyensi/harness through your first apply, your first real policy, and the diagnostics cheat sheet.
  • Agent (or onboarding one)? Read docs/for-agents.md. It defines the workflow lifecycle, the policy / ledger sequence, the CLI cheat sheet split by side-effect class, and the audit triumvirate (audit vs explain --trace vs session-export).

Install

npm i -g @lannguyensi/harness

The CLI binary is harness. Node 20 or newer required.

Try it in 60 seconds

git clone https://github.com/LanNguyenSi/harness && cd harness
npm install && npm run build
node dist/cli/main.js dry-run "merge PR 42" \
  --tool mcp__agent-tasks__pull_requests_merge \
  --tool-args '{"prNumber":42}' \
  --config docs/examples/full-manifest.yaml

dry-run reads the reference manifest, runs the trigger matcher, substitutes ${PR_NUMBER}=42 through the JSONPath-restricted extract DSL, and tells you exactly which hooks would fire and which policies would match, before any ledger I/O.
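The substitution step can be pictured with a short TypeScript sketch. This is an illustration under stated assumptions, not the extract DSL itself: the `extractPath` and `substitute` helpers, the `$.prNumber` path syntax, and the rule that unresolved variables are left in place are all hypothetical. The sketch supports only simple dotted paths.

```typescript
// Hypothetical: a tiny JSONPath subset that only understands "$.a.b" dotted paths.
function extractPath(args: unknown, path: string): unknown {
  if (!path.startsWith("$.")) {
    throw new Error(`unsupported path: ${path}`);
  }
  let current: unknown = args;
  for (const key of path.slice(2).split(".")) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Replace each ${NAME} in the template with the value extracted from the tool args.
// Unknown or unresolved names are left untouched so the gap stays visible.
function substitute(
  template: string,
  extract: Record<string, string>,
  toolArgs: unknown,
): string {
  return template.replace(
    /\$\{([A-Z_][A-Z0-9_]*)\}/g,
    (match: string, name: string) => {
      const path = extract[name];
      if (path === undefined) return match;
      const value = extractPath(toolArgs, path);
      return value === undefined ? match : String(value);
    },
  );
}

const mergeArgs: unknown = JSON.parse('{"prNumber":42}');
const resolvedTag = substitute(
  "review:${PR_NUMBER}",
  { PR_NUMBER: "$.prNumber" },
  mergeArgs,
);
console.log(resolvedTag); // prints "review:42"
```

Restricting the path language to plain property lookups keeps extraction predictable: a manifest can name a field in the tool arguments but cannot run arbitrary expressions against them.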

Status

  • Phase 1, read-only inventory (describe, validate, doctor, list, explain, diff), released as v0.1.0.
  • Phase 2, managed edits (init, add, remove, adopt, export), released as v0.2.0.
  • Phase 3, declarative truth (apply, diff --since-apply, harness.lock), released as v0.3.0.
  • Phase 4, policy layer (policy intercept, explain --trace, audit, dry-run, requires-evaluator + extract DSL + grounding-mcp adapter), released as v0.4.0.
  • Phase 5, polish + dogfood lessons (--verbose policy diagnostics, $CLAUDE_SESSION_ID env fallback, server-side audit filter pushdown, policy_decision first-class entry type, npm distribution as @lannguyensi/harness), released as v0.5.0.
  • Apply-into-settings cycle, harness adopt, apply --target / --merge, harness.lock target tracking, released as v0.6.0.
  • Workflows-as-data + full-session audit forensics: additive workflows: / review_templates: / audit.redact[] manifest blocks, harness session-export, explain --last, audience-specific docs surfaces, released as v0.7.0.
  • Phase 6, Understanding Gate Policy Pack: agents must expose and confirm task understanding before write-capable tools fire. Planned, not yet released.
  • Phase 7, Risk Gate: Action Envelope + Risk Classifier + allow / warn / require_approval / deny for destructive-action prevention. Planned, not yet released.

What's next

Two structurally larger themes are queued after Phase 5's polish.

Phase 6, Understanding Gate. Before an agent edits files, runs shell, commits, or opens a PR, it must produce an Understanding Report (its interpretation of the task: derived todos, acceptance criteria, assumptions, out-of-scope, risks). The user confirms, corrects, or "grills me until precise enough". Only after explicit approval is recorded in the evidence ledger may write-capable tools fire. Ships as the first harness Policy Pack: a reusable bundle of instruction template + hooks + policies + permission profiles.

Phase 7, Risk Gate. Today's policy model evaluates a rule per matching trigger and returns a binary block/allow. Phase 7 makes harness reason about the action itself: an Action Envelope (tool + raw input + session + runtime context) is enriched by a Context Resolver (production / staging / dev / unknown), classified by a Risk Classifier (severity + categories + reversibility), then matched against policies whose when: clauses can reference risk.severity_at_least, environment.name, and similar. The decision space extends to allow / warn / require_approval / deny. Motivating use case: prevent DROP TABLE users, kubectl delete namespace prod, terraform destroy against an unverified production target, even if the model would have happily run them.
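Since Phase 7 is still planned, the following TypeScript sketch is only one way the described decision space could work. Everything in it is an assumption: the severity levels, the `RiskPolicy` and `ClassifiedAction` shapes, the `severityAtLeast` field (standing in for risk.severity_at_least), and the rule that the most restrictive matching decision wins.

```typescript
// Hypothetical types; the real Risk Gate design may differ.
type Severity = "low" | "medium" | "high" | "critical";
type Environment = "production" | "staging" | "dev" | "unknown";
type RiskDecision = "allow" | "warn" | "require_approval" | "deny";

// Both arrays are ordered from least to most severe / restrictive.
const SEVERITY_ORDER: readonly Severity[] = ["low", "medium", "high", "critical"];
const DECISION_ORDER: readonly RiskDecision[] = ["allow", "warn", "require_approval", "deny"];

interface RiskPolicy {
  when: { severityAtLeast?: Severity; environment?: Environment };
  decision: RiskDecision;
}

interface ClassifiedAction {
  tool: string;
  severity: Severity;
  environment: Environment;
}

// A policy matches when every condition it states holds for the action.
function matches(policy: RiskPolicy, action: ClassifiedAction): boolean {
  const { severityAtLeast, environment } = policy.when;
  if (
    severityAtLeast !== undefined &&
    SEVERITY_ORDER.indexOf(action.severity) < SEVERITY_ORDER.indexOf(severityAtLeast)
  ) {
    return false;
  }
  return environment === undefined || environment === action.environment;
}

// Assumption: start from "allow" and keep the most restrictive matching decision.
function decide(policies: readonly RiskPolicy[], action: ClassifiedAction): RiskDecision {
  let result: RiskDecision = "allow";
  for (const policy of policies) {
    if (
      matches(policy, action) &&
      DECISION_ORDER.indexOf(policy.decision) > DECISION_ORDER.indexOf(result)
    ) {
      result = policy.decision;
    }
  }
  return result;
}

const riskPolicies: RiskPolicy[] = [
  { when: { severityAtLeast: "medium" }, decision: "warn" },
  { when: { severityAtLeast: "high", environment: "production" }, decision: "deny" },
  { when: { severityAtLeast: "high", environment: "unknown" }, decision: "require_approval" },
];

const prodDecision = decide(riskPolicies, {
  tool: "shell", // hypothetical tool name
  severity: "critical",
  environment: "production",
});
console.log(prodDecision); // prints "deny"
```

Letting the most restrictive match win means adding a permissive policy can never weaken an existing deny, which suits a gate aimed at destructive actions. A first-match-wins ordering would be an equally plausible design.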

Both build on Phase 4's policy intercept runtime backbone; neither replaces it.

Bring your favorite agent harness. Add governance.

Why this exists

A working agent harness today has six to eight configuration surfaces, each with its own schema and lifecycle: ~/.claude/settings.json, CLAUDE.md (per repo + root), ~/.claude/projects/*/memory/*.md with frontmatter, ~/.claude/keybindings.json, MCP server registrations in ~/.claude.json, skill directories, per-project overrides, and external CLIs that behave differently per project.

There is no single place that answers "what can this agent do right now, and why is that configured that way?". Drift between sessions is invisible until it breaks something. Humans editing one surface do not know which other surfaces they need to touch. A fresh agent instance has no way to audit its own setup.

Our entry point into this problem: on 2026-04-23, an agent-grounding checkout that was 16 commits behind origin led to two tasks being incorrectly called "stale". The check that would have caught it already exists: agent-preflight runs git fetch + git status (alongside lint, typecheck, test, audit) and emits a structured ready + confidence-score result. The missing piece was not the check itself but the deterministic trigger: a SessionStart hook that invokes preflight run and a policy that gates further work on the result. Building that wiring first needs an agreed-upon place for harness config to live. That conversation is the origin of this repo.

Related

  • agent-grounding: grounding primitives (evidence-ledger, claim-gate, review-claim-gate); grounding-mcp is the canonical client surface harness queries through queryLedgerByTag.
  • agent-memory: memory surfaces the control plane inventories.
  • agent-tasks: the MCP-registered task platform whose registration + health appear in harness describe.
  • agent-preflight: local preflight validator; the canonical implementation of preflight-hook content harness wires.
  • codebase-oracle: one of the MCP surfaces being registered.
  • agent-dx: ships git-batch-cli, a day-to-day tool whose inventory appears in harness describe.

License

MIT, see LICENSE.
