AutoMend

The auto-pilot recovery layer for agents. A zero-runtime-dependency TypeScript library and CLI that scores output confidence, detects ungrounded claims with GSAR-style typed grounding, monitors behavioral drift, and decides a recovery action the moment an agent leaves the expected path.

An agent started inventing product IDs that do not exist. It took three days to notice, because the output looked plausible, and by then customers had the problem. AutoMend exists to catch that the moment it happens, and to act.

AutoMend sits around an agent step. You feed it the signals you already have (a self-reported confidence, claims classified against your evidence, behavioral metrics, a tool result), and it turns detect, decide, act into one deterministic, auditable loop: score the output, detect the failure mode, decide a recovery (rollback, retry with adjusted guardrails, escalate, or ask a human), run it through your executors, and seal every decision in a tamper-evident audit trail.

Honest by design. AutoMend is a deterministic policy and scoring engine, not a model. It does not call an LLM, and it does not decide on its own whether a claim is true. It scores the classifications and signals you supply, makes the recovery decision repeatable and auditable, and ships heuristic detectors as a starting point you can replace with a real matcher. Zero required runtime dependencies, node-free core, ESM + CJS, SLSA provenance on every release.


Install

pnpm add @takk/automend
# or: npm install @takk/automend
# or: yarn add @takk/automend
# or: bun add @takk/automend

The core has zero required runtime dependencies. Optional peers are sibling @takk packages, installed only if you bridge to them.


Quickstart

import { createAutoMend } from '@takk/automend';

const automend = createAutoMend();

const report = await automend.guard({
  // confidence signals you already measure, each in [0, 1]
  confidence: [
    { name: 'self-reported', value: 0.32 },
    { name: 'evidence-coverage', value: 0.4, weight: 2 },
  ],
  // claims classified against your evidence (by a model, a matcher, or a human)
  claims: [
    { id: 'c1', type: 'grounded', evidenceType: 'observed' },
    { id: 'c2', type: 'contradicted', evidenceType: 'observed' },
  ],
  // recovery actions are your callbacks; AutoMend decides which one to run
  executors: {
    retry: () => regenerateWithGuardrails(),
    escalate: () => pageTheOncall(),
    askHuman: () => openReviewTask(),
  },
});

if (!report.healthy) {
  console.log(report.decision?.strategy); // e.g. "escalate"
}

guard scores the confidence, runs the GSAR grounding math over your claims, picks the most severe issue, decides a recovery strategy under a safe-mode policy, runs the matching executor, and records every step in the audit trail.
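To make the first step concrete, here is a minimal sketch of how weighted signals could be folded into a single score. It assumes a plain weighted mean with a default weight of 1; the README does not state the library's actual formula, and the `Signal` type and `aggregateConfidence` function are hypothetical names used only for illustration.

```typescript
// Hypothetical shape; the real types exported by @takk/automend may differ.
interface Signal {
  name: string;
  value: number;   // expected in [0, 1]
  weight?: number; // assumed to default to 1
}

// Weighted mean of the signals. This is an assumption, not the library's documented formula.
function aggregateConfidence(signals: Signal[]): number {
  let weighted = 0;
  let total = 0;
  for (const s of signals) {
    if (s.value < 0 || s.value > 1) {
      throw new RangeError(`signal ${s.name} is outside [0, 1]`);
    }
    const w = s.weight ?? 1;
    weighted += s.value * w;
    total += w;
  }
  // No signals (or zero total weight) carries no information; treat it as zero confidence.
  return total === 0 ? 0 : weighted / total;
}

const quickstartScore = aggregateConfidence([
  { name: 'self-reported', value: 0.32 },
  { name: 'evidence-coverage', value: 0.4, weight: 2 },
]);
console.log(quickstartScore.toFixed(3)); // (0.32 * 1 + 0.4 * 2) / 3 -> "0.373"
```

Under this assumption the Quickstart signals land well below any reasonable confidence threshold, which is why the example report comes back unhealthy.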


Detect without a model, out of the box

The detectors are heuristic and dependency-free. They turn raw output into the classifications the scorers consume, so the loop works before you wire in anything smarter.

import { classifyClaims, detectLoop, detectCorruption, loopIssue } from '@takk/automend/detectors';

// lexical grounding matcher: text + evidence -> classified claims
const claims = classifyClaims(
  [{ id: 'c1', text: 'the order shipped on monday' }],
  ['observation: the order shipped on monday from the warehouse'],
); // -> [{ id: 'c1', type: 'grounded', ... }]

// loop / recursion detector over step fingerprints
const loop = detectLoop(['search', 'search', 'search']); // -> { looping: true, ... }

// feed a detected loop straight into guard
// (automend is the instance from the Quickstart; executors is your map of recovery callbacks)
await automend.guard({ issues: [loopIssue(loop)].filter(Boolean), executors });

// output corruption (empty, control-character-laden, or validator-failing)
detectCorruption('{not json', {
  validate: (t) => {
    try {
      JSON.parse(t);
      return true;
    } catch {
      return false;
    }
  },
});

Swap any detector for a real natural-language inference model or an embedding matcher when you need higher fidelity; the rest of the loop is unchanged.
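For intuition about what a lexical grounding heuristic does, the sketch below classifies a claim by the share of its tokens that appear in the evidence. It is not the implementation of `classifyClaims`: the function names, the two-way verdict, and the 0.8 threshold are all assumptions chosen for illustration.

```typescript
// Hypothetical helper; not the library's classifyClaims implementation.
type LexicalVerdict = 'grounded' | 'ungrounded';

// Lowercase the text and split it into alphanumeric tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Fraction of claim tokens that appear somewhere in the evidence.
function tokenCoverage(claim: string, evidenceTexts: string[]): number {
  const claimTokens = tokenize(claim);
  if (claimTokens.length === 0) return 0;
  const evidenceTokens = new Set(evidenceTexts.flatMap(tokenize));
  const hits = claimTokens.filter((t) => evidenceTokens.has(t)).length;
  return hits / claimTokens.length;
}

// The 0.8 threshold is an arbitrary illustrative default.
function classifyLexically(
  claim: string,
  evidenceTexts: string[],
  threshold = 0.8,
): LexicalVerdict {
  return tokenCoverage(claim, evidenceTexts) >= threshold ? 'grounded' : 'ungrounded';
}

const observed = ['observation: the order shipped on monday from the warehouse'];
console.log(classifyLexically('the order shipped on monday', observed));     // grounded
console.log(classifyLexically('the refund was issued on friday', observed)); // ungrounded
```

Token overlap cannot tell a contradiction from a match ("the order did not ship on monday" still overlaps heavily), which is the kind of gap an inference model or embedding matcher is meant to close.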


A tamper-evident audit trail

Every detection, decision, and outcome is recorded append-only and sealed with a SHA-256 hash chain via the Web Crypto API. This supports the immutable execution-record requirement that regulations such as EU AI Act Article 12 ask of high-risk systems. It is an integrity seal, not a digital signature: it proves the log was not altered after sealing.

import { createAuditLog } from '@takk/automend/audit';

const log = createAuditLog({ id: 'run-42' });
log.append('detection', 'low confidence', { score: 0.31 });
log.append('decision', 'escalate');

const seal = await log.seal();           // { algorithm: 'sha-256', root: '...', count: 2 }
(await log.verify(seal)).valid;          // true; flips to false if any entry is altered

The createAutoMend facade wires this in automatically: read automend.audit for the live log.
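The idea behind a hash-chain seal can be sketched in a few lines with the Web Crypto API. This is an illustration under stated assumptions, not AutoMend's audit format: the `ChainEntry` shape, the JSON serialization, and the way each link is combined with the previous one are hypothetical.

```typescript
// Hypothetical entry shape; AutoMend's real audit format is not shown in this README.
interface ChainEntry {
  kind: string;
  message: string;
}

// SHA-256 of a UTF-8 string, returned as lowercase hex, via Web Crypto.
async function sha256Hex(text: string): Promise<string> {
  const bytes = new TextEncoder().encode(text);
  const digest = await globalThis.crypto.subtle.digest('SHA-256', bytes);
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, '0')).join('');
}

// Fold the entries into one root: each link hashes the previous link plus the entry.
async function chainRoot(entries: ChainEntry[]): Promise<string> {
  let previous = '';
  for (const entry of entries) {
    previous = await sha256Hex(previous + JSON.stringify([entry.kind, entry.message]));
  }
  return previous;
}

async function demoChain(): Promise<[boolean, boolean]> {
  const entries: ChainEntry[] = [
    { kind: 'detection', message: 'low confidence' },
    { kind: 'decision', message: 'escalate' },
  ];
  const root = await chainRoot(entries);

  // Recomputing over untouched entries reproduces the root.
  const intact = (await chainRoot(entries)) === root;

  // Changing an early entry changes every later link, so the root no longer matches.
  const tampered = entries.map((e, i) => (i === 0 ? { ...e, message: 'high confidence' } : e));
  const stillMatches = (await chainRoot(tampered)) === root;

  return [intact, stillMatches];
}

demoChain().then(([intact, stillMatches]) => console.log(intact, stillMatches)); // true false
```

Because each link depends on the one before it, verifying the final root is enough to detect an edit anywhere in the log. Anyone who can rewrite the log can also recompute the root, which is why the seal proves integrity rather than authorship.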


Components

Each is importable from its own subpath, or from the root.

| Subpath | What it does |
| --- | --- |
| @takk/automend | createAutoMend().guard(), the unified detect, decide, act loop |
| @takk/automend/confidence | Aggregate weighted signals into a confidence and a verdict |
| @takk/automend/grounding | GSAR typed grounding: four-way claims, asymmetric scoring, three-tier decision |
| @takk/automend/detectors | Heuristic grounding matcher, loop detector, corruption detector |
| @takk/automend/drift | Welford baseline and z-score drift detection |
| @takk/automend/recovery | Ordered recovery policy, decide and run, safe mode, retry budget |
| @takk/automend/escalation | Immutable, content-addressed escalation records |
| @takk/automend/audit | Append-only audit log with a SHA-256 seal |
| @takk/automend/interceptors | guardStep to wrap any function, deterministic clock |
| @takk/automend/mcp | Turn a failed MCP tool call into a recovery trigger |
| @takk/automend/edge | The full node-free core for edge runtimes and the browser |
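The drift entry above names two standard techniques: Welford's online algorithm for a running mean and variance, and a z-score against that baseline. A minimal sketch of the pair follows. The `WelfordBaseline` class and the threshold of 3 standard deviations are assumptions for illustration; the library's drift API is not shown in this README.

```typescript
// Hypothetical class; not the API of @takk/automend/drift.
class WelfordBaseline {
  private count = 0;
  private mean = 0;
  private m2 = 0; // running sum of squared deviations from the mean

  // Welford's update: one pass, no stored samples, numerically stable.
  update(x: number): void {
    this.count += 1;
    const delta = x - this.mean;
    this.mean += delta / this.count;
    this.m2 += delta * (x - this.mean);
  }

  // Sample standard deviation; 0 until there are at least two observations.
  stddev(): number {
    return this.count < 2 ? 0 : Math.sqrt(this.m2 / (this.count - 1));
  }

  // How many standard deviations x sits from the baseline mean.
  zScore(x: number): number {
    const sd = this.stddev();
    return sd === 0 ? 0 : (x - this.mean) / sd;
  }
}

const latencyBaseline = new WelfordBaseline();
for (const latencyMs of [100, 102, 98, 101, 99]) {
  latencyBaseline.update(latencyMs);
}

// Illustrative threshold: flag anything more than 3 standard deviations out.
const latencyZ = latencyBaseline.zScore(130);
console.log(latencyZ > 3 ? 'drift' : 'ok'); // drift
```

The baseline here has mean 100 and a sample standard deviation of about 1.58, so a reading of 130 sits roughly 19 standard deviations out.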

Recovery policy

decideRecovery evaluates an ordered policy (the first matching rule wins), then applies two safety overrides:

  • A retry whose attempts reached maxRetries becomes an escalate (no infinite loops).
  • In safeMode (default on), any automatic strategy on a high or critical issue becomes an escalate (no silent auto-acting on serious failures).

import { decideRecovery, DEFAULT_RECOVERY_POLICY } from '@takk/automend/recovery';

const decision = decideRecovery(
  { kind: 'contradicted', severity: 'high' },
  DEFAULT_RECOVERY_POLICY,
);
// decision.strategy === 'escalate' (safe mode escalates high-severity issues)

The default policy: contradictions roll back, ungrounded and low-confidence outputs retry, high-severity drift escalates, tool errors retry, loops roll back, everything else escalates.
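The decision logic described in this section (first match wins, then the two overrides) can be sketched as a small pure function. Everything below is a hypothetical reconstruction: the rule shape, the option names, and the assumption that rollback and retry are the "automatic" strategies are not taken from the library.

```typescript
// Hypothetical types; the real policy format lives in @takk/automend/recovery.
type Strategy = 'rollback' | 'retry' | 'escalate' | 'askHuman';
type Severity = 'low' | 'medium' | 'high' | 'critical';

interface DetectedIssue { kind: string; severity: Severity; attempts?: number }
interface RecoveryRule { matches: (issue: DetectedIssue) => boolean; strategy: Strategy }
interface RecoveryOptions { maxRetries: number; safeMode: boolean }

// Assumption: rollback and retry run without a person; escalate and askHuman involve one.
const AUTOMATIC: ReadonlySet<Strategy> = new Set<Strategy>(['rollback', 'retry']);

function decide(issue: DetectedIssue, rules: RecoveryRule[], options: RecoveryOptions): Strategy {
  // First matching rule wins; no match falls through to escalate.
  let strategy: Strategy = rules.find((r) => r.matches(issue))?.strategy ?? 'escalate';

  // Override 1: an exhausted retry budget escalates instead of looping.
  if (strategy === 'retry' && (issue.attempts ?? 0) >= options.maxRetries) {
    strategy = 'escalate';
  }

  // Override 2: safe mode never auto-acts on high or critical issues.
  const serious = issue.severity === 'high' || issue.severity === 'critical';
  if (options.safeMode && serious && AUTOMATIC.has(strategy)) {
    strategy = 'escalate';
  }

  return strategy;
}

const sketchRules: RecoveryRule[] = [
  { matches: (i) => i.kind === 'contradicted', strategy: 'rollback' },
  { matches: (i) => i.kind === 'ungrounded', strategy: 'retry' },
];
const sketchOptions: RecoveryOptions = { maxRetries: 2, safeMode: true };

console.log(decide({ kind: 'contradicted', severity: 'high' }, sketchRules, sketchOptions)); // escalate
console.log(decide({ kind: 'contradicted', severity: 'low' }, sketchRules, sketchOptions));  // rollback
console.log(decide({ kind: 'ungrounded', severity: 'low', attempts: 2 }, sketchRules, sketchOptions)); // escalate
```

The first call mirrors the decideRecovery example above: the matching rule says rollback, and safe mode turns it into an escalate because the issue is high severity.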


CLI

# score confidence signals from a JSON file
npx automend score signals.json        # { "signals": [{ "name": "self", "value": 0.9 }] }

# assess typed grounding for a claims file
npx automend assess claims.json        # { "claims": [{ "id": "1", "type": "grounded" }] }

# inspect and verify an audit log
npx automend inspect audit-log.json
npx automend verify audit-log.json audit-seal.json

Exit codes follow sysexits: 0 ok, 1 verify failed, 64 usage error, 65 bad data, 66 unreadable input.
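If you call the CLI from a build script, a small lookup keeps the handling of these codes readable. The codes are the ones listed above; the helper names and the idea of treating only code 1 as an integrity failure are assumptions for illustration.

```typescript
// Exit codes as listed above; the helpers and their names are hypothetical.
const EXIT_MEANINGS: Record<number, string> = {
  0: 'ok',
  1: 'verify failed',
  64: 'usage error',
  65: 'bad data',
  66: 'unreadable input',
};

// Human-readable label for a CLI exit code.
function explainExit(code: number): string {
  return EXIT_MEANINGS[code] ?? `unknown exit code ${code}`;
}

// Code 1 means the audit log did not verify; the other non-zero codes
// point at a bad invocation or bad input rather than a tampered log.
function isIntegrityFailure(code: number): boolean {
  return code === 1;
}

console.log(explainExit(65));        // bad data
console.log(isIntegrityFailure(65)); // false
```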


Where it fits

| | AutoMend | Post-hoc eval (LangSmith, Braintrust, Langfuse) | Guardrail libraries |
| --- | --- | --- | --- |
| When it runs | In-line, real time | After the run | In-line |
| Acts on failure | Yes, recovery orchestration | No, analysis only | Blocks, but no recovery |
| Audit trail | Tamper-evident SHA-256 seal | Hosted logs | Varies |
| Runtime dependencies | Zero | Hosted SDK | Varies |
| Calls a model | No | Yes (judges) | Sometimes |

AutoMend is the deterministic decision and audit layer. It does not replace your evals or your model-based judges; it turns their signals into a repeatable, auditable recovery decision in the hot path.


Honest limits

  • AutoMend does not detect hallucinations by itself. It scores the classifications you supply; the built-in detectors are lexical heuristics, not a model.
  • The audit seal is tamper-evident, not a digital signature; pair it with your own signing for non-repudiation.
  • "Self-healing" means AutoMend automates the detect, decide, act loop you wire up. The repair actions are your executors.
  • Determinism is by policy: same inputs, same decision. AutoMend cannot make a sampled model deterministic.

Quality

  • 112 tests across 14 suites, all passing under Vitest 4, green on Node 20, 22, and 24.
  • Coverage: lines 93.8%, statements 93.7%, functions 98.7%, branches 88.0%.
  • Lint clean under Biome 2.
  • Typecheck clean under TypeScript 6 in maximum strict mode (exactOptionalPropertyTypes, useUnknownInCatchVariables, noUncheckedIndexedAccess).
  • publint clean, are-the-types-wrong clean across all eleven entry points.
  • Zero runtime dependencies; the core entry point is about 4.2 kB brotli, enforced by size-limit.
  • A distribution smoke test spawns the built CLI as a single Node process and exercises the ESM and CJS artifacts.
  • Published with --provenance (SLSA attestation by GitHub Actions).

See SPEC.md for the formal specification, public surface, and stability promise.


FAQ

Does it actually heal the agent? It automates the loop: it detects the failure, decides a recovery, and calls the executor you provided (retry, rollback, escalate, ask a human). The repair action is your code; AutoMend makes the decision deterministic and auditable.

Does it detect hallucinations on its own? No. It scores the grounding classifications you supply. The built-in classifyClaims is a lexical heuristic to get you started; for production fidelity, classify with a real natural-language inference model and feed AutoMend the result.

Does it call a model or the network? Never. AutoMend makes zero outbound calls. It is a deterministic, in-process engine.

Does it work in Cloudflare Workers, Vercel Edge, Bun, Deno, or the browser? Yes. The core and the ./edge entry point are node-free; the audit seal uses Web Crypto SHA-256, available in all of them.

How is this different from post-hoc evaluation tools? Those analyze runs after the fact. AutoMend runs in-line and acts on the failure with a recovery decision, recorded in a tamper-evident audit trail.

What is on the roadmap? Streaming capture, OpenTelemetry and observability exporters, Ed25519 signing on the audit seal, native bridges to sibling @takk packages, and input-matched recovery for concurrent fan-out. All additive; the 1.0.0 API is stable.


Contributing

See .github/CONTRIBUTING.md for the contributor guide. Substantive proposals open a GitHub Issue first; trivial fixes can go straight to a PR. All commits require DCO sign-off (git commit -s). Non-trivial contributions are governed by the Contributor License Agreement.

Community and support

  • Issues and feature requests. Open a GitHub issue at davccavalcante/automend/issues. Include the package version, a minimal reproduction, expected vs actual behaviour, and the relevant audit entries where applicable.
  • Security disclosures. Do NOT open public issues for vulnerabilities. Follow the responsible-disclosure flow in SECURITY.md and contact davcavalcante@proton.me (or say@takk.ag) with the [SECURITY] prefix.
  • Code of Conduct. This project follows the Contributor Covenant 2.1. Participation in any AutoMend space (issues, PRs, discussions) implies agreement.
  • Contributions. All non-trivial contributions go through the Contributor License Agreement. Tests, lint, typecheck, and build must be green before review (pnpm verify).

Author

Created by David C Cavalcante, davcavalcante@proton.me (preferred), say@takk.ag (Takk relay), linkedin.com/in/hellodav, x.com/davccavalcante, takk.ag

AutoMend is the reliability tier of a broader portfolio of NPM packages targeting Massive Intelligence (IM) and non-human entity (NHE) infrastructure for 2026-2030, built at Takk Innovate Studio.


Related research by the author

The architectural philosophy behind AutoMend, separating detection, decision, and audit into composable, independently-governed layers, echoes the author's research frameworks:

  • MAIC (Massive Artificial Intelligence Consciousness), the universe, the framework: a systemic intelligence framework to coordinate, supervise, and govern large-scale Massive Intelligence (IM) ecosystems, providing global context awareness, alignment, and orchestration across models, agents, and decision layers.
  • HIM (Hybrid Entity Intelligence Model), the spirit, the model: a hybrid intelligence layer that integrates Massive Intelligence (IM) systems with human-defined logic, rules, and strategic intent, interpreting objectives and structuring decision-making before and after execution.
  • NHE (Noumenal Higher-order Entity), the reincarnated body, the agent: a non-human entity with a defined functional identity and operational agency within an intelligence ecosystem, operating through coordinated layers while maintaining a non-anthropomorphic identity.

These frameworks are published independently of AutoMend and are separate works.


Sponsors

Join the journey as the portfolio continues to ship Massive Intelligence (IM) infrastructure. Your support is the cornerstone of this work.


Privacy

AutoMend runs entirely inside your own process and infrastructure. It makes no outbound calls, collects no telemetry, and ships no analytics. See PRIVACY.md for the full data-handling notice, including how the audit trail records only what you hand it.


License

Licensed under the Apache License 2.0. See LICENSE for the full text and NOTICE for attribution and third-party component licenses. You may use, modify, and distribute the code under the terms of that license, including its patent grant and attribution requirements.
