The open-source cure for the AI era's biggest execution gaps.
Most agent stacks ship prompts, tools, and hopes. We build the missing layer underneath — a model-agnostic substrate of behavior rules, algorithmic primitives, and purpose-built plugins that turn AI agents from impressive demos into durable production systems.
Every team building agents rediscovers the same execution failures:
- "Why did the agent push to main even though we said not to?" — instruction attenuation in long contexts.
- "Why does it keep refactoring code I didn't ask it to touch?" — task drift, no surgical-changes rule.
- "Why is the trust score swinging so wildly?" — Beta-Bernoulli with no prior; one observation flips the verdict.
- "Why is the same bug recurring across sessions?" — no precedent log; the agent forgets what failed last week.
- "Why did the test suite pass but the migration break in prod?" — self-certification, no independent verification.
- "Why are two of our agents working at cross-purposes?" — inter-agent misalignment, no shared conduct surface.
The fixes are well-known to people who've shipped agents at scale. They're scattered across blog posts, internal docs, and folklore. Enchanter Labs consolidates them into a coherent open-source ecosystem.
| Repo | What it is |
|---|---|
| vis | Conduct-as-code. 19 behavior modules, 12 algorithmic engines, 21-code failure taxonomy, runbooks, and adoption recipes. The behavioral contract every plugin compiles against. |
| beholder | The foundational SDK. TypeScript MCP client with a hybrid orchestrator, 7-phase request lifecycle, and an event bus that lets plugins observe, modify, or block tool calls before they leave your process. |
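To show the shape of that event bus, here is a hedged sketch of a guard plugin. The names (`ToolCallEvent`, `guardToolCall`, the verdict shape) are assumptions for illustration, not beholder's published interface:

```typescript
// Hypothetical plugin hook -- types and names are assumptions, not
// beholder's published API. The idea: observe every outbound tool call
// and return a verdict before it leaves the process.
interface ToolCallEvent {
  tool: string;
  args: Record<string, unknown>;
}

type Verdict = { action: "allow" } | { action: "block"; reason: string };

// A guard that blocks pushes to main before execution.
function guardToolCall(event: ToolCallEvent): Verdict {
  const cmd = String(event.args["command"] ?? "");
  if (event.tool === "terminal" && /git\s+push\b.*\bmain\b/.test(cmd)) {
    return { action: "block", reason: "pushes to main require human review" };
  }
  return { action: "allow" };
}
```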
| Repo | Gap it closes |
|---|---|
| crow | Bayesian change-trust scoring. Flags exactly which AI edits need human review. |
| hydra | Real-time threat interception. Blocks CVE-mapped attacks, poisoned configs, and destructive terminal commands before execution. |
| lich | Algorithm-backed code review. Sandbox-tests AI-generated code to verify it actually works before merge. |
| sylph | Zero-touch Git orchestration. Parallel branches, segmented tasks, Conventional Commits, clean PRs. |
| djinn | Cure AI memory loss. Pins session intent, prevents context drift, keeps long-running agents focused on the original goal. |
| emu | Real-time context optimization. Infinite-loop detection and smart prompt compression to stop burning tokens. |
| wixie | Self-healing prompt engineering. Test, converge, harden, and translate prompts across 225+ AI models. |
| pech | FinOps for autonomous agents. Per-tier token burn, exponential-smoothing forecasts (sketched after this table), graceful degradation triggers. |
| gorgon | Static codebase intelligence. Maps structural hotspots and dependency cycles for safe navigation of enterprise architectures. |
| naga | Source-as-spec enforcement. Deep AST fingerprinting ensures generated code mirrors your internal coding standards. |
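As a taste of the math behind these one-liners, here is the exponential-smoothing primitive that pech's row names, as a minimal sketch (the smoothing factor and the flat forecast are assumptions, not pech's implementation):

```typescript
// Simple exponential smoothing over per-session token burn --
// illustrative sketch; alpha and the flat forecast are assumptions.
function forecastNextBurn(tokenBurn: number[], alpha = 0.3): number {
  // level_t = alpha * observation_t + (1 - alpha) * level_{t-1}
  let level = tokenBurn[0];
  for (const observed of tokenBurn.slice(1)) {
    level = alpha * observed + (1 - alpha) * level;
  }
  return level; // flat one-step-ahead forecast
}

// Usage: recent sessions' token burn, most recent last.
console.log(forecastNextBurn([12_000, 15_000, 11_000, 18_000])); // ≈ 14,031
```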
- Want the behavioral contract? Start with vis — drop-in for Claude Code, OpenAI Agents SDK, Cursor, LangChain, Pydantic-AI, BAML, and bare system prompts.
- Building on MCP? beholder ships the runtime, orchestrator, and a terminal cockpit for live observability.
- Picking a single plugin? Each repo is independently installable. Find the gap closest to the pain you're feeling and start there.
- Honest numbers over rosy summaries. Verdicts carry sample sizes and calibration qualifiers. We do not inflate.
- Verification independent from generation. The system that produces a change does not get to certify it.
- Algorithms over vibes. Beta-Bernoulli, Wald SPRT, Zhang-Shasha, Aho-Corasick — every plugin name is a creature; every plugin core is math (a minimal SPRT sketch follows this list).
- Compounding learning. Failures get a code, codes get a runbook, runbooks update the substrate. Future sessions inherit the lesson.
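And what "algorithms over vibes" looks like in code: a minimal Wald SPRT sketch over pass/fail observations (illustrative only; the rates and thresholds are assumptions, not lifted from any plugin). The test accumulates a log-likelihood ratio and stops the moment the evidence crosses either boundary, which is also why verdicts can honestly carry sample sizes.

```typescript
// Wald's Sequential Probability Ratio Test over Bernoulli outcomes.
// H0: success rate p0 (untrustworthy); H1: success rate p1 (trustworthy).
// Illustrative sketch -- parameters are assumptions, not any plugin's defaults.
function sprt(
  observations: boolean[],
  p0 = 0.5,
  p1 = 0.9,
  alpha = 0.05, // tolerated false-accept rate
  beta = 0.05,  // tolerated false-reject rate
): "accept-H1" | "accept-H0" | "continue" {
  const upper = Math.log((1 - beta) / alpha); // accept H1 at or above this
  const lower = Math.log(beta / (1 - alpha)); // accept H0 at or below this
  let llr = 0;
  for (const success of observations) {
    llr += success ? Math.log(p1 / p0) : Math.log((1 - p1) / (1 - p0));
    if (llr >= upper) return "accept-H1";
    if (llr <= lower) return "accept-H0";
  }
  return "continue"; // not enough evidence yet -- keep sampling
}

// Six consecutive passes cross the acceptance boundary at these error rates.
console.log(sprt(Array(6).fill(true))); // "accept-H1"
```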
Apache 2.0. Built in public.