diff --git a/.gitignore b/.gitignore index 80c7ce7..90192a8 100644 --- a/.gitignore +++ b/.gitignore @@ -177,3 +177,6 @@ dist #workflow review /todos +# Local reference repos (symlinked by contributors) +/references + diff --git a/AGENTS.md b/AGENTS.md index d656162..6a12b20 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,658 +1,297 @@ -# AGENTS.md - Using bashkit +# BashKit — Contributor Guide -bashkit provides agentic coding tools for the Vercel AI SDK. This guide helps AI agents use bashkit when building applications. +> Agentic coding tools for the Vercel AI SDK. -## Installation +**Tech Stack**: TypeScript · Bun · Vercel AI SDK · Zod +**Package**: `bashkit` ([npm](https://www.npmjs.com/package/bashkit) · [GitHub](https://github.com/jbreite/bashkit)) -```bash -npm install bashkit ai @ai-sdk/anthropic -# or -pnpm add bashkit ai @ai-sdk/anthropic -# or -yarn add bashkit ai @ai-sdk/anthropic -# or -bun add bashkit ai @ai-sdk/anthropic -``` - -## Quick Setup - -### LocalSandbox (Development) - -Runs commands directly on the local machine. Use for development/testing only. - -```typescript -import { createAgentTools, createLocalSandbox } from "bashkit"; - -const sandbox = createLocalSandbox({ cwd: "/tmp/workspace" }); -const { tools } = createAgentTools(sandbox); -``` - -### VercelSandbox (Production) - -Runs in isolated Firecracker microVMs on Vercel's infrastructure. - -```typescript -import { createAgentTools, createVercelSandbox } from "bashkit"; - -// Async - automatically installs ripgrep for Grep tool -const sandbox = await createVercelSandbox({ - runtime: "node22", - resources: { vcpus: 2 }, - // ensureTools: true (default) - auto-setup ripgrep - // ensureTools: false - skip for faster startup if you don't need Grep -}); -const { tools } = createAgentTools(sandbox); - -// Don't forget to cleanup -await sandbox.destroy(); -``` - -### E2BSandbox (Production) - -Runs in E2B's cloud sandboxes. Requires `@e2b/code-interpreter` peer dependency. 
- -```typescript -import { createAgentTools, createE2BSandbox } from "bashkit"; - -// Async - automatically installs ripgrep for Grep tool -const sandbox = await createE2BSandbox({ - apiKey: process.env.E2B_API_KEY, - // ensureTools: true (default) - auto-setup ripgrep - // ensureTools: false - skip for faster startup if you don't need Grep -}); -const { tools } = createAgentTools(sandbox); - -await sandbox.destroy(); -``` - -### Sandbox Reconnection (Cloud Sandboxes) - -Cloud sandboxes (E2B, Vercel) support reconnection via the `id` property and `sandboxId` config: - -```typescript -// Create a new sandbox -const sandbox = await createE2BSandbox({ apiKey: process.env.E2B_API_KEY }); - -// Sandbox ID is available immediately after creation -const sandboxId = sandbox.id; // "sbx_abc123..." - -// Store sandboxId in your database (e.g., chat metadata) -await db.chat.update({ where: { id: chatId }, data: { sandboxId } }); - -// Later: reconnect to the same sandbox (fast - ripgrep already installed) -const savedId = chat.sandboxId; -const reconnected = await createE2BSandbox({ - apiKey: process.env.E2B_API_KEY, - sandboxId: savedId, // Reconnects instead of creating new -}); -``` +This file is for **agents and humans working ON bashkit**. For consumer-facing API usage (how to *use* bashkit in an app), see `README.md`. For folder-specific internals, see the `AGENTS.md` inside each `src/*` directory. -This is useful for: -- Reusing sandboxes across multiple requests in the same conversation -- Persisting sandbox state between server restarts -- Reducing sandbox creation overhead +> **Before editing anything inside `src//`, read `src//AGENTS.md` first.** Every folder has one. They document internal file layout, key exports, data flows, and per-task modification steps. This root file intentionally does not duplicate them — if you only read this file, you are missing half the picture. 
-## Internal Architecture - -For developers working on bashkit internals, each source folder has its own `AGENTS.md`: - -- `src/sandbox/AGENTS.md` -- Execution environment abstractions -- `src/tools/AGENTS.md` -- Tool implementations -- `src/cache/AGENTS.md` -- Tool result caching -- `src/middleware/AGENTS.md` -- AI SDK middleware -- `src/utils/AGENTS.md` -- Utility functions -- `src/skills/AGENTS.md` -- Agent Skills support -- `src/setup/AGENTS.md` -- Environment setup -- `src/cli/AGENTS.md` -- CLI initialization - -See also `CLAUDE.md` for development workflow and conventions. - -## Available Tools - -### Default Tools (always included) - -| Tool | Purpose | Key Inputs | -|------|---------|------------| -| `Bash` | Execute shell commands | `command`, `timeout`, `description` | -| `Read` | Read files or list directories | `file_path`, `offset`, `limit` | -| `Write` | Create/overwrite files | `file_path`, `content` | -| `Edit` | Replace strings in files | `file_path`, `old_string`, `new_string`, `replace_all` | -| `Glob` | Find files by pattern | `pattern`, `path` | -| `Grep` | Search file contents | `pattern`, `path`, `output_mode`, `-i`, `-C` | - -> **Note on nullable types:** Optional parameters use `T | null` (not `T | undefined`) for OpenAI structured outputs compatibility. AI models should send explicit `null` for parameters they don't want to set. This works with both OpenAI and Anthropic models. 
- -### Optional Tools (via config) - -| Tool | Purpose | Config Key | -|------|---------|------------| -| `AskUser` | Ask user clarifying questions | `askUser: true` | -| `EnterPlanMode` | Enter planning/exploration mode | `planMode: true` | -| `ExitPlanMode` | Exit planning mode with a plan | `planMode: true` | -| `Skill` | Execute skills | `skill: { skills }` | -| `WebSearch` | Search the web | `webSearch: { apiKey }` | -| `WebFetch` | Fetch URL and process with AI | `webFetch: { apiKey, model }` | +--- -### Workflow Tools (created separately) +## Core Principles -| Tool | Purpose | Factory | -|------|---------|---------| -| `Task` | Spawn sub-agents | `createTaskTool({ model, tools, subagentTypes? })` | -| `TodoWrite` | Track task progress | `createTodoWriteTool(state, config?, onUpdate?)` | +These apply to every PR, no exceptions: -### Web Tools (require `parallel-web` peer dependency) +1. **Fully typed.** No `any`. Use `unknown` at untrusted boundaries and narrow with guards. Public APIs must have explicit return types — don't rely on inference for exports. Tool input/output shapes live in Zod schemas + exported TypeScript interfaces that stay in sync. +2. **Testable and tested.** Every public export has a test. Tests mirror `src/` layout in `tests/`. Bug fixes include a regression test. If a change is hard to test, refactor until it isn't. +3. **Typecheck and lint before pushing.** `bun run typecheck && bun run check && bun run test` must be green locally. CI will reject otherwise. +4. **Return errors, don't throw.** Tools return `{ error: string }` objects so the model can see the failure. Only sandbox-layer code throws, and tools catch it. +5. **Config-driven, not flag-driven.** Optional features are enabled by the *presence* of a config object (e.g. `webSearch: { apiKey }`), not by boolean flags. Defaults live in factories via `config?.field ?? default`. +6. 
**No breaking changes without a major bump.** See the Breaking Change Surface section below before touching the `Sandbox` interface, tool schemas, tool names, `ContextLayer`, or `createAgentTools` return shape. +7. **Docs live next to code.** When you change files in a folder, update that folder's `AGENTS.md` in the same PR. -| Tool | Purpose | Factory | -|------|---------|---------| -| `WebSearch` | Search the web | `createWebSearchTool({ apiKey })` | -| `WebFetch` | Fetch URL and process with AI | `createWebFetchTool({ apiKey, model })` | +--- -## Using with AI SDK generateText +## References -```typescript -import { generateText, wrapLanguageModel, stepCountIs } from "ai"; -import { anthropic } from "@ai-sdk/anthropic"; -import { - createAgentTools, - createLocalSandbox, - anthropicPromptCacheMiddleware, -} from "bashkit"; +If a `references/` directory exists at the project root, search it for implementation patterns when building new features. It is gitignored — contributors symlink or clone repos locally. -const sandbox = createLocalSandbox({ cwd: "/tmp/workspace" }); -const { tools } = createAgentTools(sandbox); +- `references/codex` — OpenAI Codex CLI. Tool designs, agent loop, sandboxing patterns. +- `references/pi-mono` — pi-mono monorepo. See `packages/coding-agent` for agent loop patterns. 
-// Wrap model with prompt caching (recommended) -const model = wrapLanguageModel({ - model: anthropic("claude-sonnet-4-20250514"), - middleware: anthropicPromptCacheMiddleware, -}); +--- -const result = await generateText({ - model, - tools, - system: "You are a helpful coding assistant.", - prompt: "Create a hello world TypeScript file and run it", - stopWhen: stepCountIs(10), // Allow up to 10 tool-call rounds - onStepFinish: ({ finishReason, toolCalls, toolResults, usage }) => { - // Log progress - console.log(`Step finished: ${finishReason}`); - for (const call of toolCalls || []) { - console.log(` Tool: ${call.toolName}`); - } - }, -}); +## Code Organization -await sandbox.destroy(); ``` - -## Sub-agents with Task Tool - -The Task tool spawns new agents for complex subtasks: - -```typescript -import { createTaskTool } from "bashkit"; - -const taskTool = createTaskTool({ - model: anthropic("claude-sonnet-4-20250514"), - tools: sandboxTools, - subagentTypes: { - research: { - model: anthropic("claude-haiku-3"), // Cheaper model for research - systemPrompt: "You are a research specialist. Find information only.", - tools: ["Read", "Grep", "Glob"], // Limited tools - }, - coding: { - systemPrompt: "You are a coding expert. 
Write clean code.", - tools: ["Read", "Write", "Edit", "Bash"], - }, - }, -}); - -// Add to tools -const allTools = { ...sandboxTools, Task: taskTool }; +src/ +├── sandbox/ # Execution environments (Local, Vercel, E2B) — src/sandbox/AGENTS.md +├── tools/ # Tool implementations — src/tools/AGENTS.md +├── context/ # Prompt assembly + tool execution layers — src/context/AGENTS.md +├── cache/ # Tool result caching (LRU, Redis) — src/cache/AGENTS.md +├── middleware/ # AI SDK language model middleware — src/middleware/AGENTS.md +├── utils/ # Budget, compaction, context status, helpers — src/utils/AGENTS.md +├── skills/ # Agent Skills standard — src/skills/AGENTS.md +├── setup/ # Agent environment setup (sandbox bootstrapping) — src/setup/AGENTS.md +├── cli/ # CLI initialization — src/cli/AGENTS.md +├── types.ts # AgentConfig, ToolConfig, DEFAULT_CONFIG +└── index.ts # Barrel re-exports (public API surface) ``` -The parent agent calls Task like any other tool: -```typescript -// Agent decides to delegate: -{ tool: "Task", args: { - description: "Research API patterns", - prompt: "Find best practices for REST APIs", - subagent_type: "research" -}} -``` +**Each folder has its own `AGENTS.md`** with file listings, exports, internal architecture, and per-task modification guides. -### Streaming Sub-agent Activity to UI +### AGENTS.md Conventions (enforced in CI) -Pass a `streamWriter` to stream real-time sub-agent activity: +- Every folder under `src/` **must** have an `AGENTS.md`. When you add a new folder, add one. +- Every `AGENTS.md` (except the root) **must** have a co-located `CLAUDE.md` symlink pointing to it. +- Automation: `bun run link-agents` creates missing symlinks; `bun run check:agents` fails CI if any are missing. +- When you **add, remove, or significantly change** files in a folder, update that folder's `AGENTS.md` in the same PR. Stale folder docs are worse than no docs. 
-```typescript -import { createUIMessageStream } from "ai"; +--- -const stream = createUIMessageStream({ - execute: async ({ writer }) => { - const taskTool = createTaskTool({ - model, - tools: sandboxTools, - streamWriter: writer, // Enable real-time streaming - subagentTypes: { ... }, - }); +## Development Workflow - const result = streamText({ - model, - tools: { Task: taskTool }, - ... - }); +### Build & Typecheck - writer.merge(result.toUIMessageStream()); - }, -}); +```bash +bun install +bun run typecheck # ALWAYS run before bun run build +bun run build # Bun bundles to dist/index.js + tsc emits .d.ts ``` -When `streamWriter` is provided: -- Uses `streamText` internally (instead of `generateText`) -- Emits `data-subagent` events: `start`, `tool-call`, `done`, `complete` -- Events appear in `message.parts` as `{ type: "data-subagent", data: SubagentEventData }` +**Script names are exact — no hyphens.** It's `typecheck`, not `type-check`. Running the wrong name will just error with "Script not found". See `package.json` for the full list. -**Note:** TaskOutput does NOT include messages (to avoid context bloat). The UI accesses the full conversation via the streamed `complete` event. +**Critical**: `bun run build` does **not** fail on type errors during bundling. Run `bun run typecheck` first or type regressions will ship silently. 
-## Prompt Caching +### Full Pre-Push Check -Enable Anthropic prompt caching to reduce costs on repeated prefixes: +Before pushing, run all four gates locally — CI will reject otherwise: -```typescript -import { wrapLanguageModel } from "ai"; -import { anthropicPromptCacheMiddleware } from "bashkit"; - -const model = wrapLanguageModel({ - model: anthropic("claude-sonnet-4-20250514"), - middleware: anthropicPromptCacheMiddleware, -}); - -// Check cache stats in result -console.log({ - cacheCreation: result.providerMetadata?.anthropic?.cacheCreationInputTokens, - cacheRead: result.providerMetadata?.anthropic?.cacheReadInputTokens, -}); +```bash +bun run typecheck && bun run check && bun run test && bun run check:agents ``` -## Web Tools +Exact script names (from `package.json`): `typecheck`, `build`, `test`, `test:watch`, `test:coverage`, `format`, `format:check`, `lint`, `lint:check`, `check`, `check:ci`, `link-agents`, `check:agents`. -WebSearch and WebFetch tools provide web access capabilities using the [Parallel API](https://docs.parallel.ai). +### Tests -### Setup +Use Vitest via `bun run test` — **not** `bun test` (which runs Bun's built-in runner and will miss our suite). ```bash -# Install the parallel-web peer dependency -bun add parallel-web - -# Set your API key -export PARALLEL_API_KEY="your_api_key" +bun run test # all tests +bun run test tests/utils/budget.test.ts # single file +bun run test:watch # watch mode +bun run test:coverage # with coverage ``` -### WebSearch +Tests live in `tests//` mirroring `src//`. Examples in `/examples/` serve as integration tests and require sandbox/API-key env vars. -Search the web and get formatted results: +**Everything non-trivial ships with tests.** New tools, new context layers, new utilities, new sandbox methods — all get unit tests before merging. Bug fixes include a regression test that would have caught the bug. 
If you can't easily test something, that's a signal the abstraction is wrong, not a reason to skip the test. -```typescript -import { createWebSearchTool } from "bashkit"; +### Lint & Format -const webSearch = createWebSearchTool({ - apiKey: process.env.PARALLEL_API_KEY!, -}); - -// Add to your tools -const tools = { - ...sandboxTools, - WebSearch: webSearch, -}; -``` +Biome handles both: -**Input:** -- `query` - The search query -- `allowed_domains?` - Only include results from these domains -- `blocked_domains?` - Exclude results from these domains - -**Output:** -```typescript -{ - results: Array<{ title: string; url: string; snippet: string; metadata?: Record }>; - total_results: number; - query: string; -} +```bash +bun run check # lint + format, auto-fix +bun run check:ci # lint + format, no writes (CI gate) +bun run format # format only +bun run lint # lint only ``` -### WebFetch +Run `bun run check` before pushing. CI runs `check:ci`, `typecheck`, `test`, and `check:agents` — all four must pass. -Fetch a URL and process the content with an AI model: +### Commits & PRs -```typescript -import { createWebFetchTool } from "bashkit"; -import { anthropic } from "@ai-sdk/anthropic"; +- Commits are small, imperative, sentence-case: `Add budget tracking`, `Refactor AskUser tool to deferred client-rendered model`, `Fix lint and typecheck CI failures`. +- One logical change per commit. Keep refactors separate from feature work. +- PR titles follow the same style as commits. PR descriptions should explain *why*, link relevant issues, and call out any public API changes. +- CI gates: `typecheck`, `check:ci` (Biome), `test`, `check:agents`. All four must pass before merge. 
-const webFetch = createWebFetchTool({ - apiKey: process.env.PARALLEL_API_KEY!, - model: anthropic("claude-haiku-3"), // Use a fast/cheap model for processing -}); +### Local Iteration Loop -// Add to your tools -const tools = { - ...sandboxTools, - WebFetch: webFetch, -}; -``` +Use `LocalSandbox` (Bun APIs, no network) for fast iteration. Swap to `VercelSandbox` / `E2BSandbox` when you need to verify production behavior. -**Input:** -- `url` - The URL to fetch -- `prompt` - The prompt to run on the fetched content - -**Output:** -```typescript -{ - response: string; // AI model's response to the prompt - url: string; - final_url?: string; // Final URL after redirects - status_code?: number; -} +```bash +bun examples/test-tools.ts # direct tool calls, no AI +ANTHROPIC_API_KEY=xxx bun examples/basic.ts # full agentic loop ``` -## Agent Skills - -bashkit supports the [Agent Skills](https://agentskills.io) standard for progressive skill loading. - -> **Note:** Skill discovery is for **LocalSandbox** use cases where the agent has filesystem access. For cloud sandboxes, bundle skills with your app directly. 
- -### Discovering Skills (LocalSandbox) - -When using LocalSandbox, discover project and user-global skills: +--- -```typescript -import { discoverSkills, skillsToXml } from "bashkit"; +## Code Conventions -// Discovers from .skills/ (project) and ~/.bashkit/skills/ (user-global) -const skills = await discoverSkills(); -``` +### Naming -### Using Skills with Agents +| Element | Convention | Examples | +|---|---|---| +| Tool names | PascalCase | `Bash`, `Read`, `WebSearch` | +| Factories | `createX` | `createBashTool`, `createLocalSandbox` | +| Output types | `XOutput` | `BashOutput`, `ReadOutput` | +| Error types | `XError` | `BashError`, `ReadError` | +| Config types | `XConfig` | `ToolConfig`, `AgentConfig` | +| Files | kebab-case | `bash.ts`, `anthropic-cache.ts` | -```typescript -import { discoverSkills, skillsToXml, createAgentTools, createLocalSandbox } from "bashkit"; -import { generateText, stepCountIs } from "ai"; -import { anthropic } from "@ai-sdk/anthropic"; +### Type Organization -const skills = await discoverSkills(); -const sandbox = createLocalSandbox({ cwd: "/tmp/workspace" }); -const { tools } = createAgentTools(sandbox); +- **Input schemas**: colocated with tool implementation (`src/tools/bash.ts` defines `bashInputSchema`). +- **Output/Error types**: exported from the tool file; tools return `Output | Error` unions. +- **Config types**: centralized in `src/types.ts`. +- **Error handling**: tools **return** `{ error: string }` objects — they do not throw. Sandbox methods may throw; tools catch them. -const result = await generateText({ - model: anthropic("claude-sonnet-4-20250514"), - tools, - system: `You are a coding assistant. +### `.nullable()` over `.optional()` for tool inputs -${skillsToXml(skills)} +All optional tool parameters use `z.nullable()`, **not** `z.optional()`. OpenAI structured outputs require every property in the `required` array; `.optional()` removes them and breaks OpenAI. 
`.nullable()` keeps them required but allows `null`, and works on both Anthropic and OpenAI. -When a task matches a skill, use the Read tool to load its full instructions from the location path.`, - prompt: "Extract text from invoice.pdf", - stopWhen: stepCountIs(10), +```ts +const schema = z.object({ + timeout: z.number().nullable(), + replace_all: z.boolean().nullable(), }); -``` - -### How It Works - -1. `discoverSkills()` loads only metadata (name, description, path) - ~50-100 tokens per skill -2. `skillsToXml()` generates XML listing available skills -3. Agent decides when to activate a skill by reading its SKILL.md with the Read tool -4. Full instructions enter context only when the skill is actually used -### Creating Skills - -Create `.skills//SKILL.md`: - -```markdown ---- -name: pdf-processing -description: Extract text and tables from PDF files. ---- - -# PDF Processing - -Instructions for the agent... +// Destructuring defaults (= value) only fire on undefined, NOT null. +// Always use ?? for defaults with nullable fields: +const { timeout, replace_all: rawReplaceAll } = input; +const effectiveTimeout = timeout ?? 120000; +const replaceAll = rawReplaceAll ?? false; ``` -### Using Remote Skills - -Fetch complete skill folders from GitHub repositories (e.g., Anthropic's official skills): - -```typescript -import { fetchSkill, fetchSkills, setupAgentEnvironment } from "bashkit"; +### Configuration Pattern -// Fetch a single skill (gets all files: SKILL.md, scripts/, etc.) 
-const pdfSkill = await fetchSkill('anthropics/skills/pdf'); +Tool factories accept an optional `ToolConfig` and merge with defaults inline: -// Or batch fetch multiple -const remoteSkills = await fetchSkills([ - 'anthropics/skills/pdf', - 'anthropics/skills/web-research', -]); - -// Use with setupAgentEnvironment -const config = { - skills: { - ...remoteSkills, - 'my-custom': myContent, - }, -}; -const { skills } = await setupAgentEnvironment(sandbox, config); +```ts +export function createBashTool(sandbox: Sandbox, config?: ToolConfig) { + const timeout = config?.timeout ?? 120000; + // ... +} ``` -**Format:** `owner/repo/skillName` (fetches entire skill folder from GitHub) - -## Setting Up Agent Environments - -For cloud sandboxes, use `setupAgentEnvironment` to create workspace directories and seed skills: +Optional features (WebSearch, WebFetch, cache, budget, context layers) are enabled by **config presence** in `createAgentTools` — don't gate them on feature flags. -```typescript -import { setupAgentEnvironment, skillsToXml, createAgentTools, createVercelSandbox } from "bashkit"; +--- -const config = { - workspace: { - notes: 'files/notes/', - outputs: 'files/outputs/', - }, - skills: { - 'web-research': webResearchSkillContent, - }, -}; +## Core Abstractions -const sandbox = await createVercelSandbox({}); -const { skills } = await setupAgentEnvironment(sandbox, config); +### Sandbox Interface -// Use same config in prompt - stays in sync! -const systemPrompt = `Save notes to: ${config.workspace.notes} -${skillsToXml(skills)} -`; +All tools depend on `Sandbox` from `src/sandbox/interface.ts`, not concrete implementations. Adding a method is a breaking change for every implementer. 
-const { tools } = createAgentTools(sandbox); +```ts +interface Sandbox { + exec(command: string, options?: ExecOptions): Promise; + readFile(path: string): Promise; + writeFile(path: string, content: string): Promise; + readDir(path: string): Promise; + fileExists(path: string): Promise; + isDirectory(path: string): Promise; + destroy(): Promise; + readonly id?: string; // for cloud reconnection + rgPath?: string; // set by ensureSandboxTools +} ``` -## Common Patterns +`createVercelSandbox()` and `createE2BSandbox()` are **async** and auto-run `ensureSandboxTools` to install ripgrep so `Grep` works immediately. `createLocalSandbox()` is sync. -### Full Agent Setup +### Context Layer -```typescript -import { generateText, wrapLanguageModel, stepCountIs } from "ai"; -import { anthropic } from "@ai-sdk/anthropic"; -import { - createAgentTools, - createTaskTool, - createTodoWriteTool, - createLocalSandbox, - anthropicPromptCacheMiddleware, - type TodoState, -} from "bashkit"; +`src/context/` provides two separate concerns: -// 1. Create sandbox -const sandbox = createLocalSandbox({ cwd: "/tmp/workspace" }); +1. **Static system prompt assembly** (`buildSystemContext`) — discovers `AGENTS.md` / `CLAUDE.md` files, collects environment info (cwd, git branch, platform), builds tool guidance. Called **once at init**, must stay stable across turns for Anthropic prompt caching. +2. **Dynamic per-step layers** (`withContext`, `applyContextLayers`, `createExecutionPolicy`, `createOutputPolicy`) — intercept every tool call (`beforeExecute` gate, `afterExecute` transform). `createPrepareStep` composes compaction + context-status + plan-mode hints into an AI SDK `prepareStep` callback. -// 2. Create sandbox tools -const { tools: sandboxTools } = createAgentTools(sandbox); +Never mutate `system` from `prepareStep` — it will break prompt caching. Dynamic hints go in `messages` as user content. -// 3. 
Create model with caching -const model = wrapLanguageModel({ - model: anthropic("claude-sonnet-4-20250514"), - middleware: anthropicPromptCacheMiddleware, -}); +### Tool Composition -// 4. Create workflow tools -const todoState: TodoState = { todos: [] }; -const todoTool = createTodoWriteTool(todoState); -const taskTool = createTaskTool({ model, tools: sandboxTools }); - -// 5. Combine all tools -const tools = { - ...sandboxTools, - TodoWrite: todoTool, - Task: taskTool, -}; - -// 6. Run agent -const result = await generateText({ - model, - tools, - system: "You are a coding assistant. Use TodoWrite to plan tasks.", - prompt: "Build a REST API with Express", - stopWhen: stepCountIs(15), -}); - -// 7. Cleanup -await sandbox.destroy(); -``` - -### Tool Configuration - -Restrict tools with configuration: - -```typescript -const { tools } = createAgentTools(sandbox, { - tools: { - Bash: { - enabled: true, - blockedCommands: ["rm -rf", "sudo"], - maxOutputLength: 30000, - }, - Write: { - enabled: true, - allowedPaths: ["/tmp/workspace"], - maxFileSize: 1_000_000, - }, - }, -}); -``` - -## Tool Result Caching - -Cache tool execution results to avoid redundant operations: +`createAgentTools(sandbox, config)` is the single entry point that wires tools + cache + budget + context layers from a config object. Everything else is either internal or a lower-level primitive. 
-```typescript -import { createAgentTools, createLocalSandbox } from "bashkit"; - -const sandbox = createLocalSandbox({ cwd: "/tmp/workspace" }); +--- -// Enable caching with defaults (LRU, 5min TTL) -const { tools } = createAgentTools(sandbox, { cache: true }); +## Component Interactions -// Or customize caching behavior -const { tools } = createAgentTools(sandbox, { - cache: { - ttl: 10 * 60 * 1000, // 10 minutes - debug: true, // Log cache hits/misses - Read: true, // Enable for Read - Glob: true, // Enable for Glob - Grep: false, // Disable for Grep - }, -}); ``` - -**Default cached tools:** Read, Glob, Grep, WebFetch, WebSearch - -**Not cached by default:** Bash, Write, Edit (have side effects) - -### Cache Callbacks - -Track cache performance with callbacks: - -```typescript -const { tools } = createAgentTools(sandbox, { - cache: { - onHit: (toolName, key) => { - metrics.increment(`cache.hit.${toolName}`); - }, - onMiss: (toolName, key) => { - metrics.increment(`cache.miss.${toolName}`); - }, - }, -}); +User code → Vercel AI SDK → Tool (wrapped w/ context layers + cache) + ↓ + Sandbox interface + ↓ + ┌────────────────┼────────────────┐ + ↓ ↓ ↓ + LocalSandbox VercelSandbox E2BSandbox + ↓ ↓ ↓ + Bun APIs Firecracker VM E2B service ``` -### Cache Stats - -Cached tools have additional methods: - -```typescript -import type { CachedTool } from "bashkit"; - -const readTool = tools.Read as CachedTool; +--- -// Check cache performance (async for Redis compatibility) -console.log(await readTool.getStats()); -// { hits: 5, misses: 2, hitRate: 0.71, size: 2 } +## Dependencies -// Clear cache -await readTool.clearCache(); // Clear all -await readTool.clearCache("key"); // Clear specific entry -``` +**Required peer deps**: `ai` ^5.0.0, `zod` ^4.1.8. 
-### Redis Cache Store +**Optional peer deps** — users pick their execution environment: +- `@vercel/sandbox` ^1.0.0 — Vercel Firecracker isolation +- `@e2b/code-interpreter` ^1.0.0 — E2B hosted execution +- `parallel-web` ^1.0.0 — WebSearch / WebFetch backend -Use your existing Redis client with the helper: +All deps are marked **external** at build time so consumers don't get a duplicated `ai`/`zod` bundle. -```typescript -import { createRedisCacheStore, createAgentTools } from "bashkit"; +--- -const store = createRedisCacheStore(myRedisClient); -const { tools } = createAgentTools(sandbox, { cache: store }); -``` +## Breaking Change Surface -Works with `redis`, `ioredis`, or any client with `get`, `set`, `del`, `keys` methods. TTL is handled by the wrapper for consistent behavior across all cache backends. +Anything in this list requires a **major version bump**: -### Custom Cache Store +1. **`Sandbox` interface** (`src/sandbox/interface.ts`) — adding methods breaks every implementer. +2. **Tool input schemas** — AI models see these in prompts; removing or renaming fields breaks live integrations. +3. **Tool output/error shapes** — consumers pattern-match on them. +4. **Tool names** — they appear verbatim in prompts ("use the Bash tool"). +5. **`ContextLayer` signature** (`src/context/index.ts`) — changes ripple through every custom layer downstream. +6. **`SystemContext` shape** (`src/context/build-context.ts`) — consumers read individual sections. +7. **`createAgentTools` return shape** — `AgentToolsResult` is a public contract. 
-For other backends, implement the `CacheStore` interface: +Safe in minor/patch: +- Adding new optional config fields +- Adding new tools or sandbox implementations +- Internal refactors that preserve public API +- Bug fixes -```typescript -import type { CacheStore } from "bashkit"; +--- -const myStore: CacheStore = { - get(key) { /* return CacheEntry or undefined */ }, - set(key, entry) { /* store entry */ }, - delete(key) { /* remove entry */ }, - clear() { /* remove all entries */ }, - size() { /* optional: return count */ }, -}; +## Security Reminders -const { tools } = createAgentTools(sandbox, { cache: myStore }); -``` +The `Bash` tool executes arbitrary commands inside the sandbox — that's the whole point, but it means production deployments **must**: -### Standalone Caching +- Run inside a real sandbox (Vercel or E2B), not LocalSandbox. +- Set `blockedCommands` + `timeout` on `Bash`. +- Set `allowedPaths` on `Read` / `Write` / `Edit`. +- Set `maxFileSize` on `Write`. +- Never expose the raw agent loop to untrusted users without an additional auth layer. -Wrap individual tools with caching: +See `src/tools/AGENTS.md` for per-tool config details. 
-```typescript -import { cached, LRUCacheStore } from "bashkit"; +--- -const cachedTool = cached(myTool, "MyTool", { - ttl: 60000, // 1 minute - debug: true, // Log cache activity - store: new LRUCacheStore(500), // Max 500 entries -}); -``` +## Common Implementation Tasks + +| Task | Where to start | +|---|---| +| Add a new tool | `src/tools/AGENTS.md` → "Common Modifications" | +| Add a new sandbox | `src/sandbox/AGENTS.md` → "Common Modifications" | +| Add middleware | `src/middleware/AGENTS.md` → "Common Modifications" | +| Add a cache backend | `src/cache/AGENTS.md` → "Common Modifications" | +| Add a context layer or prompt section | `src/context/AGENTS.md` → "Common Modifications" | +| Add a skill source | `src/skills/AGENTS.md` → "Common Modifications" | +| Add a config field | Define in `src/types.ts`, consume in the relevant factory via `config?.yourField ?? default` | diff --git a/CLAUDE.md b/CLAUDE.md deleted file mode 100644 index a0c169a..0000000 --- a/CLAUDE.md +++ /dev/null @@ -1,649 +0,0 @@ -# BashKit - Claude Code Guide - -> Agentic coding tools for Vercel AI SDK - -**Tech Stack**: TypeScript • Bun • Vercel AI SDK • Zod -**Inspired by**: Claude Code tools -**Version**: 0.4.0 - ---- - -## Project Overview - -### What BashKit Solves - -BashKit provides a comprehensive toolkit for building AI coding agents using the Vercel AI SDK. 
It bridges the gap between AI models like Claude and actual code execution environments, enabling agents to: - -- Execute bash commands -- Read, write, and edit files -- Search codebases with glob/grep -- Fetch web content and perform searches -- Spawn sub-agents for complex tasks -- Manage state with todo lists - -### Key Features - -**10 Tools Available**: -- **Bash** - Execute shell commands with timeout control -- **Read** - Read files and list directories -- **Write** - Create or overwrite files -- **Edit** - Replace strings in existing files -- **Glob** - Find files by pattern matching -- **Grep** - Search file contents with regex -- **WebSearch** - Web search with domain filtering -- **WebFetch** - Fetch and analyze web URLs -- **Task** - Spawn sub-agents for complex work -- **TodoWrite** - Manage structured task lists - -### Architecture Philosophy - -1. **Bring Your Own Sandbox** - Start with LocalSandbox, swap to Vercel/E2B for production -2. **Type-Safe** - Full TypeScript with proper inference -3. **Configurable** - Security controls and limits at the tool level -4. **Composable** - Tools work together seamlessly -5. 
**Claude Code Compatible** - Tool signatures match Claude Code patterns - -### Use Cases - -- AI coding assistants and agents -- Automated development workflows -- Interactive code exploration tools -- Educational coding environments -- CI/CD automation with AI - ---- - -## Architecture & Patterns - -### Code Organization - -``` -src/ -├── sandbox/ # Execution environment abstractions (see src/sandbox/AGENTS.md) -├── tools/ # Tool implementations (see src/tools/AGENTS.md) -├── cache/ # Tool result caching (see src/cache/AGENTS.md) -├── middleware/ # Vercel AI SDK middleware (see src/middleware/AGENTS.md) -├── utils/ # Utility functions (see src/utils/AGENTS.md) -├── skills/ # Agent Skills standard (see src/skills/AGENTS.md) -├── setup/ # Agent environment setup (see src/setup/AGENTS.md) -├── cli/ # CLI initialization (see src/cli/AGENTS.md) -├── types.ts # Configuration types -└── index.ts # Main exports (barrel file) -``` - -Each folder has its own `AGENTS.md` with detailed file descriptions, key exports, architecture, and modification guides. - -### Key Design Patterns - -#### 1. Factory Pattern -All tools and sandboxes created via factory functions: -```typescript -const sandbox = createLocalSandbox({ workingDirectory: '/tmp' }); -const { tools } = await createAgentTools(sandbox, config); -``` - -#### 2. 
Sandbox Abstraction -Tools depend on the `Sandbox` interface, not specific implementations: -```typescript -interface Sandbox { - exec(command: string, options?: ExecOptions): Promise; - readFile(path: string): Promise; - writeFile(path: string, content: string): Promise; - readDir(path: string): Promise; - fileExists(path: string): Promise; - isDirectory(path: string): Promise; - destroy(): Promise; - readonly id?: string; // Sandbox ID for reconnection (cloud only) - rgPath?: string; // Path to ripgrep (set by ensureSandboxTools) -} -``` - -**Note**: `createVercelSandbox()` and `createE2BSandbox()` are async and auto-setup ripgrep: -```typescript -const sandbox = await createE2BSandbox({ apiKey: '...' }); -// rgPath is already set, Grep tool works immediately -``` - -#### 3. Tool Composition -Tools assembled into a ToolSet for Vercel AI SDK: -```typescript -const { tools } = await createAgentTools(sandbox, { - tools: { Bash: { timeout: 30000 } }, - webSearch: { apiKey: process.env.PARALLEL_API_KEY } -}); -// Returns: { Bash, Read, Write, Edit, Glob, Grep, WebSearch } -``` - -#### 4. Middleware System -Language models wrapped for cross-cutting concerns: -```typescript -const model = wrapLanguageModel({ - model: anthropic('claude-sonnet-4-5'), - middleware: anthropicPromptCacheMiddleware -}); -``` - -#### 5. Configuration as Code -Zod schemas define and validate all tool inputs: -```typescript -const bashInputSchema = z.object({ - command: z.string(), - description: z.string().nullable(), - timeout: z.number().nullable() -}); -``` - -#### 6. Nullable Types for AI Provider Compatibility - -All optional tool parameters use `.nullable()` instead of `.optional()` for OpenAI structured outputs compatibility. 
- -**Why `.nullable()` instead of `.optional()`:** -- OpenAI structured outputs require all properties in the `required` array -- `.optional()` removes properties from `required` (breaks OpenAI) -- `.nullable()` keeps properties in `required` but allows `null` values -- Works with both OpenAI and Anthropic models - -**Pattern for handling nullable values:** -```typescript -// Zod schema uses .nullable() -const schema = z.object({ - timeout: z.number().nullable(), - replace_all: z.boolean().nullable(), -}); - -// In execute function, use ?? for defaults -// NOTE: Destructuring defaults (= value) only work with undefined, NOT null -const { timeout, replace_all: rawReplaceAll } = input; -const effectiveTimeout = timeout ?? 120000; -const replaceAll = rawReplaceAll ?? false; -``` - -**Type conventions:** -- Zod schemas: `.nullable()` → produces `T | null` -- Exported interfaces: `T | null` (e.g., `description: string | null`) -- Internal functions: `T | null` for parameters that accept nullable values - -#### 7. Tool Result Caching -Optional caching for tool execution results: -```typescript -// Enable with defaults (LRU, 5min TTL) -const { tools } = await createAgentTools(sandbox, { cache: true }); - -// Per-tool control -const { tools } = await createAgentTools(sandbox, { - cache: { Read: true, Glob: true, Grep: false } -}); - -// Standalone wrapper -import { cached } from 'bashkit'; -const cachedTool = cached(myTool, 'MyTool', { ttl: 60000 }); -``` - -**Default cached tools**: Read, Glob, Grep, WebFetch, WebSearch -**Not cached by default**: Bash, Write, Edit (side effects) - -#### 8. 
Model Registry -Fetch model info (pricing + context lengths) from a provider: -```typescript -// Standalone model info (no budget needed) -const { tools, openRouterModels } = await createAgentTools(sandbox, { - modelRegistry: { provider: "openRouter" }, -}); -// openRouterModels: Map with pricing + contextLength - -// With budget tracking (recommended) -const { tools, budget, openRouterModels } = await createAgentTools(sandbox, { - modelRegistry: { provider: "openRouter" }, - budget: { maxUsd: 5.00 }, -}); -``` - -The `modelRegistry` config fetches model data once and shares it with budget tracking, compaction, and any other consumer. When both `modelRegistry` and `budget` are set, only one fetch occurs. - -**Legacy support**: `budget.pricingProvider` still works but is deprecated in favor of `modelRegistry`: -```typescript -// Still works (deprecated) -budget: { maxUsd: 5.00, pricingProvider: "openRouter" } -// Preferred -modelRegistry: { provider: "openRouter" }, budget: { maxUsd: 5.00 } -``` - -#### 9. Budget Tracking -Cumulative cost tracking across agentic loop steps: -```typescript -// Via createAgentTools (recommended) -const { tools, budget } = await createAgentTools(sandbox, { - modelRegistry: { provider: "openRouter" }, - budget: { maxUsd: 5.00 }, -}); - -// Standalone usage -const models = await fetchOpenRouterModels(); -const pricing = new Map([...models].map(([k, v]) => [k, v.pricing])); -const budget = createBudgetTracker(5.00, { openRouterPricing: pricing }); -``` - -**Pricing sources** (checked in order): -1. User-provided `modelPricing` overrides (highest priority) -2. OpenRouter's free public API (auto-fetched via `modelRegistry`, cached 24h) - -**Model ID matching** (PostHog's 3-tier strategy): -1. Exact match (case-insensitive) -2. Longest contained match (model variant contains cost variant) -3. 
Reverse containment (cost variant contains model variant) - -**Task tool integration**: When `budget` is set in `AgentConfig`, the budget tracker auto-wires into all Task tool sub-agents via `stopWhen` and `onStepFinish`. - -### Component Interactions - -``` -User → Vercel AI SDK → Tool (Bash/Read/Write/etc.) - ↓ - Sandbox Interface - ↓ - ┌─────────────┼─────────────┐ - ↓ ↓ ↓ - LocalSandbox VercelSandbox E2BSandbox - ↓ ↓ ↓ - Bun API Firecracker VM E2B Service -``` - ---- - -## File Map (Quick Reference) - -Each `src/` subfolder has an `AGENTS.md` with detailed file listings and guides. Key entry points: - -- **Configuration**: `/src/types.ts` (ToolConfig, AgentConfig, BudgetConfig, ModelRegistryConfig, DEFAULT_CONFIG) -- **Main exports**: `/src/index.ts` (barrel file) -- **Package config**: `/package.json` -- **Examples**: `/examples/basic.ts`, `/examples/test-tools.ts`, `/examples/test-web-tools.ts` - ---- - -## Development Workflow - -### Build Commands - -```bash -# IMPORTANT: Always run typecheck BEFORE build when making changes -bun run typecheck - -# Build everything (JS bundle + TypeScript declarations) -bun run build - -# Install dependencies -bun install -``` - -**Workflow**: Always run `bun run typecheck` first to catch type errors before building. The build command does not fail on type errors during the JS bundling step. - -**Build Process**: -1. Bun bundles TypeScript to ESM JavaScript (`dist/index.js`) -2. TypeScript compiler generates `.d.ts` declarations -3. All dependencies marked as external (no bundling of `ai`, `zod`, etc.) 
- -### Testing Changes - -**Unit tests** use Vitest (run via `bun run test`, NOT `bun test`): - -```bash -# Run all tests -bun run test - -# Run specific test file(s) -bun run test tests/utils/budget-tracking.test.ts - -# Watch mode -bun run test:watch -``` - -**Examples** serve as integration tests: - -```bash -# Test tools directly (no AI, no API key needed) -bun examples/test-tools.ts - -# Test web tools (requires PARALLEL_API_KEY) -PARALLEL_API_KEY=xxx bun examples/test-web-tools.ts - -# Full agentic loop (requires ANTHROPIC_API_KEY) -ANTHROPIC_API_KEY=xxx bun examples/basic.ts -``` - -### Local Development - -```typescript -// Use LocalSandbox for fast iteration -import { createLocalSandbox, createAgentTools } from './src'; - -const sandbox = createLocalSandbox({ workingDirectory: '/tmp' }); -const { tools } = await createAgentTools(sandbox); - -// Test your changes... -await tools.Bash.execute({ - command: 'echo "Hello"', - description: 'Test command' -}, { toolCallId: 'test', messages: [] }); -``` - -**Pro Tips**: -- LocalSandbox uses Bun APIs (fast, no network overhead) -- Use VercelSandbox or E2BSandbox for testing production behavior -- Check `examples/test-tools.ts` for tool API patterns - ---- - -## Common Implementation Tasks - -Each task has a detailed step-by-step guide in the relevant folder's `AGENTS.md`: - -| Task | Guide Location | -|------|---------------| -| Adding a new tool | `src/tools/AGENTS.md` → "Common Modifications" | -| Implementing a new sandbox | `src/sandbox/AGENTS.md` → "Common Modifications" | -| Adding middleware | `src/middleware/AGENTS.md` → "Common Modifications" | -| Adding a cache backend | `src/cache/AGENTS.md` → "Common Modifications" | -| Adding configuration options | Add types to `/src/types.ts`, use in tool factory via `config?.yourOption ?? 
default` | -| Adding a skill source | `src/skills/AGENTS.md` → "Common Modifications" | -| Setting up agent environments | `src/setup/AGENTS.md` → "Common Modifications" | - ---- - -## Code Conventions - -### Naming Conventions - -| Element | Convention | Examples | -|---------|------------|----------| -| Tool names | PascalCase | `Bash`, `Read`, `Write`, `WebSearch` | -| Factory functions | `createX` prefix | `createBashTool`, `createLocalSandbox` | -| Output types | `XOutput` suffix | `BashOutput`, `ReadOutput` | -| Error types | `XError` suffix | `BashError`, `ReadError` | -| Config types | `XConfig` suffix | `ToolConfig`, `AgentConfig` | -| Files | kebab-case | `bash.ts`, `anthropic-cache.ts` | - -### Type Organization - -**Input Schemas** - Colocated with tool implementation: -```typescript -// In /src/tools/bash.ts -const bashInputSchema = z.object({ - command: z.string(), - description: z.string() -}); -``` - -**Output Types** - Exported from tool files: -```typescript -export interface BashOutput { - stdout: string; - stderr: string; - exit_code: number; -} - -export interface BashError { - error: string; -} -``` - -**Union Types** - Tools return `Output | Error`: -```typescript -execute: async (input): Promise => { - // Implementation -} -``` - -**Config Types** - Centralized in `/src/types.ts`: -```typescript -export type ToolConfig = { /* ... */ }; -export type AgentConfig = { /* ... 
*/ }; -``` - -### Error Handling - -**Pattern**: Return error objects, don't throw - -```typescript -// ✅ Correct -try { - const result = await sandbox.exec(command); - return { stdout: result.stdout }; -} catch (err) { - return { error: String(err) }; -} - -// ❌ Incorrect -try { - const result = await sandbox.exec(command); - return { stdout: result.stdout }; -} catch (err) { - throw err; // Don't throw from tools -} -``` - -**Exception**: Sandbox methods can throw (tools catch them) - -### Configuration Pattern - -**Accept optional config, merge with defaults**: - -```typescript -export function createBashTool(sandbox: Sandbox, config?: ToolConfig) { - const timeout = config?.timeout ?? 120000; - const maxOutput = config?.maxOutputLength ?? 30000; - - return tool({ - execute: async (input) => { - // Use timeout, maxOutput - } - }); -} -``` - -**Optional features enabled by config presence**: - -```typescript -// WebSearch only added if config provided -if (config?.webSearch) { - tools.WebSearch = createWebSearchTool(config.webSearch); -} -``` - ---- - -## Important Notes & Gotchas - -### Dependencies - -**Peer Dependencies** (required): -- `ai` ^5.0.0 - Vercel AI SDK -- `zod` ^4.1.8 - Schema validation - -**Optional Peer Dependencies**: -- `@vercel/sandbox` ^1.0.0 - Vercel execution environment -- `@e2b/code-interpreter` ^1.0.0 - E2B code execution -- `parallel-web` ^1.0.0 - Web search/fetch operations - -**Why optional?** Users choose their execution environment: -- LocalSandbox (no deps) for development -- VercelSandbox (requires `@vercel/sandbox`) for production -- E2BSandbox (requires `@e2b/code-interpreter`) for hosted execution - -**Build externals**: All dependencies marked as external to prevent bundling duplication. 
- -### Testing Strategy - -**Unit tests** via Vitest (`bun run test`): -- `/tests/tools/` - Tool unit and integration tests -- `/tests/utils/` - Utility function tests - -**Examples** as integration tests: -- `/examples/test-tools.ts` - Direct tool API testing (no AI model needed) -- `/examples/basic.ts` - Full agentic loop with Claude -- `/examples/test-web-tools.ts` - Web tools demonstration - -**Before releases**: -1. Run `bun run test` to verify all unit tests pass -2. Run all examples to verify functionality -3. Test each sandbox implementation -4. Verify type generation (`bun run build`) - -### Breaking Changes to Avoid - -**Public APIs** (require major version bump): - -1. **Sandbox interface** (`/src/sandbox/interface.ts`) - - Adding methods breaks implementers - - Changing method signatures breaks all sandboxes - -2. **Tool input schemas** - - AI models rely on these - - Removing fields breaks existing prompts - -3. **Tool output types** - - Consumers depend on these shapes - - Removing fields breaks user code - -4. **Tool names** - - Used in AI prompts (e.g., "use the Bash tool") - - Renaming breaks prompt compatibility - -**Safe changes** (minor/patch versions): -- Adding new optional config fields -- Adding new tools -- Adding new sandbox implementations -- Internal refactoring -- Bug fixes - -### Performance Considerations - -**Tool Result Caching**: -```typescript -// Enable caching for read-only tools -const { tools } = await createAgentTools(sandbox, { cache: true }); - -// Custom TTL and per-tool control -const { tools } = await createAgentTools(sandbox, { - cache: { - ttl: 10 * 60 * 1000, // 10 minutes - debug: true, // Log cache hits/misses - Read: true, - Glob: true, - WebFetch: false, // Disable for this tool - } -}); - -// Check cache stats -const readTool = tools.Read as CachedTool; -console.log(readTool.getStats()); -// { hits: 5, misses: 2, hitRate: 0.71, size: 2 } -``` -Returns cached results for identical tool calls. 
Default TTL: 5 minutes. - -**Budget Tracking**: -```typescript -// Track cumulative cost and stop when budget exceeded -const { tools, budget } = await createAgentTools(sandbox, { - modelRegistry: { provider: "openRouter" }, - budget: { maxUsd: 5.00 }, -}); - -const result = await generateText({ - model, - tools, - stopWhen: [stepCountIs(50), budget.stopWhen], - onStepFinish: (step) => { - budget.onStepFinish(step); - console.log(budget.getStatus()); - // { totalCostUsd: 0.12, maxUsd: 5, remainingUsd: 4.88, ... } - }, -}); -``` -Pricing auto-fetched from OpenRouter via `modelRegistry` (free API, cached 24h). Supports manual `modelPricing` overrides. Budget auto-wires into Task tool sub-agents. - -**Prompt Caching**: -```typescript -import { anthropicPromptCacheMiddleware } from 'bashkit'; - -const model = wrapLanguageModel({ - model: anthropic('claude-sonnet-4-5'), - middleware: anthropicPromptCacheMiddleware -}); -``` -Reduces cost/latency for repeated prompts (3+ messages). - -**Message Pruning**: -```typescript -import { pruneMessages } from 'bashkit'; - -const pruned = pruneMessages(messages, { - maxTokens: 100000, - protectRecentUserMessages: 3 -}); -``` -Keeps conversations within token limits. 
- -**Sandbox Choice**: -- **LocalSandbox**: Fastest (Bun APIs), use for development -- **VercelSandbox**: Production-ready, Firecracker isolation -- **E2BSandbox**: Hosted, good for serverless environments - -**Timeout Configuration**: -```typescript -const { tools } = await createAgentTools(sandbox, { - defaultTimeout: 30000, // 30 seconds instead of 120s - tools: { - Bash: { timeout: 10000 } // Override per-tool - } -}); -``` - -### Security Notes - -**Bash Tool Risks**: -- Executes arbitrary commands -- Can access filesystem, network, system -- Use `blockedCommands` to restrict dangerous operations - -**Configuration-Based Security**: - -```typescript -const { tools } = await createAgentTools(sandbox, { - tools: { - Bash: { - blockedCommands: ['rm -rf', 'dd if=', 'curl'], - timeout: 10000 - }, - Read: { - allowedPaths: ['/workspace/**'] // Restrict file access - }, - Write: { - maxFileSize: 1_000_000, // 1MB limit - allowedPaths: ['/workspace/**'] - } - } -}); -``` - -**Best Practices**: -- Always set timeouts to prevent hanging -- Use allowedPaths for file operations -- Block dangerous bash commands -- Set file size limits -- Run in sandboxed environments (Vercel/E2B) for production -- Don't expose directly to untrusted users without additional controls - ---- - -## Additional Resources - -- **GitHub**: https://github.com/jbreite/bashkit -- **npm**: `bashkit` (v0.1.0) -- **Examples**: See `/examples/` directory -- **Issues**: Report bugs on GitHub Issues - ---- - -*Last Updated*: 2026-01-22 -*For*: Claude Code and AI coding assistants diff --git a/CLAUDE.md b/CLAUDE.md new file mode 120000 index 0000000..47dc3e3 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1 @@ +AGENTS.md \ No newline at end of file diff --git a/README.md b/README.md index 190af88..e5a8853 100644 --- a/README.md +++ b/README.md @@ -549,6 +549,96 @@ console.log({ }); ``` +## Context Layer + +bashkit ships a context layer that handles two concerns most agent loops end up reinventing: + +1. 
**Static system prompt assembly** — discover project docs (`AGENTS.md` / `CLAUDE.md`), collect environment info (cwd, shell, platform, git branch), and build tool guidance. Runs once at init so the system prompt stays stable for Anthropic prompt caching. +2. **Dynamic per-step layers** — intercept every tool call with `beforeExecute` gates (plan mode, custom allow/deny) and `afterExecute` transforms (output truncation, redirection hints, optional disk stash). Compose into an AI SDK `prepareStep` with auto-compaction and context-status monitoring. + +### Building a System Prompt + +```typescript +import { buildSystemContext, createLocalSandbox } from 'bashkit'; + +const sandbox = createLocalSandbox({ cwd: process.cwd() }); + +const context = await buildSystemContext(sandbox, { + instructions: true, // walk up from cwd, load AGENTS.md / CLAUDE.md + environment: true, // inject XML + toolGuidance: { + tools: ['Bash', 'Read', 'Write', 'Edit', 'Grep', 'Glob'], + }, +}); + +// context.combined -> ready to drop into streamText({ system }) +// context.instructions / context.environment / context.toolGuidance -> individual sections +// context.meta.instructionSources -> which files were discovered +``` + +Call this **once at init**. The output must stay stable across turns for prompt caching to work — never regenerate it mid-conversation. 
+ +### Tool Execution Layers + +```typescript +import { + applyContextLayers, + createExecutionPolicy, + createOutputPolicy, + createAgentTools, + createLocalSandbox, +} from 'bashkit'; + +const sandbox = createLocalSandbox({ cwd: '/tmp/workspace' }); +const { tools, planModeState } = createAgentTools(sandbox, { planMode: true }); + +const wrappedTools = applyContextLayers(tools, [ + // Gate: block Bash/Write/Edit while plan mode is active + createExecutionPolicy(planModeState), + + // Transform: truncate oversized results, inject redirection hints, + // optionally stash full output to disk + createOutputPolicy({ + maxOutputLength: 30_000, + redirectionThreshold: 20_000, + stashOutput: { + sandbox, + tools: ['Bash', 'Grep'], // only these get full output stashed + }, + }), +]); +``` + +Layers compose: `beforeExecute` runs in order (first rejection wins), `afterExecute` transforms pipe. Custom layers just implement the `ContextLayer` interface — see `src/context/AGENTS.md` for the full contract. + +### prepareStep Composition + +```typescript +import { createPrepareStep, MODEL_CONTEXT_LIMITS } from 'bashkit'; + +const prepareStep = createPrepareStep({ + compaction: { + maxTokens: MODEL_CONTEXT_LIMITS['claude-sonnet-4-5'], + summarizerModel: anthropic('claude-haiku-4'), + compactionThreshold: 0.85, + }, + contextStatus: { + maxTokens: MODEL_CONTEXT_LIMITS['claude-sonnet-4-5'], + }, + planModeState, // injects a plan-mode hint as a user message +}); + +await streamText({ + model, + tools: wrappedTools, + system: context.combined, // from buildSystemContext + messages, + prepareStep, +}); +``` + +**Important**: `createPrepareStep` never touches `system` — it only modifies `messages`. That's load-bearing for Anthropic prompt caching. If you extend it via the `extend` callback, do not set `system` either. + ## Agent Skills bashkit supports the [Agent Skills](https://agentskills.io) standard - an open format for giving agents new capabilities and expertise. 
Skills are folders containing a `SKILL.md` file with instructions that agents can load on-demand. @@ -939,6 +1029,17 @@ Creates a set of agent tools bound to a sandbox instance. - `anthropicPromptCacheMiddleware` - Enable prompt caching for Anthropic models (AI SDK v6+) - `anthropicPromptCacheMiddlewareV2` - Enable prompt caching for Anthropic models (AI SDK v5) +### Context Layer + +- `buildSystemContext(sandbox, config?)` - Assemble instructions + environment + tool guidance into a system prompt +- `discoverInstructions(sandbox, config?)` - Walk up from cwd loading AGENTS.md / CLAUDE.md files +- `collectEnvironment(sandbox, config?)` / `formatEnvironment(env)` - Capture and format cwd/shell/platform/git state +- `buildToolGuidance(config)` - Generate one-line hints for registered tools +- `withContext(tool, name, layers)` / `applyContextLayers(tools, layers)` - Wrap tools with gate + transform layers +- `createExecutionPolicy(planModeState, config?)` - Plan-mode + custom gate `ContextLayer` +- `createOutputPolicy(config?)` - Truncation + redirection hints + optional disk stash `ContextLayer` +- `createPrepareStep(config)` - Compose compaction + context-status + plan-mode hints into an AI SDK `PrepareStepFunction` + ## Future Roadmap The following features are planned for future releases: diff --git a/docs/src/app/MobileNav.tsx b/docs/src/app/MobileNav.tsx index db374e9..9601b18 100644 --- a/docs/src/app/MobileNav.tsx +++ b/docs/src/app/MobileNav.tsx @@ -11,6 +11,7 @@ const links = [ { href: "/getting-started", label: "Getting Started" }, { href: "/tools", label: "Tools" }, { href: "/sandboxes", label: "Sandboxes" }, + { href: "/context", label: "Context" }, { href: "/api-reference", label: "API Reference" }, ]; diff --git a/docs/src/app/SideNav.tsx b/docs/src/app/SideNav.tsx index 4569c5b..28a7807 100644 --- a/docs/src/app/SideNav.tsx +++ b/docs/src/app/SideNav.tsx @@ -63,6 +63,19 @@ const links: { { id: "e2b-sandbox", text: "E2BSandbox" }, ], }, + { + href: 
"/context", + label: "Context", + items: [ + { id: "overview", text: "Overview" }, + { id: "context-layers", text: "Context Layers" }, + { id: "execution-policy", text: "Execution Policy" }, + { id: "output-policy", text: "Output Policy" }, + { id: "system-prompt", text: "System Prompt" }, + { id: "prepare-step", text: "prepareStep" }, + { id: "full-example", text: "Full Example" }, + ], + }, { href: "/api-reference", label: "API Reference", diff --git a/docs/src/app/api-reference/page.tsx b/docs/src/app/api-reference/page.tsx index 968e316..40f0ee5 100644 --- a/docs/src/app/api-reference/page.tsx +++ b/docs/src/app/api-reference/page.tsx @@ -74,6 +74,11 @@ const { tools, budget } = await createAgentTools(sandbox, { openRouterModels — Model registry map (when modelRegistry config provided) +
  • + contextLayers — Applied context layers (empty + array when no context config). Use with{" "} + applyContextLayers() for late-added tools. +
  • @@ -104,6 +109,11 @@ const { tools, budget } = await createAgentTools(sandbox, { Budget tracking configuration. Requires modelRegistry or pricingProvider. + + Context layer config. Opt-in — wraps tools with execution + and output policies. See the{" "} + Context page. +

    ToolConfig (per-tool)

    diff --git a/docs/src/app/components/NavIcons.tsx b/docs/src/app/components/NavIcons.tsx index 4e443b6..27282cf 100644 --- a/docs/src/app/components/NavIcons.tsx +++ b/docs/src/app/components/NavIcons.tsx @@ -38,6 +38,13 @@ export const navIcons: Record = { ), + "/context": ( + + + + + + ), "/api-reference": ( diff --git a/docs/src/app/context/page.tsx b/docs/src/app/context/page.tsx new file mode 100644 index 0000000..fe0bb60 --- /dev/null +++ b/docs/src/app/context/page.tsx @@ -0,0 +1,333 @@ +"use client"; + +import { CodeBlock } from "../components/CodeBlock"; +import { Footer } from "../Footer"; + +export default function Context() { + return ( + <> +
    +
    +

    Context

    +

    + System prompt assembly, tool execution gating, and output policies. +

    + + +
    +

    Overview

    +

    + The context layer is an opt-in system with two jobs. It wraps your tools with + cross-cutting behavior: blocking tools based on state (execution + policy) and truncating large outputs with redirection hints (output + policy). It also assembles a static system prompt from project docs and + environment info. 

    +

    + Enable it by passing a context config to{" "} + createAgentTools: +

    + +

    + When context is omitted, tools work exactly as before + — no wrapping, no overhead. +

    +
    + +
    +

    Context Layers

    +

    + A ContextLayer intercepts tool execution with two + optional hooks: +

    +
      +
    • + beforeExecute — return{" "} + {`{ error: string }`} to block a tool call, or{" "} + undefined to allow it +
    • +
    • + afterExecute — transform the tool result (e.g., + truncate output) +
    • +
    + { + console.log(\`Calling \${toolName}\`); + return undefined; // allow execution + }, + afterExecute: (toolName, params, result) => { + console.log(\`\${toolName} returned\`); + return result; // pass through unchanged + }, +}; + +// Wrap a single tool +const wrappedTool = withContext(myTool, 'MyTool', [loggingLayer]); + +// Wrap an entire ToolSet +const wrappedTools = applyContextLayers(tools, [loggingLayer]);`} + /> +

    + Layers compose in order: first beforeExecute rejection + wins, afterExecute transforms pipe through + sequentially. +
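The composition rule can be sketched in a few lines. This is a simplified, synchronous illustration and not bashkit's implementation: `SketchLayer`, `GateResult`, `ToolFn`, and `wrapWithLayers` are hypothetical names, and the real `ContextLayer` hooks may be async or carry more arguments.

```typescript
// Hypothetical, simplified shapes for illustration; bashkit's real ContextLayer
// type may differ (for example, hooks may be async).
type GateResult = { error: string } | undefined;
type ToolParams = Record<string, unknown>;
type ToolFn = (params: ToolParams) => unknown;

interface SketchLayer {
  beforeExecute?: (toolName: string, params: ToolParams) => GateResult;
  afterExecute?: (toolName: string, params: ToolParams, result: unknown) => unknown;
}

function wrapWithLayers(toolName: string, run: ToolFn, layers: SketchLayer[]): ToolFn {
  return (params) => {
    // Gates run in order; the first rejection short-circuits execution.
    for (const layer of layers) {
      const gate = layer.beforeExecute?.(toolName, params);
      if (gate !== undefined) return gate;
    }
    // Transforms pipe: each layer receives the previous layer's output.
    let result = run(params);
    for (const layer of layers) {
      if (layer.afterExecute) result = layer.afterExecute(toolName, params, result);
    }
    return result;
  };
}

// Usage: one gate plus two transforms applied in sequence.
const denyRm: SketchLayer = {
  beforeExecute: (_name, params) =>
    String(params.command).includes("rm ") ? { error: "rm is blocked" } : undefined,
};
const upperCase: SketchLayer = {
  afterExecute: (_name, _params, result) => String(result).toUpperCase(),
};
const addSuffix: SketchLayer = {
  afterExecute: (_name, _params, result) => String(result) + "!",
};

const echo = wrapWithLayers("Bash", (p) => `ran ${String(p.command)}`, [
  denyRm,
  upperCase,
  addSuffix,
]);

console.log(echo({ command: "ls" })); // RAN LS!
console.log(echo({ command: "rm -rf /" })); // the error object; the tool never ran
```

Because rejection returns early, no `afterExecute` transform ever sees a blocked call in this sketch. Whether bashkit behaves the same way for blocked calls is an assumption.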

    +
    + +
    +

    Execution Policy

    +

    + Gates tool execution based on state. The most common use case is + plan mode — blocking write tools while allowing read-only + tools. +

    + { + if (toolName === 'Bash' && String(params.command).includes('rm')) { + return 'Destructive commands are not allowed'; + } + return undefined; + }, +});`} + /> +

    + Tools stay registered in the tool set (prompt cache stable) — + only execution is gated. +
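A minimal sketch of why gating at execution time keeps the tool set stable. It assumes a mutable state object with an `isActive` flag, which is a hypothetical shape; `PlanStateSketch`, `WRITE_TOOL_NAMES`, and `makePlanGate` are illustrative names, and the blocked-tool list follows the README's Bash/Write/Edit example.

```typescript
// Hypothetical shapes for illustration only; bashkit's real plan-mode state
// and execution-policy types may differ.
interface PlanStateSketch {
  isActive: boolean;
}

// Tools treated as "write" tools here, following the README's Bash/Write/Edit example.
const WRITE_TOOL_NAMES = new Set(["Bash", "Write", "Edit"]);

function makePlanGate(state: PlanStateSketch) {
  // The gate reads state at call time, so flipping isActive takes effect
  // immediately without rebuilding or re-registering any tool.
  return (toolName: string): { error: string } | undefined =>
    state.isActive && WRITE_TOOL_NAMES.has(toolName)
      ? { error: `${toolName} is blocked while plan mode is active` }
      : undefined;
}

const planState: PlanStateSketch = { isActive: true };
const planGate = makePlanGate(planState);

console.log(planGate("Read")); // undefined: read-only tools pass
console.log(planGate("Write")); // error object: blocked while planning

planState.isActive = false;
console.log(planGate("Write")); // undefined: same tool, gate now open
```

The list of tools the model sees never changes in this sketch; only the gate's answer does, which is what keeps the tool definitions stable across turns.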

    +
    + +
    +

    Output Policy

    +

    + Handles large tool outputs by truncating and injecting hints that + tell the model how to access the full result. +

    + +

    + When output exceeds redirectionThreshold, it gets + truncated to maxOutputLength and a{" "} + _hint field is added with tool-specific guidance (e.g., + "use head/tail to see specific + parts"). +
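The threshold-then-truncate behavior might look roughly like the sketch below. Only the option names `maxOutputLength` and `redirectionThreshold` and the `_hint` field come from the description above; the result shape, the `policeOutput` helper, and the tiny limits are assumptions for illustration.

```typescript
// Hypothetical result shape and helper; only the option names
// (maxOutputLength, redirectionThreshold) and the _hint field come from the docs text.
interface OutputLimits {
  maxOutputLength: number;
  redirectionThreshold: number;
}

interface PolicedOutput {
  output: string;
  _hint?: string;
}

function policeOutput(output: string, limits: OutputLimits, hint: string): PolicedOutput {
  // Small results pass through untouched.
  if (output.length <= limits.redirectionThreshold) return { output };
  // Large results are cut to maxOutputLength and annotated with guidance.
  return { output: output.slice(0, limits.maxOutputLength), _hint: hint };
}

// Tiny limits so the effect is easy to see.
const limits: OutputLimits = { maxOutputLength: 15, redirectionThreshold: 10 };
const small = policeOutput("short", limits, "use head/tail to see specific parts");
const large = policeOutput("x".repeat(40), limits, "use head/tail to see specific parts");

console.log(small); // output unchanged, no _hint
console.log(large.output.length); // 15
console.log(large._hint); // use head/tail to see specific parts
```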

    + +

    Custom Hints

    + { + if (toolName === 'Bash' && params.command === 'git log') { + return 'Use git log with --oneline or -n to limit output.'; + } + return undefined; // fall through to hints map / defaults + }, +});`} + /> + +

    Stash to Disk

    +

    + Optionally save full output to disk before truncating, so the model + can Read the file later: +

    + +
    + +
    +

    System Prompt Assembly

    +

    + buildSystemContext assembles a static system prompt from + three sources: discovered project instructions (AGENTS.md / + CLAUDE.md files), environment info (cwd, platform, git branch), and + tool guidance. +

    + +

    + Call once at init — the output is deterministic and designed to + stay stable across turns for Anthropic prompt caching. +

    +
    + +
    +

    prepareStep

    +

    + createPrepareStep returns a callback for the AI + SDK's prepareStep option. It composes message + compaction, context status monitoring, and plan mode hints. +

    + { + // Custom logic runs after built-in steps + return {}; + }, +}); + +const result = await streamText({ + model, + tools, + messages, + prepareStep, +});`} + /> +

    + Important: prepareStep never touches + the system prompt — all dynamic content is + injected as user messages to preserve prompt caching. +
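To make the constraint concrete, here is a simplified sketch of a step hook that changes only `messages`. The types and the `prepareStepSketch` function are hypothetical; the AI SDK's real `prepareStep` receives and returns richer objects, and the hint text is invented for the example.

```typescript
// Hypothetical, simplified types; the AI SDK's real prepareStep works with
// richer objects. Only the "leave system alone" idea comes from the docs text.
type SketchMessage = { role: "user" | "assistant"; content: string };

interface StepInput {
  system: string;
  messages: SketchMessage[];
}

function prepareStepSketch(
  input: StepInput,
  planModeActive: boolean,
): { messages: SketchMessage[] } {
  // Dynamic state goes into a trailing user message. The return value has no
  // `system` key, so the cached system prompt prefix is never rewritten.
  if (!planModeActive) return { messages: input.messages };
  const hint: SketchMessage = {
    role: "user",
    content: "Plan mode is active: describe changes instead of making them.",
  };
  return { messages: [...input.messages, hint] };
}

const stepInput: StepInput = {
  system: "static system prompt",
  messages: [{ role: "user", content: "Refactor the parser" }],
};
const stepResult = prepareStepSketch(stepInput, true);

console.log("system" in stepResult); // false
console.log(stepResult.messages.length); // 2
console.log(stepInput.messages.length); // 1: the input array is not mutated
```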

    +
    + +
    +

    Full Example

    + +
    + + +