AgentBasalt

Test your AI agent code — not the AI itself.

What does this test?

You wrote an AI agent. It calls OpenAI/Anthropic, gets a response, then does something with it — sends an email, updates a database, calls an API, formats output.

AgentBasalt tests YOUR code, not the AI. The AI is an external dependency. You mock it, so you can test your logic deterministically.

┌──────────────────────────────────────────────────┐
│                 Your Agent Code                   │
│                                                  │
│   call LLM → get response → process → take action │
│                                                  │
│   AgentBasalt tests this part: ^^^^^^^^^^^^^^^^  │
│                                                  │
│   - Did it call the right tool?                  │
│   - Did it handle the "spam" label correctly?    │
│   - Did it crash on empty response?              │
│   - Did it notify the user?                      │
└──────────────────────────────────────────────────┘

Why do I need this?

Without AgentBasalt:

// ❌ This test is flaky — AI might return "spam" or "not spam"
test("blocks spam", async () => {
  const result = await processEmail("You won a prize!");
  expect(result.blocked).toBe(true); // might fail randomly
});

With AgentBasalt:

// ✅ This test is deterministic — AI always returns "spam"
test("blocks spam", async () => {
  engine.addMock({
    match: { contains: "prize" },
    response: { content: "spam" },
  });
  const result = await processEmail("You won a prize!");
  expect(result.blocked).toBe(true); // always passes
});

Without AgentBasalt	With AgentBasalt
Tests are flaky (AI returns different things)	Tests are deterministic
Every test costs real API money	Mock mode is free, record once & replay forever
100 tests take 5 minutes	100 tests take 2 seconds
Can't tell if your code broke	Regression testing works

Quick Start

Install

npm install -D agentbasalt

Write a test

// tests/spam-filter.spec.ts
import { test, expect, agentBasalt } from "agentbasalt";

// Your actual agent code
async function processEmail(email: string, llmClient: any) {
  const response = await llmClient.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: `Is this spam? Reply only "spam" or "not spam": ${email}`,
      },
    ],
  });
  const label = response.choices[0].message.content.trim();

  if (label === "spam") {
    return { blocked: true, reason: "spam" };
  }
  return { blocked: false };
}

// Tests
test("blocks spam emails", async () => {
  const engine = agentBasalt({
    mode: "mock",
    mockResponses: [
      {
        match: { contains: "prize" },
        response: { content: "spam" },
      },
    ],
  });

  const mockClient = createMockOpenAIClient(engine);
  const result = await processEmail("You won a free prize!", mockClient);

  expect(result).toHaveProperty("blocked", true);
});

test("allows normal emails", async () => {
  const engine = agentBasalt({
    mode: "mock",
    mockResponses: [
      {
        match: { contains: "meeting" },
        response: { content: "not spam" },
      },
    ],
  });

  const mockClient = createMockOpenAIClient(engine);
  const result = await processEmail("Meeting at 3pm tomorrow", mockClient);

  expect(result).toHaveProperty("blocked", false);
});

test("handles empty AI response gracefully", async () => {
  const engine = agentBasalt({
    mode: "mock",
    mockResponses: [
      {
        match: { contains: "edge" },
        response: { content: "" }, // AI returns empty
      },
    ],
  });

  const mockClient = createMockOpenAIClient(engine);
  // Your code should not crash
  const result = await processEmail("edge case", mockClient);
  expect(result).toHaveProperty("blocked", false);
});

Run

npx agentbasalt test

Features

1. Mock Mode — Control what the AI returns

const engine = agentBasalt({
  mode: "mock",
  mockResponses: [
    { match: { contains: "weather" }, response: { content: "sunny, 25°C" } },
    {
      match: { pattern: "meeting.*tomorrow" },
      response: { content: "confirmed" },
    },
    { match: { exact: "ping" }, response: { content: "pong" } },
  ],
});

2. Record & Replay — Record once, replay forever

# Step 1: Record real API responses (costs money, do once)
npx agentbasalt test --record

# Step 2: Replay from saved files (free, instant)
npx agentbasalt test

# Step 3: Commit the cassettes to git
git add __cassettes__/

3. Tool Call Assertions — Verify your agent calls the right tools

expect(result).toHaveCalledTool("send_email", { to: "user@example.com" });
expect(result).toHaveNotCalledTool("delete_database");
expect(result).toHaveToolCallCount(2);

4. Response Assertions

expect(result).toHaveRespondedContaining("Tokyo");
expect(result).toHaveRespondedMatching(/temperature.*\d+/);
expect(result).toHaveRespondedWith("exact string");
expect(result).toHaveEmptyResponse();

5. Performance Assertions

expect(result).toHaveCompletedWithin(3000); // under 3 seconds
expect(result).toHaveUsedTokensLessThan(500); // under 500 tokens
expect(result).toHaveUsedModel("gpt-4o");
expect(result).toHaveFinishReason("stop");

6. Tool Mocking — Mock external services

import { mockTool } from "agentbasalt";

const weatherAPI = mockTool("get_weather", {
  handler: (params) => ({ temp: 25, city: params.city, condition: "sunny" }),
  requiredParams: ["city"],
  maxCalls: 3,
  delay: 100, // simulate latency
});

// After test, inspect what was called
console.log(weatherAPI.calls);
// [{ params: { city: 'Tokyo' }, result: {...}, duration: 102 }]

7. Dataset Testing — Test with many inputs

import { loadDataset, testScenario } from "agentbasalt";

const dataset = await loadDataset("./test-data/emails.json");

testScenario("email classification", dataset, async (input, expected) => {
  const result = await myAgent.run(input.text);
  expect(result).toHaveRespondedContaining(expected.label);
});

// test-data/emails.json
{
  "cases": [
    {
      "input": { "text": "Win a free iPhone!" },
      "expected": { "label": "spam" }
    },
    {
      "input": { "text": "Meeting at 3pm" },
      "expected": { "label": "not spam" }
    }
  ]
}

8. Cost Tracking — Know how much you're spending

import { costTracker, expect } from "agentbasalt";

test("agent is cost-efficient", async () => {
  const tracker = costTracker();
  await tracker.track(myAgent.run("summarize this"));

  expect(tracker).toHaveCostLessThan(0.05); // max $0.05
  expect(tracker).toHaveTokenCountLessThan(2000); // max 2000 tokens

  console.log(tracker.breakdown());
  // [{ model: 'gpt-4o', calls: 3, cost: 0.011 }]
});

9. Multi-step Agent Trace — Track complex agent flows

import { trace, expect } from "agentbasalt";

const t = trace();

await t.run(async () => {
  const plan = await t.step("plan", "llm", async () => {
    return llm.call("How to answer?");
  });
  const data = await t.step("search", "tool", async () => {
    return searchAPI(plan.query);
  });
  await t.step("answer", "llm", async () => {
    return llm.call(`Summarize: ${data}`);
  });
});

expect(t).toHaveStepCount(3);
expect(t).toHaveStepSequence(["plan", "search", "answer"]);
expect(t).toHaveLLMCallCount(2);
expect(t).toHaveToolCallCount(1);
expect(t).toHaveNoErrors();

Adapters

Provider	Import	Status
OpenAI	`agentbasalt/adapters/openai`	Stable
Anthropic	`agentbasalt/adapters/anthropic`	Stable
Vercel AI SDK	`agentbasalt/adapters/vercel-ai`	Stable
LangChain	`agentbasalt/adapters/langchain`	Stable

import { wrapOpenAI } from "agentbasalt/adapters/openai";
const handler = engine.createHandler();
const mockClient = wrapOpenAI(new OpenAI(), handler);

import { wrapAnthropic } from "agentbasalt/adapters/anthropic";
const handler = engine.createHandler();
const mockClient = wrapAnthropic(new Anthropic(), handler);

CI Integration

# .github/workflows/test.yml
name: Agent Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: "20" }
      - run: npm ci
      - run: npx agentbasalt test

Configuration

// agentbasalt.config.ts
import { defineConfig } from "agentbasalt";

export default defineConfig({
  mode: "replay",
  cassetteDir: "__cassettes__",
  matchStrategy: "contains", // 'exact' | 'contains' | 'pattern' | 'fuzzy'
  sanitize: {
    maskApiKeys: true,
    maskEmails: true,
  },
});

# Or use environment variables
AGENTBASALT_MODE=record npx agentbasalt test

How It Works

Your Test
    │
    ▼
AgentBasalt Engine intercepts LLM SDK calls
    │
    ├─ Mock Mode ──────► Returns your predefined response
    ├─ Record Mode ────► Calls real API, saves response to file
    └─ Replay Mode ────► Loads saved response from file

AgentBasalt wraps the LLM SDK client (OpenAI, Anthropic, etc.) at the SDK level. When your code calls the LLM, AgentBasalt intercepts the call and returns a mock/recorded response instead.

Contributing

git clone https://github.com/YoungDan-hero/agentbasalt.git
cd agentbasalt
npm install
npm test

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
examples/openai-agent		examples/openai-agent
plugins/github-action		plugins/github-action
src		src
tests		tests
var/folders/j4/hy8v7qkx70913r94qd15kgx80000gp/T		var/folders/j4/hy8v7qkx70913r94qd15kgx80000gp/T
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
README.zh-CN.md		README.zh-CN.md
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json
tsup.config.ts		tsup.config.ts
vitest.config.ts		vitest.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AgentBasalt

What does this test?

Why do I need this?

Quick Start

Install

Write a test

Run

Features

1. Mock Mode — Control what the AI returns

2. Record & Replay — Record once, replay forever

3. Tool Call Assertions — Verify your agent calls the right tools

4. Response Assertions

5. Performance Assertions

6. Tool Mocking — Mock external services

7. Dataset Testing — Test with many inputs

8. Cost Tracking — Know how much you're spending

9. Multi-step Agent Trace — Track complex agent flows

Adapters

CI Integration

Configuration

How It Works

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AgentBasalt

What does this test?

Why do I need this?

Quick Start

Install

Write a test

Run

Features

1. Mock Mode — Control what the AI returns

2. Record & Replay — Record once, replay forever

3. Tool Call Assertions — Verify your agent calls the right tools

4. Response Assertions

5. Performance Assertions

6. Tool Mocking — Mock external services

7. Dataset Testing — Test with many inputs

8. Cost Tracking — Know how much you're spending

9. Multi-step Agent Trace — Track complex agent flows

Adapters

CI Integration

Configuration

How It Works

Contributing

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages