Research Squad is a multi-agent research system built with Effect and BAML. Inspired by the architecture of Anthropic's Claude Research feature, it provides a robust framework for orchestrating multiple specialized AI agents to conduct comprehensive research by decomposing complex user queries into parallelizable sub-tasks.
This project aims to provide a reference for building scalable, type-safe, and observable agentic systems in TypeScript using functional paradigms. It demonstrates good practices in service-oriented architecture, contract-driven TDD, and structured concurrency.
- Multi-Agent Orchestration: A hierarchical agent system (General Assistant → Research Lead → Subagents) that plans, delegates, and synthesizes research with sophisticated task decomposition and execution.
- Type-Safe by Construction: Built entirely with Effect, ensuring all errors, dependencies, and data types are explicit and verified at compile time. Utilizes Effect<A, E, R> to make success types, error conditions, and service dependencies explicit in the type system.
- Declarative & Type-Safe: A purely functional, composable approach to managing complexity, with all business logic implemented using Effect.gen for intuitive, imperative-like syntax while maintaining declarative composition.
- Structured Concurrency: Manages parallel sub-agent execution safely and efficiently with Effect's built-in, resource-safe concurrency primitives, including bounded parallelism, automatic resource cleanup, and backpressure.
- Schema-Driven Data Modeling: Uses @effect/schema for all data models, providing runtime validation, static type generation, serialization, and bidirectional transformation from a single source of truth.
- Robust Dependency Injection: Leverages Effect's Layer system for composable, memoized service construction, enabling clean dependency management and unparalleled testability.
- Comprehensive Observability: Integrated support for structured logging, distributed tracing (via Effect.withSpan and OpenTelemetry), and metrics collection (Prometheus) for production monitoring.
- AI Function Calling with BAML: Integrates with the BAML (Boundary ML) framework to define, version, test, and execute LLM function calls in a structured, declarative, and reliable manner.
- Comprehensive Validation Suite: Includes a dedicated CLI and service for validating research logs, HAR files, and tool call schemas against defined contracts.
- Production-Grade Error Handling: Distinguishes between recoverable failures (typed errors) and unrecoverable defects (bugs), with centralized, type-safe retry policies and exhaustive error handling.
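The bounded parallelism mentioned under Structured Concurrency can be illustrated without any dependencies. The following is a minimal sketch in plain TypeScript, not the project's Effect code: `mapBounded` is a hypothetical helper that keeps at most `limit` workers in flight. Unlike Effect's concurrency primitives, it does not interrupt sibling work when one worker fails.

```typescript
// Hypothetical helper (not part of the project): run `worker` over `items`
// with at most `limit` calls in flight, preserving input order in the output.
async function mapBounded<A, B>(
  items: readonly A[],
  limit: number,
  worker: (item: A, index: number) => Promise<B>
): Promise<B[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<B>(items.length);
  // Shared queue of pending jobs; each lane takes the next job when it is free.
  const queue = items.map((item, index) => ({ item, index }));

  const lane = async (): Promise<void> => {
    for (let job = queue.shift(); job !== undefined; job = queue.shift()) {
      results[job.index] = await worker(job.item, job.index);
    }
  };

  // Start no more lanes than there are items.
  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

A call such as `mapBounded(tasks, 10, runSubagent)` would mirror the default subagent limit of 10, where `tasks` and `runSubagent` are placeholders for the real task list and subagent call.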
- Runtime: Bun (>= 1.1.0)
- Core Framework: Effect 3.x
- Schema & Validation: @effect/schema
- CLI: @effect/cli
- AI Orchestration: BAML (Boundary ML)
- Testing: Vitest with @effect/vitest
- Linting & Formatting: Biome
- Observability: OpenTelemetry (Jaeger for Tracing, Prometheus for Metrics, Grafana for Dashboards) via Docker Compose
This project is built upon the core principles of the Effect ecosystem and follows Domain-Driven Design (DDD) and Hexagonal Architecture:
- Effects as Blueprints: An Effect is an immutable, lazy description of a program: a blueprint, not an execution. No side effects occur until the program is executed by a Runtime at the application's edge. This enables powerful composition, testing, and reasoning about program behavior.
- Separation of "What" from "How": Business logic (what the program does) is defined as pure Effect workflows in services. Cross-cutting concerns like logging, retries, metrics, and dependency injection (how it runs) are composed declaratively using combinators and layers.
- Make Impossible States Unrepresentable: The type system (Effect<A, E, R>, Schema, Data.TaggedError) is leveraged to enforce correctness at compile time, eliminating entire classes of runtime bugs. If it compiles, it's much more likely to be correct.
- Composition over Inheritance: Complex functionality is built by composing small, independent, highly cohesive services and effects, not through inheritance hierarchies.
- Dependency Inversion: All business logic depends on abstract service interfaces, not concrete implementations. This is enforced through Effect's Layer system, making the codebase highly modular and testable.
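The "blueprint" idea can be shown with a toy model. This is a dependency-free sketch, not Effect itself: `Task`, `sync`, and `map` are hypothetical stand-ins that only capture the laziness property, with no typed errors or dependencies.

```typescript
// Hypothetical stand-in for a lazy effect: a description that does nothing until `run` is called.
type Task<A> = { readonly run: () => A };

// Wrap a side-effecting thunk without executing it.
const sync = <A>(thunk: () => A): Task<A> => ({ run: thunk });

// Compose a new description from an existing one; still nothing executes.
const map = <A, B>(task: Task<A>, f: (a: A) => B): Task<B> => ({
  run: () => f(task.run()),
});

const events: string[] = [];

const readMaxAgents = sync(() => {
  events.push("config read"); // observable side effect
  return 10;
});

const program = map(readMaxAgents, (n) => `max agents: ${n}`);

// Building `program` only composed descriptions, so no side effect has happened yet.
const eventsBeforeRun = events.length;

// The side effect happens here, at the "edge".
const output = program.run();
```

Because `program` is just a value, it can be passed around, wrapped, or run several times, which is what makes composition and testing straightforward.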
The system is architected around a clear separation of concerns, following the principles of Domain-Driven Design and Dependency Inversion.
The research process is orchestrated through a chain of specialized agents, each with a distinct responsibility:
- General Assistant: The entry point that receives the user's initial query and determines if it's a simple conversational turn or requires in-depth research.
- Research Lead Agent: If research is needed, this agent takes over. It analyzes the query, develops a research strategy, classifies the query type (e.g., breadth-first, depth-first), and generates a set of parallelizable tasks for sub-agents.
- Research Subagents: A team of parallel agents that execute the individual research tasks. They use tools like web_search and web_fetch to gather information from external sources.
- Citations Agent: A final-pass agent responsible for adding accurate, properly formatted citations to the synthesized report.
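The control flow of the agent chain above can be sketched as a function over injected stages. Everything here is hypothetical: the type names, the `Stages` interface, and the stub behaviour are illustrations, and the stages are synchronous stubs where the real system makes LLM calls and runs subagents in parallel.

```typescript
// All names are hypothetical; stages are synchronous stubs standing in for LLM calls.
type Triage =
  | { readonly kind: "conversational"; readonly reply: string }
  | { readonly kind: "research" };

interface Plan {
  readonly queryType: "breadth-first" | "depth-first";
  readonly tasks: readonly string[];
}

interface Finding {
  readonly task: string;
  readonly summary: string;
  readonly source: string;
}

interface Stages {
  readonly triage: (query: string) => Triage; // General Assistant
  readonly plan: (query: string) => Plan; // Research Lead Agent
  readonly research: (task: string) => Finding; // Research Subagent
  // Synthesis step; the README does not say which agent owns it.
  readonly synthesize: (findings: readonly Finding[]) => string;
  readonly cite: (report: string, findings: readonly Finding[]) => string; // Citations Agent
}

const runPipeline = (stages: Stages, query: string): string => {
  const decision = stages.triage(query);
  if (decision.kind === "conversational") return decision.reply;

  const plan = stages.plan(query);
  // The real system would run these tasks in parallel with a bounded limit.
  const findings = plan.tasks.map((task) => stages.research(task));
  return stages.cite(stages.synthesize(findings), findings);
};

// Stub stages, useful for exercising the control flow without any network access.
const stubStages: Stages = {
  triage: (q) =>
    q.length < 10 ? { kind: "conversational", reply: "Hello!" } : { kind: "research" },
  plan: (q) => ({ queryType: "breadth-first", tasks: [`${q} (overview)`, `${q} (criticism)`] }),
  research: (task) => ({ task, summary: `notes on ${task}`, source: "https://example.com" }),
  synthesize: (fs) => fs.map((f) => f.summary).join("; "),
  cite: (report, fs) => `${report} [${fs.length} sources]`,
};
```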
The application is composed of several single-responsibility services that are wired together at the application's edge.
┌──────────────────────────────────────────────────────────────┐
│ Application Layer │
│ (MainLayer) │
└──────────────────────────────────────────────────────────────┘
│
│ provides
▼
┌──────────────────────────────────────────────────────────────┐
│ MultiAgentOrchestratorService │
│ • Coordinates the full research pipeline from query to report│
└──────────────────────────────────────────────────────────────┘
│ │ │
│ depends on │ depends on │ depends on
▼ ▼ ▼
┌─────────────────┐ ┌──────────────────┐ ┌───────────────────┐
│ BamlClientSvc │ │SessionManagerSvc │ │ ToolRouterSvc │
│ • LLM calls │ │ • Session state │ │ • Tool dispatch │
│ • BAML funcs │ │ • Lifecycle mgmt │ │ • Validation │
│ • Retries │ │ • Concurrent-safe│ │ • Schema checks │
└─────────────────┘ └──────────────────┘ └───────────────────┘
│
│ depends on
┌───────────────┴───────────────┐
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ WebSearchService │ │ WebFetchService │
│ • Brave Search │ │ • HTTP fetching │
│ • Result parsing │ │ • Content extract│
└──────────────────┘ └──────────────────┘
Key Services:
- MultiAgentOrchestratorService: The central coordinator of the research pipeline, managing the entire workflow from query analysis to report synthesis.
- BamlClientService: A type-safe wrapper around the BAML-generated LLM client, handling retries (llmRetry policy), timeouts, and error mapping.
- SessionManagerService: A concurrent-safe service for managing the state and lifecycle of each research session using Effect's Ref primitive.
- ToolRouterService: Validates and dispatches tool calls (e.g., web_search, web_fetch) to their respective implementations, ensuring schema compliance.
- WebSearchService / WebFetchService: Infrastructure services that interact with external APIs (Brave Search, HTTP).
- MetricsService: Collects and exposes application metrics for observability.
All services are composed into a single MainLayer in src/layers/AppLayer.ts. This layer represents the complete dependency graph of the application. Following the project's "single provide" rule, this MainLayer is provided once at the application's entry point (src/main.ts), ensuring all services are memoized and resources are managed within a single, unified scope.
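To make the ToolRouterService's "validate, then dispatch" responsibility concrete, here is a dependency-free sketch. It is not the project's implementation: the real service validates with @effect/schema, while this version uses hand-written guards, and `defineTool`, `route`, the input shapes, and the stubbed outputs are all assumptions.

```typescript
type ToolCall = { readonly name: string; readonly input: unknown };

type RouteResult =
  | { readonly ok: true; readonly output: string }
  | { readonly ok: false; readonly error: "UnknownTool" | "InvalidInput"; readonly detail: string };

type Handler = (input: unknown) => RouteResult;

// Pair a parser (validation) with an executor so execution only ever sees validated input.
const defineTool =
  <I>(parse: (input: unknown) => I | undefined, execute: (input: I) => string): Handler =>
  (input) => {
    const parsed = parse(input);
    return parsed === undefined
      ? { ok: false, error: "InvalidInput", detail: "input did not match the tool schema" }
      : { ok: true, output: execute(parsed) };
  };

const isRecord = (u: unknown): u is Record<string, unknown> =>
  typeof u === "object" && u !== null;

// Extract one non-empty string field, or undefined if the shape is wrong.
const parseStringField = (key: string) => (u: unknown): string | undefined => {
  if (!isRecord(u)) return undefined;
  const value = u[key];
  return typeof value === "string" && value.length > 0 ? value : undefined;
};

// A Map avoids accidental hits on Object.prototype keys such as "toString".
// Executors are stubs; the assumed input fields are `query` and `url`.
const tools = new Map<string, Handler>([
  ["web_search", defineTool(parseStringField("query"), (q) => `searching for ${q}`)],
  ["web_fetch", defineTool(parseStringField("url"), (url) => `fetching ${url}`)],
]);

const route = (call: ToolCall): RouteResult => {
  const tool = tools.get(call.name);
  return tool === undefined
    ? { ok: false, error: "UnknownTool", detail: call.name }
    : tool(call.input);
};
```

Returning failures as values rather than throwing keeps every failure mode visible in the return type, which is the same idea the project expresses through Effect's error channel.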
- Bun >= 1.1.0 (curl -fsSL https://bun.sh/install | bash)
- Node.js >= 18.0.0 (for compatibility)
- Docker and Docker Compose (for running the observability stack)
- API keys for Anthropic and Brave Search
- Clone the repository:
  git clone <repository-url>
  cd research-squad
- Install dependencies:
  bun install
- Configure environment variables: Copy the example environment file and fill in your API keys.
  cp .env.example .env
  Edit .env:
  ANTHROPIC_API_KEY="your_anthropic_key_here"
  BRAVE_API_KEY="your_brave_search_key_here"
- Generate the BAML client: This command reads your .baml files and generates a type-safe TypeScript client.
  bun baml:generate
- Run verification script: Before committing, always run the full verification script. This checks types, lints, formats, runs tests, and detects common Effect anti-patterns.
  bun run verify
The application is primarily controlled via its command-line interface. All commands are executed via bun run src/main.ts.
- Start a research query:
  bun run src/main.ts query "What are the core principles of Effect?" --verbose
  Options:
  - --context <string>: Provide additional context for the research (e.g., "For an experienced TypeScript developer new to Effect").
  - --verbose (-v): Show detailed progress and logs.
  - --max-agents <number>: Limit the number of parallel subagents (default: 10).
- List all research sessions:
  # List only currently active sessions (default)
  bun run src/main.ts list-sessions --active
  # List both active and historical sessions
  bun run src/main.ts list-sessions --all
- Validate BAML client connectivity: This command makes a test call to the LLM to verify your BAML setup and API keys.
  bun run src/main.ts validate-baml
- Run validation suite on research logs:
  bun run src/main.ts validate data/research-logs --include-har --format json --output-report data/reports/validation-report.json
This codebase strictly adheres to the idiomatic patterns of the Effect framework, as documented in CLAUDE.md.
- No try-catch in Effect.gen: All errors must be handled through Effect's typed error channel (E). Use Effect.try, Effect.either, or Effect.catchTag instead.
- No Unsafe Type Assertions: as any, as unknown, as never are forbidden. Fix the root type issue instead.
- return yield* for Terminal Effects: Always use return yield* for Effect.fail, Effect.die, or Effect.interrupt in conditional blocks to ensure correct type-narrowing.
- No Direct .pipe() on yield*: Never write yield* effect.pipe(...). Instead, assign the yielded value first, then pipe: const value = yield* effect; return value.pipe(...).
The project follows an Interface-First TDD methodology:
- Define Contracts: Define errors (Data.TaggedError), models (@effect/schema), and service interfaces (Effect.Service).
- Test the Contract (Red Phase): Write tests against the interface using in-memory fake Layer implementations. Ensure they fail.
- Implement the Contract (Green Phase): Write the production Layer to make the tests pass.
- Refactor: Improve the implementation with the confidence that the contract tests provide a safety net.
- Define Contracts (The "What"):
  - Define all possible failure modes as Data.TaggedError classes in src/domain/errors.ts.
  - Define data models using @effect/schema in src/domain/models/.
  - Define the public service interface using class MyService extends Effect.Service<...>()(...).
- Write Tests (Red Phase):
  - Write tests against the service interface in src/services/__tests__/.
  - Use in-memory test doubles (Layer.succeed) for dependencies.
  - Write tests for the happy path and all specified error paths. Failure tests must use Effect.exit to inspect the Cause.
  - Confirm that tests fail.
- Implement (Green Phase):
  - Write the minimal production Layer implementation to make the tests pass.
  - Run bunx vitest continuously until all tests are green.
- Refactor:
  - With a full suite of passing contract tests, refactor the implementation for clarity, performance, and maintainability.
Mandatory Validation Steps:
- After every file edit: bun run typecheck && bun run check
- Before every commit: bun run verify
- Services: Use the class MyService extends Effect.Service<...>()(...) pattern for defining services with their tag, dependencies, and default implementation.
- Layers: Compose all services into a single MainLayer and provide it once at the application boundary (main.ts).
- Composition: Use Effect.gen for business logic with sequential steps and .pipe() for post-processing (error handling, tracing, retries).
- Error Handling: Use Data.TaggedError for domain errors. Distinguish between recoverable failures (Effect.fail) and unrecoverable defects (Effect.die).
- Data Modeling: All boundary-crossing data structures are defined with @effect/schema, primarily using Schema.Class for opaque, branded types.
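The split between recoverable failures and defects, and the value of a `_tag` discriminant, can be shown in plain TypeScript. This is a sketch of the idea only: the error names below are hypothetical, and the project itself would define them with Data.TaggedError and handle them through Effect's error channel.

```typescript
// Hypothetical domain errors; each carries a `_tag` discriminant,
// the same field that Data.TaggedError adds to its instances.
type ResearchError =
  | { readonly _tag: "RateLimited"; readonly retryAfterMs: number }
  | { readonly _tag: "InvalidToolInput"; readonly tool: string }
  | { readonly _tag: "SessionNotFound"; readonly sessionId: string };

type Result<A> =
  | { readonly ok: true; readonly value: A }
  | { readonly ok: false; readonly error: ResearchError };

// Recoverable failure: returned as a value that the caller must handle.
const findSession = (sessions: ReadonlyMap<string, string>, sessionId: string): Result<string> => {
  const topic = sessions.get(sessionId);
  return topic === undefined
    ? { ok: false, error: { _tag: "SessionNotFound", sessionId } }
    : { ok: true, value: topic };
};

// Exhaustive handling: adding a variant to ResearchError without a matching
// case makes the `never` assignment below a compile-time error.
const explain = (error: ResearchError): string => {
  switch (error._tag) {
    case "RateLimited":
      return `retry in ${error.retryAfterMs} ms`;
    case "InvalidToolInput":
      return `bad input for ${error.tool}`;
    case "SessionNotFound":
      return `no session ${error.sessionId}`;
    default: {
      // Defect: reaching this branch means a bug, so it throws instead of returning a failure.
      const unreachable: never = error;
      throw new Error(`unhandled error: ${JSON.stringify(unreachable)}`);
    }
  }
};
```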
The project follows an idiomatic Effect structure that separates domain, services, infrastructure, and application logic.
src/
├── cli/ # CLI commands (@effect/cli)
├── domain/ # The "What": Pure data models and errors
│ ├── errors.ts # Data.TaggedError definitions for typed error handling
│ └── models/ # @effect/schema definitions for all domain entities
│ └── baml-types.ts # Effect schemas mirroring BAML-generated types
├── infrastructure/ # External system integrations (logging, metrics)
├── layers/ # Service dependency composition (MainLayer)
│ └── AppLayer.ts # Single MainLayer providing all services
├── services/ # The "How": Business logic as Effect services
│ ├── MultiAgentOrchestratorService.ts # Core research workflow logic
│ ├── BamlClientService.ts # Wrapper for BAML-generated client
│ ├── SessionManagerService.ts # Concurrent-safe session state management
│ ├── ToolRouterService.ts # Tool call validation and dispatching
│ ├── WebSearchService.ts # Brave Search API integration
│ ├── WebFetchService.ts # HTTP content fetching
│ └── __tests__/ # Service tests (unit, integration, smoke)
├── validation/ # Suite for validating research logs and tool calls
│ ├── parsers/ # Effect-based parsers for JSON logs and HAR files
│ ├── validators/ # Schema-based validation of tool calls
│ └── reporters/ # Console and JSON report generation
├── tests/ # Global test setup and utilities
└── main.ts # Application entry point (the ONLY Effect.run* location)
The project employs a multi-tiered testing strategy to ensure correctness and reliability.
- Unit Tests: Test pure functions and individual schema validations in isolation.
- Integration Tests: Test the interaction between services. These are the most common type of test in the suite, using the TestAppLayer which provides in-memory fakes for all services.
- Smoke Tests: Fast-running integration tests that may hit real APIs with a very limited scope to provide quick feedback and validate basic end-to-end functionality.
Key Testing Principles:
- Test Doubles as Layers: Instead of traditional mocking, tests provide in-memory implementations of services via test Layers (see TestDoubles.ts). This is the idiomatic Effect pattern, ensuring full type safety and contract adherence between production code and tests.
- Failure Testing: All tests for failing effects correctly use Effect.exit to inspect the Cause of failure, ensuring that error channels are behaving as expected.
- @effect/vitest: All tests are written using @effect/vitest, with it.effect for testing effects and assert for assertions.
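The first two principles above can be sketched without Effect or Vitest. In this hypothetical example, `WebSearchPort` plays the role of a service interface, `fakeSearch` is an in-memory test double, and the tagged `Outcome` value stands in for what a test would inspect after Effect.exit. None of these names come from the project.

```typescript
type SearchFailure = { readonly _tag: "SearchFailed"; readonly query: string };

type Outcome<A> =
  | { readonly _tag: "Success"; readonly value: A }
  | { readonly _tag: "Failure"; readonly error: SearchFailure };

// The contract that both production code and test doubles implement.
interface WebSearchPort {
  search(query: string): Outcome<readonly string[]>;
}

// Logic under test: depends only on the interface, never on a concrete client.
const collectUrls = (
  port: WebSearchPort,
  queries: readonly string[]
): Outcome<readonly string[]> => {
  const urls: string[] = [];
  for (const query of queries) {
    const result = port.search(query);
    if (result._tag === "Failure") return result; // propagate the first failure unchanged
    urls.push(...result.value);
  }
  return { _tag: "Success", value: urls };
};

// In-memory test double: answers from fixtures and fails for anything else.
const fakeSearch = (fixtures: ReadonlyMap<string, readonly string[]>): WebSearchPort => ({
  search: (query) => {
    const hit = fixtures.get(query);
    return hit === undefined
      ? { _tag: "Failure", error: { _tag: "SearchFailed", query } }
      : { _tag: "Success", value: hit };
  },
});
```

Because the fake must satisfy the same interface as the production implementation, the compiler flags any drift between the two.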
Running Tests:
# Run the entire test suite
bun test
# or
bunx vitest run
# Run tests in watch mode during development
bunx vitest
# Generate a coverage report
bunx vitest --coverage
The AI and agentic capabilities of this system are powered by BAML (Boundary ML).
- Source of Truth: The baml_src/ directory contains all BAML function definitions, types, prompts, and templates. This is your "AI as Code" layer, version-controlled and declarative.
- Generated Client: The bun baml:generate command reads your BAML files and creates a type-safe TypeScript client in the baml_client/ directory.
- Service Wrapper: The BamlClientService provides an Effect-native wrapper around the generated BAML client, adding idiomatic error handling, retries (llmRetry policy), timeouts, and typed error mapping.
- Schema Parity: Effect Schemas in src/domain/models/baml-types.ts mirror the BAML-generated types to provide runtime validation, a single source of truth within the Effect domain, and seamless integration with the rest of the system.
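This README names an llmRetry policy but does not state its parameters. As an illustration of what such a policy typically encodes, here is a dependency-free sketch of capped exponential backoff. The `RetryPolicy` shape, the numbers, and both functions are assumptions, and the loop records the delays it would wait instead of sleeping.

```typescript
// Hypothetical policy shape; the real llmRetry parameters are not given in this README.
interface RetryPolicy {
  readonly baseMs: number; // delay before the first retry
  readonly factor: number; // multiplier applied on each further retry
  readonly maxRetries: number; // retries allowed after the initial attempt
  readonly capMs: number; // upper bound on any single delay
}

// Delay before retry 0, 1, 2, ...: base * factor^n, capped at capMs.
const backoffDelays = (p: RetryPolicy): number[] =>
  Array.from({ length: p.maxRetries }, (_, attempt) =>
    Math.min(p.capMs, p.baseMs * p.factor ** attempt)
  );

// Run `attempt` until it succeeds, the error is not retryable, or retries are exhausted.
const retryWith = <A>(
  policy: RetryPolicy,
  attempt: (n: number) => A,
  isRetryable: (error: unknown) => boolean
): { readonly value: A; readonly waitedMs: readonly number[] } => {
  const delays = backoffDelays(policy);
  const waitedMs: number[] = [];
  for (let n = 0; ; n++) {
    try {
      return { value: attempt(n), waitedMs };
    } catch (error) {
      const delay = delays[n];
      if (delay === undefined || !isRetryable(error)) throw error;
      waitedMs.push(delay); // a real implementation would sleep for `delay` here
    }
  }
};
```

Keeping the policy as plain data in one place makes it easy to apply the same retry behaviour to every LLM call.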
The src/validation/ directory contains a powerful, Effect-native suite for validating research logs against the defined tool schemas. This is used for quality assurance, regression testing, and analyzing agent behavior.
- Parsers: Effect-based parsers for research session JSON logs and HAR files, with full error handling and schema validation.
- Validators: tool-validator.effect.ts uses @effect/schema to validate every tool call against the defined contracts, ensuring agents are using tools correctly.
- Reporters: Generates detailed console and JSON reports with statistics on validation success rates, common errors, and tool usage patterns.
Usage:
bun run src/main.ts validate data/research-logs --include-har --format json --output-report data/reports/validation-report.json
The system is built with production-grade observability in mind. The included docker-compose.yml file spins up a complete observability stack.
- Structured Logging: All logging is done via Effect.log* functions. The infrastructure/logger.ts module provides helper functions to annotate logs with structured context (e.g., sessionId, agentName), enabling powerful filtering and analysis in production.
- Distributed Tracing: The architecture is tracing-ready. Key operations are wrapped with Effect.withSpan to create trace spans, which can be exported to systems like Jaeger or Datadog via OpenTelemetry.
- Metrics: The MetricsService defines key application metrics (counters, histograms, gauges) using Effect.Metric. The prometheus.yml file provides configuration for Prometheus to scrape these metrics, and Grafana dashboards are pre-configured for visualization.
Launching the Observability Stack:
docker-compose up -d
Access Points:
- Jaeger UI: http://localhost:16686 (Distributed Tracing)
- Prometheus: http://localhost:9090 (Metrics Collection)
- Grafana: http://localhost:3001 (Dashboards and Visualization)
  - Default login: admin/admin
Contributions are welcome. Please follow the established development workflow and architectural patterns.
- Follow the Installation & Setup instructions.
- Create a new branch for your feature or bug fix.
- Adhere to the "Interface-First, Contract-Driven TDD" workflow outlined in this README and in docs/DEVELOPMENT.md.
- Follow the patterns and rules documented in CLAUDE.md.
- Ensure all new code is accompanied by corresponding tests.
- Use Conventional Commit messages for your commits (e.g., feat:, fix:, docs:, refactor:).
- Ensure all code passes the verification script before submitting a pull request:
  bun run verify
This project is licensed under the MIT License.