oscaromsn/research-squad

Research Squad: A Production-Grade Multi-Agent Research System

Research Squad is a multi-agent research system built with Effect and BAML. Inspired by the architecture of Anthropic's Claude Research feature, it provides a robust framework for orchestrating multiple specialized AI agents to conduct comprehensive research by decomposing complex user queries into parallelizable sub-tasks.

This project aims to serve as a reference for building scalable, type-safe, and observable agentic systems in TypeScript using functional paradigms. It demonstrates good practices in service-oriented architecture, contract-driven TDD, and structured concurrency.

Key Features

  • Multi-Agent Orchestration: A hierarchical agent system (General Assistant → Research Lead → Subagents) that plans, delegates, and synthesizes research with sophisticated task decomposition and execution.
  • Type-Safe by Construction: Built entirely with Effect. Every effect has the type Effect<A, E, R>, so its success type (A), possible errors (E), and required services (R) are explicit and checked at compile time.
  • Declarative & Composable: A purely functional, composable approach to managing complexity. Business logic is written with Effect.gen, which offers an imperative-looking syntax while preserving declarative composition.
  • Structured Concurrency: Manages parallel sub-agent execution safely and efficiently with Effect's built-in, resource-safe concurrency primitives, including bounded parallelism, automatic resource cleanup, and backpressure.
  • Schema-Driven Data Modeling: Uses @effect/schema for all data models, providing runtime validation, static type generation, serialization, and bidirectional transformation from a single source of truth.
  • Robust Dependency Injection: Leverages Effect's Layer system for composable, memoized service construction, enabling clean dependency management and straightforward substitution of test implementations.
  • Comprehensive Observability: Integrated support for structured logging, distributed tracing (via Effect.withSpan and OpenTelemetry), and metrics collection (Prometheus) for production monitoring.
  • AI Function Calling with BAML: Integrates with the BAML (Boundary ML) framework to define, version, test, and execute LLM function calls in a structured, declarative, and reliable manner.
  • Comprehensive Validation Suite: Includes a dedicated CLI and service for validating research logs, HAR files, and tool call schemas against defined contracts.
  • Production-Grade Error Handling: Distinguishes between recoverable failures (typed errors) and unrecoverable defects (bugs), with centralized, type-safe retry policies and exhaustive error handling.

Technology Stack

  • Runtime: Bun (>= 1.1.0)
  • Core Framework: Effect 3.x
  • Schema & Validation: @effect/schema
  • CLI: @effect/cli
  • AI Orchestration: BAML (Boundary ML)
  • Testing: Vitest with @effect/vitest
  • Linting & Formatting: Biome
  • Observability: OpenTelemetry, with Jaeger for tracing, Prometheus for metrics, and Grafana for dashboards (all run via Docker Compose)

Architectural Philosophy

This project is built upon the core principles of the Effect ecosystem and follows Domain-Driven Design (DDD) and Hexagonal Architecture:

  • Effects as Blueprints: An Effect is an immutable, lazy description of a program—a blueprint, not an execution. No side effects occur until the program is executed by a Runtime at the application's edge. This enables powerful composition, testing, and reasoning about program behavior.

  • Separation of "What" from "How": Business logic (what the program does) is defined as pure Effect workflows in services. Cross-cutting concerns like logging, retries, metrics, and dependency injection (how it runs) are composed declaratively using combinators and layers.

  • Make Impossible States Unrepresentable: The type system (Effect<A, E, R>, Schema, Data.TaggedError) is leveraged to enforce correctness at compile time, eliminating entire classes of runtime bugs. If it compiles, it's much more likely to be correct.

  • Composition over Inheritance: Complex functionality is built by composing small, independent, highly cohesive services and effects, not through inheritance hierarchies.

  • Dependency Inversion: All business logic depends on abstract service interfaces, not concrete implementations. This is enforced through Effect's Layer system, making the codebase highly modular and testable.
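The "blueprint" idea can be illustrated without any library. The sketch below is a dependency-free model, not Effect's API: Task, sync, and map are hypothetical names chosen for this example. It shows only that building and composing a description performs no side effects until the description is run.

```typescript
// Library-free model of a lazy "blueprint". `Task`, `sync`, and `map` are
// hypothetical names for illustration; they are not part of Effect.
type Task<A> = { readonly run: () => A };

// Describe a side-effecting computation without running it.
const sync = <A>(thunk: () => A): Task<A> => ({ run: thunk });

// Derive a new description from an existing one; still nothing runs.
const map = <A, B>(task: Task<A>, f: (a: A) => B): Task<B> => ({
  run: () => f(task.run()),
});

const sideEffects: string[] = [];

const program = map(
  sync(() => {
    sideEffects.push("fetched");
    return 21;
  }),
  (n) => n * 2,
);

// Building the description performed no side effects.
const effectsBeforeRun = sideEffects.length;

// Execution happens once, at the "edge" of the application.
const result = program.run();
```

Effect's real runtime adds typed errors, dependency tracking, interruption, and resource safety on top of this basic idea of describing first and running last.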

System Architecture

The system is architected around a clear separation of concerns, following the principles of Domain-Driven Design and Dependency Inversion.

Multi-Agent Hierarchy

The research process is orchestrated through a chain of specialized agents, each with a distinct responsibility:

  1. General Assistant: The entry point that receives the user's initial query and determines if it's a simple conversational turn or requires in-depth research.

  2. Research Lead Agent: If research is needed, this agent takes over. It analyzes the query, develops a research strategy, classifies the query type (e.g., breadth-first, depth-first), and generates a set of parallelizable tasks for sub-agents.

  3. Research Subagents: A team of parallel agents that execute the individual research tasks. They use tools like web_search and web_fetch to gather information from external sources.

  4. Citations Agent: A final-pass agent responsible for adding accurate, properly formatted citations to the synthesized report.
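Step 3 runs subagents in parallel under a concurrency limit. As a rough, dependency-free sketch of bounded parallelism, the code below uses plain Promises. runBounded, ResearchTask, Finding, and the stub subagent are hypothetical names; the project itself relies on Effect's concurrency primitives, which also handle interruption and resource cleanup.

```typescript
// Hypothetical shapes for a research task and its result.
type ResearchTask = { readonly id: number; readonly question: string };
type Finding = { readonly taskId: number; readonly summary: string };

// Run `worker` over `items` with at most `limit` calls in flight.
async function runBounded<A, B>(
  items: ReadonlyArray<A>,
  limit: number,
  worker: (item: A) => Promise<B>,
): Promise<B[]> {
  const results = new Array<B>(items.length);
  const queue = items.map((item, index) => ({ item, index }));

  // Each lane pulls the next unclaimed job until the queue is drained.
  const lane = async (): Promise<void> => {
    for (let job = queue.shift(); job !== undefined; job = queue.shift()) {
      results[job.index] = await worker(job.item);
    }
  };

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}

let inFlight = 0;
let peak = 0;

// Stub subagent: stands in for a real search-and-summarize step.
const subagent = async (task: ResearchTask): Promise<Finding> => {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 5));
  inFlight--;
  return { taskId: task.id, summary: `notes on ${task.question}` };
};

const tasks: ResearchTask[] = [1, 2, 3, 4, 5].map((id) => ({
  id,
  question: `topic ${id}`,
}));

// With a limit of 2, no more than two subagents run at the same time.
const findings = runBounded(tasks, 2, subagent);
```

Results are written by index, so the output order matches the task order regardless of which subagent finishes first.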

Service Dependency Graph

The application is composed of several single-responsibility services that are wired together at the application's edge.

┌──────────────────────────────────────────────────────────────┐
│                     Application Layer                         │
│                        (MainLayer)                            │
└──────────────────────────────────────────────────────────────┘
                              │
                              │ provides
                              ▼
┌──────────────────────────────────────────────────────────────┐
│              MultiAgentOrchestratorService                    │
│  • Coordinates the full research pipeline from query to report│
└──────────────────────────────────────────────────────────────┘
           │                    │                    │
           │ depends on         │ depends on         │ depends on
           ▼                    ▼                    ▼
┌─────────────────┐  ┌──────────────────┐  ┌───────────────────┐
│ BamlClientSvc   │  │SessionManagerSvc │  │  ToolRouterSvc    │
│ • LLM calls     │  │ • Session state  │  │ • Tool dispatch   │
│ • BAML funcs    │  │ • Lifecycle mgmt │  │ • Validation      │
│ • Retries       │  │ • Concurrent-safe│  │ • Schema checks   │
└─────────────────┘  └──────────────────┘  └───────────────────┘
                                                      │
                                                      │ depends on
                                      ┌───────────────┴───────────────┐
                                      ▼                               ▼
                           ┌──────────────────┐           ┌──────────────────┐
                           │ WebSearchService │           │ WebFetchService  │
                           │ • Brave Search   │           │ • HTTP fetching  │
                           │ • Result parsing │           │ • Content extract│
                           └──────────────────┘           └──────────────────┘

Key Services:

  • MultiAgentOrchestratorService: The central coordinator of the research pipeline, managing the entire workflow from query analysis to report synthesis.
  • BamlClientService: A type-safe wrapper around the BAML-generated LLM client, handling retries (llmRetry policy), timeouts, and error mapping.
  • SessionManagerService: A concurrent-safe service for managing the state and lifecycle of each research session using Effect's Ref primitive.
  • ToolRouterService: Validates and dispatches tool calls (e.g., web_search, web_fetch) to their respective implementations, ensuring schema compliance.
  • WebSearchService / WebFetchService: Infrastructure services that interact with external APIs (Brave Search, HTTP).
  • MetricsService: Collects and exposes application metrics for observability.

Layer Composition

All services are composed into a single MainLayer in src/layers/AppLayer.ts, which represents the complete dependency graph of the application. Following the project's "single provide" rule, MainLayer is provided exactly once, at the application's entry point (src/main.ts), so that all services are memoized and all resources are managed within a single, unified scope.
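The idea of wiring everything once at the edge can be sketched with plain factory functions. This is a dependency-free analogy, not Effect's Layer API: the interfaces, the make* factories, and buildApp are hypothetical, and the service shapes are simplified assumptions.

```typescript
// Hypothetical service interfaces; the names echo this README, the shapes are assumptions.
interface WebSearch {
  readonly search: (query: string) => string[];
}
interface ToolRouter {
  readonly dispatch: (query: string) => string[];
}
interface Orchestrator {
  readonly research: (query: string) => string;
}

let searchInstances = 0;

// Each factory receives its dependencies as arguments instead of importing them.
const makeWebSearch = (): WebSearch => {
  searchInstances++;
  return { search: (query) => [`hit for ${query}`] };
};

const makeToolRouter = (search: WebSearch): ToolRouter => ({
  dispatch: (query) => search.search(query),
});

const makeOrchestrator = (router: ToolRouter): Orchestrator => ({
  research: (query) => router.dispatch(query).join("; "),
});

// Composition root: the only place where concrete implementations are chosen.
// Every service is built exactly once and shared by everything that needs it.
function buildApp(): Orchestrator {
  const search = makeWebSearch();
  const router = makeToolRouter(search);
  return makeOrchestrator(router);
}

const app = buildApp();
const report = app.research("effect layers");
```

A Layer plays the role of these factories while also tracking construction errors, resource lifetimes, and memoization in the type system and runtime.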

Getting Started

Prerequisites

  • Bun >= 1.1.0 (curl -fsSL https://bun.sh/install | bash)
  • Node.js >= 18.0.0 (for compatibility)
  • Docker and Docker Compose (for running the observability stack)
  • API keys for Anthropic and Brave Search

Installation & Setup

  1. Clone the repository:

    git clone <repository-url>
    cd research-squad

  2. Install dependencies:

    bun install

  3. Configure environment variables: Copy the example environment file and fill in your API keys.

    cp .env.example .env

    Edit .env:

    ANTHROPIC_API_KEY="your_anthropic_key_here"
    BRAVE_API_KEY="your_brave_search_key_here"

  4. Generate the BAML client: This command reads your .baml files and generates a type-safe TypeScript client.

    bun baml:generate

  5. Run the verification script: Before committing, always run the full verification script. It type-checks, lints, formats, runs the tests, and detects common Effect anti-patterns.

    bun run verify

Usage: CLI Commands

The application is primarily controlled via its command-line interface. All commands are executed via bun run src/main.ts.

  • Start a research query:

    bun run src/main.ts query "What are the core principles of Effect?" --verbose

    Options:

    • --context <string>: Provide additional context for the research (e.g., "For an experienced TypeScript developer new to Effect").
    • --verbose (-v): Show detailed progress and logs.
    • --max-agents <number>: Limit the number of parallel subagents (default: 10).

  • List research sessions:

    # List only currently active sessions (default)
    bun run src/main.ts list-sessions --active

    # List both active and historical sessions
    bun run src/main.ts list-sessions --all

  • Validate BAML client connectivity: This command makes a test call to the LLM to verify your BAML setup and API keys.

    bun run src/main.ts validate-baml

  • Run the validation suite on research logs:

    bun run src/main.ts validate data/research-logs --include-har --format json --output-report data/reports/validation-report.json

Core Development Patterns

This codebase strictly adheres to the idiomatic patterns of the Effect framework, as documented in CLAUDE.md.

Mandatory Rules

  1. No try-catch in Effect.gen: All errors must be handled through Effect's typed error channel (E). Use Effect.try, Effect.either, or Effect.catchTag instead.

  2. No Unsafe Type Assertions: as any, as unknown, and as never are forbidden. Fix the root type issue instead.

  3. return yield* for Terminal Effects: Always use return yield* for Effect.fail, Effect.die, or Effect.interrupt in conditional blocks to ensure correct type-narrowing.

  4. No Direct .pipe() on yield*: Never write yield* effect.pipe(...). Instead, assign the yielded value first, then pipe: const value = yield* effect; return value.pipe(...).

Contract-Driven TDD

The project follows an Interface-First TDD methodology:

  1. Define Contracts: Define errors (Data.TaggedError), models (@effect/schema), and service interfaces (Effect.Service).
  2. Test the Contract (Red Phase): Write tests against the interface using in-memory fake Layer implementations. Ensure they fail.
  3. Implement the Contract (Green Phase): Write the production Layer to make the tests pass.
  4. Refactor: Improve the implementation with the confidence that the contract tests provide a safety net.

The Incremental TDD Cycle

  1. Define Contracts (The "What"):

    • Define all possible failure modes as Data.TaggedError classes in src/domain/errors.ts.
    • Define data models using @effect/schema in src/domain/models/.
    • Define the public service interface using class MyService extends Effect.Service<MyService>()(...).
  2. Write Tests (Red Phase):

    • Write tests against the service interface in src/services/__tests__/.
    • Use in-memory test doubles (Layer.succeed) for dependencies.
    • Write tests for the happy path and all specified error paths. Failure tests must use Effect.exit to inspect the Cause.
    • Confirm that tests fail.
  3. Implement (Green Phase):

    • Write the minimal production Layer implementation to make the tests pass.
    • Run bunx vitest continuously until all tests are green.
  4. Refactor:

    • With a full suite of passing contract tests, refactor the implementation for clarity, performance, and maintainability.
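The cycle above can be sketched in miniature with plain TypeScript: a contract, two in-memory fakes, and a function that depends only on the contract. Everything here (SearchError, Result, WebSearch, topHit, and the fakes) is hypothetical. In the project, the contract would be an Effect service, the fakes would be test Layers, and failures would be inspected through Effect.exit.

```typescript
// 1. Contract: a hypothetical error type and service interface.
type SearchError = { readonly _tag: "SearchError"; readonly reason: string };

type Result<A, E> =
  | { readonly _tag: "Success"; readonly value: A }
  | { readonly _tag: "Failure"; readonly error: E };

interface WebSearch {
  readonly search: (query: string) => Result<ReadonlyArray<string>, SearchError>;
}

// 2. In-memory fakes used by the contract tests (happy path and error path).
const fakeSearch: WebSearch = {
  search: (query) => ({ _tag: "Success", value: [`${query}: first hit`] }),
};

const failingSearch: WebSearch = {
  search: () => ({
    _tag: "Failure",
    error: { _tag: "SearchError", reason: "quota exceeded" },
  }),
};

// 3. Code under test depends only on the interface, never on a concrete client.
function topHit(deps: WebSearch, query: string): Result<string, SearchError> {
  const found = deps.search(query);
  if (found._tag === "Failure") return found; // propagate the typed error unchanged
  const first = found.value[0];
  return { _tag: "Success", value: first ?? "no results" };
}

// Exercising both paths, as a contract test would.
const happy = topHit(fakeSearch, "effect");
const sad = topHit(failingSearch, "effect");
```

Writing the two fakes before topHit exists is what produces the failing (red) tests; the production implementation of WebSearch only has to satisfy the same interface.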

Mandatory Validation Steps:

  • After every file edit: bun run typecheck && bun run check
  • Before every commit: bun run verify

Key Effect Patterns

  • Services: Use the class MyService extends Effect.Service<...>()(...) pattern for defining services with their tag, dependencies, and default implementation.
  • Layers: Compose all services into a single MainLayer and provide it once at the application boundary (main.ts).
  • Composition: Use Effect.gen for business logic with sequential steps and .pipe() for post-processing (error handling, tracing, retries).
  • Error Handling: Use Data.TaggedError for domain errors. Distinguish between recoverable failures (Effect.fail) and unrecoverable defects (Effect.die).
  • Data Modeling: All boundary-crossing data structures are defined with @effect/schema, primarily using Schema.Class for opaque, branded types.
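The distinction between recoverable failures and defects can be shown with a tagged union and an exhaustive switch. This is a dependency-free sketch: the error names and fields are invented for illustration, and the project would define them as Data.TaggedError classes rather than plain object types.

```typescript
// Hypothetical domain errors, modelled as a tagged union.
type RateLimited = { readonly _tag: "RateLimited"; readonly retryAfterMs: number };
type InvalidResponse = { readonly _tag: "InvalidResponse"; readonly detail: string };
type LlmError = RateLimited | InvalidResponse;

// Recoverable failures are ordinary values that the caller must handle.
function explain(error: LlmError): string {
  switch (error._tag) {
    case "RateLimited":
      return `retry in ${error.retryAfterMs}ms`;
    case "InvalidResponse":
      return `give up: ${error.detail}`;
    default: {
      // Exhaustiveness check: adding a new error tag makes this line fail to compile.
      const unreachable: never = error;
      return unreachable;
    }
  }
}

// Defects are bugs: they are thrown, not returned, and are absent from the error type.
function assertPositive(n: number): number {
  if (n <= 0) throw new Error(`defect: expected a positive number, got ${n}`);
  return n;
}

const advice = explain({ _tag: "RateLimited", retryAfterMs: 500 });
```

The _tag field is what tag-based handlers such as Effect.catchTag discriminate on, which is why each error carries a unique literal tag.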

Project Structure

The project follows an idiomatic Effect structure that separates domain, services, infrastructure, and application logic.

src/
├── cli/              # CLI commands (@effect/cli)
├── domain/           # The "What": Pure data models and errors
│   ├── errors.ts     # Data.TaggedError definitions for typed error handling
│   └── models/       # @effect/schema definitions for all domain entities
│       └── baml-types.ts  # Effect schemas mirroring BAML-generated types
├── infrastructure/   # External system integrations (logging, metrics)
├── layers/           # Service dependency composition (MainLayer)
│   └── AppLayer.ts   # Single MainLayer providing all services
├── services/         # The "How": Business logic as Effect services
│   ├── MultiAgentOrchestratorService.ts  # Core research workflow logic
│   ├── BamlClientService.ts              # Wrapper for BAML-generated client
│   ├── SessionManagerService.ts          # Concurrent-safe session state management
│   ├── ToolRouterService.ts              # Tool call validation and dispatching
│   ├── WebSearchService.ts               # Brave Search API integration
│   ├── WebFetchService.ts                # HTTP content fetching
│   └── __tests__/                        # Service tests (unit, integration, smoke)
├── validation/       # Suite for validating research logs and tool calls
│   ├── parsers/      # Effect-based parsers for JSON logs and HAR files
│   ├── validators/   # Schema-based validation of tool calls
│   └── reporters/    # Console and JSON report generation
├── tests/            # Global test setup and utilities
└── main.ts           # Application entry point (the ONLY Effect.run* location)

Testing Strategy

The project employs a multi-tiered testing strategy to ensure correctness and reliability.

  • Unit Tests: Test pure functions and individual schema validations in isolation.

  • Integration Tests: Test the interaction between services. These are the most common type of test in the suite and use the TestAppLayer, which provides in-memory fakes for all services.

  • Smoke Tests: Fast-running integration tests that may hit real APIs with a very limited scope to provide quick feedback and validate basic end-to-end functionality.

Key Testing Principles:

  • Test Doubles as Layers: Instead of traditional mocking, tests provide in-memory implementations of services via test Layers (see TestDoubles.ts). This is the idiomatic Effect pattern, ensuring full type safety and contract adherence between production code and tests.

  • Failure Testing: Tests for failing effects use Effect.exit to inspect the Cause of the failure, verifying that the error channel behaves as expected.

  • @effect/vitest: All tests are written using @effect/vitest, with it.effect for testing effects and assert for assertions.

Running Tests:

# Run the entire test suite
bun test
# or
bunx vitest run

# Run tests in watch mode during development
bunx vitest

# Generate a coverage report
bunx vitest --coverage

BAML Integration

The AI and agentic capabilities of this system are powered by BAML (Boundary ML).

  • Source of Truth: The baml_src/ directory contains all BAML function definitions, types, prompts, and templates. This is your "AI as Code" layer, version-controlled and declarative.

  • Generated Client: The bun baml:generate command reads your BAML files and creates a type-safe TypeScript client in the baml_client/ directory.

  • Service Wrapper: The BamlClientService provides an Effect-native wrapper around the generated BAML client, adding idiomatic error handling, retries (llmRetry policy), timeouts, and typed error mapping.

  • Schema Parity: Effect Schemas in src/domain/models/baml-types.ts mirror the BAML-generated types to provide runtime validation, a single source of truth within the Effect domain, and seamless integration with the rest of the system.
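As a rough illustration of what a retrying wrapper with typed error mapping does, the sketch below uses plain Promises. It is not the project's llmRetry policy: withRetry, LlmCallError, the attempt count, and the backoff values are assumptions, and in Effect the same behaviour would normally be expressed with retry schedules rather than a hand-written loop.

```typescript
// Hypothetical typed error produced once all attempts are exhausted.
type LlmCallError = {
  readonly _tag: "LlmCallError";
  readonly attempts: number;
  readonly cause: unknown;
};

type Outcome<A> =
  | { readonly ok: true; readonly value: A }
  | { readonly ok: false; readonly error: LlmCallError };

// Retry `call` up to `maxAttempts` times with exponential backoff.
async function withRetry<A>(
  call: () => Promise<A>,
  maxAttempts: number,
  baseDelayMs: number,
): Promise<Outcome<A>> {
  let lastCause: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { ok: true, value: await call() };
    } catch (cause) {
      lastCause = cause;
      if (attempt < maxAttempts) {
        // Backoff doubles each time: base, 2x base, 4x base, ...
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  // Map the raw exception into a typed, tagged error value.
  return {
    ok: false,
    error: { _tag: "LlmCallError", attempts: maxAttempts, cause: lastCause },
  };
}

let calls = 0;

// Stub model call that fails twice, then succeeds.
const flakyModel = async (): Promise<string> => {
  calls++;
  if (calls < 3) throw new Error("transient failure");
  return "plan: 3 subtasks";
};

const outcome = withRetry(flakyModel, 5, 1);
```

Returning the failure as a value, rather than rethrowing it, mirrors how a typed error channel keeps the exhausted-retries case visible to callers.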

Validation Suite

The src/validation/ directory contains an Effect-native suite for validating research logs against the defined tool schemas. It is used for quality assurance, regression testing, and analyzing agent behavior.

  • Parsers: Effect-based parsers for research session JSON logs and HAR files, with full error handling and schema validation.

  • Validators: tool-validator.effect.ts uses @effect/schema to validate every tool call against the defined contracts, ensuring agents are using tools correctly.

  • Reporters: Generates detailed console and JSON reports with statistics on validation success rates, common errors, and tool usage patterns.

Usage:

bun run src/main.ts validate data/research-logs --include-har --format json --output-report data/reports/validation-report.json

Observability

The system is built with production-grade observability in mind. The included docker-compose.yml file spins up a complete observability stack.

  • Structured Logging: All logging is done via Effect.log* functions. The infrastructure/logger.ts module provides helper functions to annotate logs with structured context (e.g., sessionId, agentName), enabling powerful filtering and analysis in production.

  • Distributed Tracing: The architecture is tracing-ready. Key operations are wrapped with Effect.withSpan to create trace spans, which can be exported to systems like Jaeger or Datadog via OpenTelemetry.

  • Metrics: The MetricsService defines key application metrics (counters, histograms, gauges) using Effect's Metric module. The prometheus.yml file configures Prometheus to scrape these metrics, and Grafana dashboards are pre-configured for visualization.
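The idea of annotating logs with structured context can be sketched without any library. The logger below is hypothetical and does not reflect the contents of infrastructure/logger.ts; the field names sessionId and agentName come from this README, while the LogEntry shape and makeLogger are assumptions.

```typescript
// Hypothetical structured log entry.
type LogEntry = {
  readonly level: "info" | "error";
  readonly message: string;
  readonly annotations: Readonly<Record<string, string>>;
};

type Logger = {
  readonly info: (message: string) => void;
  readonly error: (message: string) => void;
  readonly annotate: (extra: Record<string, string>) => Logger;
};

// `sink` receives each entry; a real sink might print JSON or forward it to a collector.
function makeLogger(
  sink: (entry: LogEntry) => void,
  annotations: Record<string, string> = {},
): Logger {
  const emit = (level: LogEntry["level"]) => (message: string) =>
    sink({ level, message, annotations });
  return {
    info: emit("info"),
    error: emit("error"),
    // Child loggers inherit the parent's context and add their own fields.
    annotate: (extra) => makeLogger(sink, { ...annotations, ...extra }),
  };
}

const entries: LogEntry[] = [];
const root = makeLogger((entry) => {
  entries.push(entry);
});

const sessionLog = root.annotate({ sessionId: "s-1" });
const agentLog = sessionLog.annotate({ agentName: "research-lead" });
agentLog.info("plan created");
```

Because context is attached once per scope instead of being repeated in every message, downstream tools can filter by session or agent without parsing message text.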

Launching the Observability Stack:

docker-compose up -d

Access Points:

  • Jaeger UI: http://localhost:16686 (Distributed Tracing)
  • Prometheus: http://localhost:9090 (Metrics Collection)
  • Grafana: http://localhost:3001 (Dashboards and Visualization)
    • Default login: admin/admin

Contributing

Contributions are welcome. Please follow the established development workflow and architectural patterns.

  1. Follow the Installation & Setup instructions.
  2. Create a new branch for your feature or bug fix.
  3. Adhere to the "Interface-First, Contract-Driven TDD" workflow outlined in this README and in docs/DEVELOPMENT.md.
  4. Follow the patterns and rules documented in CLAUDE.md.
  5. Ensure all new code is accompanied by corresponding tests.
  6. Use Conventional Commit messages for your commits (e.g., feat:, fix:, docs:, refactor:).
  7. Ensure all code passes the verification script before submitting a pull request:
    bun run verify

License

This project is licensed under the MIT License.
