felipemeres/deep-researcher-langgraph

Deep Researcher 🔍

An advanced LangGraph agent for comprehensive web research with self-reflection and iterative improvement capabilities.

Overview

Deep Researcher is a sophisticated AI research assistant built with LangGraph that conducts thorough web research on any topic. It features:

  • Intelligent Query Generation: Automatically generates optimized search queries
  • Parallel Web Search: Conducts multiple searches simultaneously for comprehensive coverage
  • Smart Summarization: Automatically summarizes long documents using a refine strategy
  • Self-Reflection: Critiques its own output and iteratively improves research quality
  • Citation Management: Automatically adds proper citations and references
  • Multi-LLM Support: Works with OpenAI, Anthropic, Google, Groq, and more

Architecture

The Deep Researcher agent follows a multi-stage workflow:

1. Query Generation → 2. Multi-Query Expansion → 3. Parallel Web Search
                                                              ↓
6. Create/Rewrite Content ← 5. Gather Sources ← 4. Summarize Long Docs
              ↓
7. Add Citations → 8. Self-Reflection → (back to 3 if gaps remain)

Key Features

  • Adaptive Research: Automatically identifies knowledge gaps and conducts additional research
  • Quality Control: Self-evaluates output quality before finalizing
  • Configurable: Flexible configuration for models, search depth, and reflection loops
  • Production-Ready: Built following LangGraph best practices for deployment

Quick Start

Prerequisites

  • Python 3.11 or higher
  • API keys for:
    • OpenAI (required)
    • Tavily (required for web search)
    • LangSmith (optional, recommended for tracing)

Installation

  1. Clone the repository

git clone https://github.com/yourusername/deep-researcher.git
cd deep-researcher

  2. Install dependencies

Using pip:

pip install -r requirements.txt

Or using Poetry:

poetry install
  3. Set up environment variables

Copy the example environment file:

cp .env.example .env

Edit .env and add your API keys:

OPENAI_API_KEY=your_openai_api_key_here
TAVILY_API_KEY=your_tavily_api_key_here
LANGCHAIN_API_KEY=your_langsmith_api_key_here  # Optional

Usage

Using LangGraph Studio (Recommended)

  1. Install LangGraph Studio
  2. Open the deep-researcher directory in LangGraph Studio
  3. The graph will be automatically loaded
  4. Start researching!

Using Python API

import asyncio

from deep_researcher.graph import graph

async def main():
    # Basic research
    result = await graph.ainvoke({
        "research_topic": "What are the latest developments in quantum computing?",
        "messages": []
    })
    print(result["result"])

asyncio.run(main())

With Custom Configuration

import asyncio

from deep_researcher.graph import graph

config = {
    "configurable": {
        "llm_writer": "anthropic/claude-3-5-sonnet-20241022",
        "llm_reviewer": "openai/gpt-4o",
        "max_reflection_loops": 3,
        "max_results": 10
    }
}

async def main():
    result = await graph.ainvoke({
        "research_topic": "Explain the impact of CRISPR gene editing on medicine",
        "messages": []
    }, config)
    print(result["result"])

asyncio.run(main())

Example Research Topics

Try these to see the agent in action:

# Technology
"What are the latest breakthroughs in artificial intelligence?"

# Science
"How does climate change affect ocean ecosystems?"

# Business
"What are the emerging trends in remote work technologies?"

# Health
"What are the most promising treatments for Alzheimer's disease?"

Configuration

Environment Variables

Variable            Required   Description
OPENAI_API_KEY      Yes        OpenAI API key for LLMs
TAVILY_API_KEY      Yes        Tavily API key for web search
LANGCHAIN_API_KEY   No         LangSmith API key for tracing
ANTHROPIC_API_KEY   No         Anthropic API key (if using Claude)
GROQ_API_KEY        No         Groq API key (if using Groq)

Runtime Configuration

Configure the agent's behavior at runtime:

config = {
    "configurable": {
        # Model selection
        "llm_writer": "openai/gpt-4o",           # Model for writing content
        "llm_reviewer": "openai/gpt-4o-mini",    # Model for reviewing
        "llm_summarizer": "openai/gpt-4o-mini",  # Model for summarization
        "llm_fallback": "openai/gpt-4o-mini",    # Fallback model
        
        # Research parameters
        "max_reflection_loops": 2,        # Max self-improvement iterations
        "max_results": 5,                 # Max search results per query
        "max_parallel_searches": 3,       # Number of parallel searches
        
        # Summarization parameters
        "summarize_chunk_size": 6000,     # Chunk size for summarization
        "summarize_max_source_length": 8000,  # Max source length
    }
}

Project Structure

deep-researcher/
├── deep_researcher/              # Main package
│   ├── __init__.py
│   ├── graph.py                  # Main graph definition
│   ├── state.py                  # State schemas
│   ├── configuration.py          # Configuration dataclass
│   ├── prompts.py                # Prompt templates
│   ├── utils/                    # Utility modules
│   │   ├── schemas.py            # Pydantic schemas
│   │   ├── search.py             # Search utilities
│   │   ├── llm_config.py         # LLM configuration
│   │   └── model_config.py       # Model provider configs
│   └── agents/                   # Sub-agents
│       └── summarize_refine/     # Summarization sub-graph
├── tests/                        # Test files
├── examples/                     # Example scripts
├── langgraph.json               # LangGraph config
├── requirements.txt             # Dependencies
├── pyproject.toml               # Poetry config
├── .env.example                 # Example environment
└── README.md                    # This file

How It Works

1. Query Generation

The agent analyzes your research topic and generates an optimized search query.

2. Query Expansion

Multiple related queries are created to cover different aspects of the topic.

3. Parallel Web Search

Conducts multiple web searches simultaneously using Tavily.
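Conceptually, the fan-out looks like the sketch below; `search` here is a stub standing in for the real Tavily client call, and the result shape is illustrative rather than the project's actual API:

```python
import asyncio

async def search(query: str) -> dict:
    # Stand-in for a Tavily API call; returns a fake result set.
    return {"query": query, "results": [f"result for {query!r}"]}

async def parallel_search(queries: list[str]) -> list[dict]:
    # Fan out one search task per query and await all of them together.
    return await asyncio.gather(*(search(q) for q in queries))

responses = asyncio.run(parallel_search([
    "quantum error correction 2024",
    "quantum supremacy benchmarks",
    "topological qubits progress",
]))
print(len(responses))  # 3
```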

4. Document Summarization

Long documents are automatically summarized using a refine strategy:

  • Splits content into manageable chunks
  • Creates an initial summary
  • Iteratively refines with additional chunks
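The refine strategy can be sketched as a simple fold over chunks; `summarize` and `refine` below are placeholder functions standing in for the actual LLM calls:

```python
def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    # Naive fixed-size splitter; the real agent splits on token boundaries.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def summarize(chunk: str) -> str:
    # Placeholder for the initial-summary LLM call.
    return chunk[:20]

def refine(summary: str, chunk: str) -> str:
    # Placeholder for the refine LLM call that folds a new chunk
    # into the running summary.
    return summary + " | " + chunk[:20]

def refine_summarize(text: str, chunk_size: int = 6000) -> str:
    chunks = split_into_chunks(text, chunk_size)
    summary = summarize(chunks[0])       # initial summary from first chunk
    for chunk in chunks[1:]:             # iteratively refine with the rest
        summary = refine(summary, chunk)
    return summary
```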

5. Content Creation

Synthesizes research findings into coherent, well-structured content with:

  • Clear explanations
  • Practical insights
  • Proper structure and formatting

6. Citation Addition

Automatically adds footnotes and references based on source materials.
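As a rough illustration, the citation step amounts to appending a numbered references section built from the gathered sources; the `add_citations` helper and its title/url source shape are hypothetical, not the project's API:

```python
def add_citations(content: str, sources: list[dict]) -> str:
    # Build a numbered reference list from the gathered sources
    # and append it to the drafted content.
    refs = "\n".join(
        f"[{i}] {s['title']} - {s['url']}"
        for i, s in enumerate(sources, start=1)
    )
    return f"{content}\n\nReferences:\n{refs}"
```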

7. Self-Reflection

Evaluates the content quality based on:

  • Usefulness and actionability
  • Completeness
  • Clarity
  • User focus

8. Iterative Improvement

If quality standards aren't met:

  • Identifies knowledge gaps
  • Conducts additional research
  • Rewrites content
  • Re-evaluates (up to max_reflection_loops)
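The improvement loop can be sketched as follows, with placeholder `passes_review` and `rewrite` functions standing in for the LLM-backed critique and gap-filling research steps:

```python
def passes_review(text: str) -> bool:
    # Placeholder critique: approve once the draft mentions "sources".
    return "sources" in text

def rewrite(text: str) -> str:
    # Stand-in for gap identification + additional research + rewrite.
    return text + " (expanded with sources)"

def reflection_loop(draft: str, max_reflection_loops: int = 2) -> str:
    # Re-evaluate and rewrite until the draft passes review
    # or the loop budget is exhausted.
    for _ in range(max_reflection_loops):
        if passes_review(draft):
            break
        draft = rewrite(draft)
    return draft
```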

Supported LLM Providers

  • OpenAI: GPT-4o, GPT-4o-mini, GPT-4-turbo
  • Anthropic: Claude 3.5 Sonnet, Claude 3 Opus/Haiku
  • Google: Gemini 2.0 Flash, Gemini Pro
  • Groq: Llama 3.3 70B, Mixtral 8x7B
  • Together AI: Various open-source models
  • OpenRouter: Access to multiple providers

Development

Running Tests

pytest tests/

Adding New Features

  1. Create feature branch
  2. Implement changes
  3. Add tests
  4. Update documentation
  5. Submit pull request

Debugging

Enable LangSmith tracing for detailed execution logs:

import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"

Deployment

LangGraph Cloud

Deploy to LangGraph Cloud for production:

langgraph deploy

Docker

Build and run with Docker:

docker build -t deep-researcher .
docker run -p 8000:8000 --env-file .env deep-researcher

Performance Tips

  1. Use a lighter reviewing model: Set llm_reviewer to a smaller, faster model such as GPT-4o-mini
  2. Adjust search depth: Reduce max_results and max_parallel_searches for faster results
  3. Limit reflection loops: Set max_reflection_loops=1 for quicker research
  4. Enable caching: Use LangChain's LLM cache to avoid repeating identical calls
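Putting tips 1-3 together, a speed-oriented configuration might look like the following (the values are suggestions, not the project's defaults):

```python
# Speed-oriented settings: lighter reviewer, shallower search, one pass.
fast_config = {
    "configurable": {
        "llm_reviewer": "openai/gpt-4o-mini",  # lighter reviewing model
        "max_results": 3,                      # fewer results per query
        "max_parallel_searches": 2,            # fewer concurrent searches
        "max_reflection_loops": 1,             # single improvement pass
    }
}
```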

Troubleshooting

Common Issues

"API key not found"

  • Ensure all required API keys are set in .env
  • Check that .env file is in the project root

"Search failed"

  • Verify your Tavily API key is valid
  • Check your internet connection

"Model not found"

  • Ensure the model string format is correct: provider/model-name
  • Verify you have API access to the specified model
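A quick way to sanity-check the format is to split on the first slash; `parse_model_string` is an illustrative helper, not part of the project's API:

```python
def parse_model_string(model: str) -> tuple[str, str]:
    # Split "provider/model-name" into its two parts; model names
    # may themselves contain dashes and dots, so split only once.
    if "/" not in model:
        raise ValueError(f"Expected 'provider/model-name', got {model!r}")
    provider, name = model.split("/", 1)
    return provider, name

print(parse_model_string("anthropic/claude-3-5-sonnet-20241022"))
# ('anthropic', 'claude-3-5-sonnet-20241022')
```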

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

License

MIT License - see LICENSE file for details

Acknowledgments

Support

Citation

If you use Deep Researcher in your research, please cite:

@software{deep_researcher,
  title = {Deep Researcher: Advanced LangGraph Research Agent},
  author = {Your Name},
  year = {2025},
  url = {https://github.com/yourusername/deep-researcher}
}
