felipemeres/deep-researcher-langgraph

Deep Researcher 🔍

An advanced LangGraph agent for comprehensive web research with self-reflection and iterative improvement capabilities.

Overview

Deep Researcher is a sophisticated AI research assistant built with LangGraph that conducts thorough web research on any topic. It features:

  • Intelligent Query Generation: Automatically generates optimized search queries
  • Parallel Web Search: Conducts multiple searches simultaneously for comprehensive coverage
  • Smart Summarization: Automatically summarizes long documents using a refine strategy
  • Self-Reflection: Critiques its own output and iteratively improves research quality
  • Citation Management: Automatically adds proper citations and references
  • Multi-LLM Support: Works with OpenAI, Anthropic, Google, Groq, and more

Architecture

The Deep Researcher agent follows a multi-stage workflow:

1. Query Generation → 2. Multi-Query Expansion → 3. Parallel Web Search
                                                              ↓
6. Create/Rewrite Content ← 5. Gather Sources ← 4. Summarize Long Docs
              ↓
7. Add Citations → 8. Self-Reflection → (back to 3 if gaps remain)

Key Features

  • Adaptive Research: Automatically identifies knowledge gaps and conducts additional research
  • Quality Control: Self-evaluates output quality before finalizing
  • Configurable: Flexible configuration for models, search depth, and reflection loops
  • Production-Ready: Built following LangGraph best practices for deployment

Quick Start

Prerequisites

  • Python 3.11 or higher
  • API keys for:
    • OpenAI (required)
    • Tavily (required for web search)
    • LangSmith (optional, recommended for tracing)

Installation

  1. Clone the repository

git clone https://github.com/yourusername/deep-researcher.git
cd deep-researcher

  2. Install dependencies

Using pip:

pip install -r requirements.txt

Or using Poetry:

poetry install
  3. Set up environment variables

Copy the example environment file:

cp .env.example .env

Edit .env and add your API keys:

OPENAI_API_KEY=your_openai_api_key_here
TAVILY_API_KEY=your_tavily_api_key_here
LANGCHAIN_API_KEY=your_langsmith_api_key_here  # Optional

Usage

Using LangGraph Studio (Recommended)

  1. Install LangGraph Studio
  2. Open the deep-researcher directory in LangGraph Studio
  3. The graph will be automatically loaded
  4. Start researching!

Using Python API

import asyncio

from deep_researcher.graph import graph

async def main():
    # Basic research
    result = await graph.ainvoke({
        "research_topic": "What are the latest developments in quantum computing?",
        "messages": []
    })
    print(result["result"])

asyncio.run(main())

With Custom Configuration

import asyncio

from deep_researcher.graph import graph

config = {
    "configurable": {
        "llm_writer": "anthropic/claude-3-5-sonnet-20241022",
        "llm_reviewer": "openai/gpt-4o",
        "max_reflection_loops": 3,
        "max_results": 10
    }
}

async def main():
    result = await graph.ainvoke({
        "research_topic": "Explain the impact of CRISPR gene editing on medicine",
        "messages": []
    }, config)
    print(result["result"])

asyncio.run(main())

Example Research Topics

Try these to see the agent in action:

# Technology
"What are the latest breakthroughs in artificial intelligence?"

# Science
"How does climate change affect ocean ecosystems?"

# Business
"What are the emerging trends in remote work technologies?"

# Health
"What are the most promising treatments for Alzheimer's disease?"

Configuration

Environment Variables

Variable            Required   Description
OPENAI_API_KEY      Yes        OpenAI API key for LLMs
TAVILY_API_KEY      Yes        Tavily API key for web search
LANGCHAIN_API_KEY   No         LangSmith API key for tracing
ANTHROPIC_API_KEY   No         Anthropic API key (if using Claude)
GROQ_API_KEY        No         Groq API key (if using Groq)

Runtime Configuration

Configure the agent's behavior at runtime:

config = {
    "configurable": {
        # Model selection
        "llm_writer": "openai/gpt-4o",           # Model for writing content
        "llm_reviewer": "openai/gpt-4o-mini",    # Model for reviewing
        "llm_summarizer": "openai/gpt-4o-mini",  # Model for summarization
        "llm_fallback": "openai/gpt-4o-mini",    # Fallback model
        
        # Research parameters
        "max_reflection_loops": 2,        # Max self-improvement iterations
        "max_results": 5,                 # Max search results per query
        "max_parallel_searches": 3,       # Number of parallel searches
        
        # Summarization parameters
        "summarize_chunk_size": 6000,     # Chunk size for summarization
        "summarize_max_source_length": 8000,  # Max source length
    }
}

Project Structure

deep-researcher/
├── deep_researcher/              # Main package
│   ├── __init__.py
│   ├── graph.py                  # Main graph definition
│   ├── state.py                  # State schemas
│   ├── configuration.py          # Configuration dataclass
│   ├── prompts.py                # Prompt templates
│   ├── utils/                    # Utility modules
│   │   ├── schemas.py            # Pydantic schemas
│   │   ├── search.py             # Search utilities
│   │   ├── llm_config.py         # LLM configuration
│   │   └── model_config.py       # Model provider configs
│   └── agents/                   # Sub-agents
│       └── summarize_refine/     # Summarization sub-graph
├── tests/                        # Test files
├── examples/                     # Example scripts
├── langgraph.json               # LangGraph config
├── requirements.txt             # Dependencies
├── pyproject.toml               # Poetry config
├── .env.example                 # Example environment
└── README.md                    # This file

How It Works

1. Query Generation

The agent analyzes your research topic and generates an optimized search query.

2. Query Expansion

Multiple related queries are created to cover different aspects of the topic.

3. Parallel Web Search

Conducts multiple web searches simultaneously using Tavily.
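Conceptually, the fan-out looks like the sketch below; `search` here is a stub standing in for the real Tavily client call, and the result shape is illustrative rather than the project's actual API:

```python
import asyncio

async def search(query: str) -> dict:
    # Stand-in for a Tavily API call; returns a fake result set.
    return {"query": query, "results": [f"result for {query!r}"]}

async def parallel_search(queries: list[str]) -> list[dict]:
    # Fan out one search task per query and await all of them together.
    return await asyncio.gather(*(search(q) for q in queries))

responses = asyncio.run(parallel_search([
    "quantum error correction 2024",
    "quantum supremacy benchmarks",
    "topological qubits progress",
]))
print(len(responses))  # 3
```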

4. Document Summarization

Long documents are automatically summarized using a refine strategy:

  • Splits content into manageable chunks
  • Creates an initial summary
  • Iteratively refines with additional chunks
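The refine strategy can be sketched as a simple fold over chunks; `summarize` and `refine` below are placeholder functions standing in for the actual LLM calls:

```python
def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    # Naive fixed-size splitter; the real agent splits on token boundaries.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def summarize(chunk: str) -> str:
    # Placeholder for the initial-summary LLM call.
    return chunk[:20]

def refine(summary: str, chunk: str) -> str:
    # Placeholder for the refine LLM call that folds a new chunk
    # into the running summary.
    return summary + " | " + chunk[:20]

def refine_summarize(text: str, chunk_size: int = 6000) -> str:
    chunks = split_into_chunks(text, chunk_size)
    summary = summarize(chunks[0])       # initial summary from first chunk
    for chunk in chunks[1:]:             # iteratively refine with the rest
        summary = refine(summary, chunk)
    return summary
```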

5. Content Creation

Synthesizes research findings into coherent, well-structured content with:

  • Clear explanations
  • Practical insights
  • Proper structure and formatting

6. Citation Addition

Automatically adds footnotes and references based on source materials.
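As a rough illustration, the citation step amounts to appending a numbered references section built from the gathered sources; the `add_citations` helper and its title/url source shape are hypothetical, not the project's API:

```python
def add_citations(content: str, sources: list[dict]) -> str:
    # Build a numbered reference list from the gathered sources
    # and append it to the drafted content.
    refs = "\n".join(
        f"[{i}] {s['title']} - {s['url']}"
        for i, s in enumerate(sources, start=1)
    )
    return f"{content}\n\nReferences:\n{refs}"
```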

7. Self-Reflection

Evaluates the content quality based on:

  • Usefulness and actionability
  • Completeness
  • Clarity
  • User focus

8. Iterative Improvement

If quality standards aren't met:

  • Identifies knowledge gaps
  • Conducts additional research
  • Rewrites content
  • Re-evaluates (up to max_reflection_loops)
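The improvement loop can be sketched as follows, with placeholder `passes_review` and `rewrite` functions standing in for the LLM-backed critique and gap-filling research steps:

```python
def passes_review(text: str) -> bool:
    # Placeholder critique: approve once the draft mentions "sources".
    return "sources" in text

def rewrite(text: str) -> str:
    # Stand-in for gap identification + additional research + rewrite.
    return text + " (expanded with sources)"

def reflection_loop(draft: str, max_reflection_loops: int = 2) -> str:
    # Re-evaluate and rewrite until the draft passes review
    # or the loop budget is exhausted.
    for _ in range(max_reflection_loops):
        if passes_review(draft):
            break
        draft = rewrite(draft)
    return draft
```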

Supported LLM Providers

  • OpenAI: GPT-4o, GPT-4o-mini, GPT-4-turbo
  • Anthropic: Claude 3.5 Sonnet, Claude 3 Opus/Haiku
  • Google: Gemini 2.0 Flash, Gemini Pro
  • Groq: Llama 3.3 70B, Mixtral 8x7B
  • Together AI: Various open-source models
  • OpenRouter: Access to multiple providers

Development

Running Tests

pytest tests/

Adding New Features

  1. Create feature branch
  2. Implement changes
  3. Add tests
  4. Update documentation
  5. Submit pull request

Debugging

Enable LangSmith tracing for detailed execution logs:

import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"

Deployment

LangGraph Cloud

Deploy to LangGraph Cloud for production:

langgraph deploy

Docker

Build and run with Docker:

docker build -t deep-researcher .
docker run -p 8000:8000 --env-file .env deep-researcher

Performance Tips

  1. Use a lighter reviewing model: Set llm_reviewer to a smaller, faster model such as GPT-4o-mini
  2. Adjust search depth: Reduce max_results and max_parallel_searches for faster results
  3. Limit reflection loops: Set max_reflection_loops=1 for quicker research
  4. Enable caching: Use LangChain's LLM cache to avoid repeating identical calls
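Putting tips 1-3 together, a speed-oriented configuration might look like the following (the values are suggestions, not the project's defaults):

```python
# Speed-oriented settings: lighter reviewer, shallower search, one pass.
fast_config = {
    "configurable": {
        "llm_reviewer": "openai/gpt-4o-mini",  # lighter reviewing model
        "max_results": 3,                      # fewer results per query
        "max_parallel_searches": 2,            # fewer concurrent searches
        "max_reflection_loops": 1,             # single improvement pass
    }
}
```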

Troubleshooting

Common Issues

"API key not found"

  • Ensure all required API keys are set in .env
  • Check that .env file is in the project root

"Search failed"

  • Verify your Tavily API key is valid
  • Check your internet connection

"Model not found"

  • Ensure the model string format is correct: provider/model-name
  • Verify you have API access to the specified model
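A quick way to sanity-check the format is to split on the first slash; `parse_model_string` is an illustrative helper, not part of the project's API:

```python
def parse_model_string(model: str) -> tuple[str, str]:
    # Split "provider/model-name" into its two parts; model names
    # may themselves contain dashes and dots, so split only once.
    if "/" not in model:
        raise ValueError(f"Expected 'provider/model-name', got {model!r}")
    provider, name = model.split("/", 1)
    return provider, name

print(parse_model_string("anthropic/claude-3-5-sonnet-20241022"))
# ('anthropic', 'claude-3-5-sonnet-20241022')
```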

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

License

MIT License - see LICENSE file for details

Acknowledgments

Support

Citation

If you use Deep Researcher in your research, please cite:

@software{deep_researcher,
  title = {Deep Researcher: Advanced LangGraph Research Agent},
  author = {Your Name},
  year = {2025},
  url = {https://github.com/yourusername/deep-researcher}
}
