An advanced LangGraph agent for comprehensive web research with self-reflection and iterative improvement capabilities.
Deep Researcher is a sophisticated AI research assistant built with LangGraph that conducts thorough web research on any topic. It features:
- Intelligent Query Generation: Automatically generates optimized search queries
- Parallel Web Search: Conducts multiple searches simultaneously for comprehensive coverage
- Smart Summarization: Automatically summarizes long documents using a refine strategy
- Self-Reflection: Critiques its own output and iteratively improves research quality
- Citation Management: Automatically adds proper citations and references
- Multi-LLM Support: Works with OpenAI, Anthropic, Google, Groq, and more
The Deep Researcher agent follows this workflow:
```
1. Query Generation → 2. Multi-Query Expansion → 3. Parallel Web Search
                                                            ↓
                 5. Gather Sources ← 4. Summarize Long Docs
                        ↓
  8. Self-Reflection ← 7. Add Citations ← 6. Create/Rewrite Content
                        ↓
        (loops back to research while quality gaps remain)
```
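For orientation, here is a minimal sketch of how a pipeline like this can be wired as a LangGraph `StateGraph`. The node names, state fields, and stub bodies are illustrative assumptions, not the package's actual definitions (those live in `deep_researcher/graph.py` and `state.py`):

```python
# Illustrative wiring only; the real graph lives in deep_researcher/graph.py.
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class State(TypedDict, total=False):
    # Hypothetical, simplified state schema.
    research_topic: str
    queries: list[str]
    sources: list[dict]
    result: str
    reflection_loops: int


def generate_queries(state: State) -> dict:
    # Stages 1-2: derive optimized search queries from the topic.
    return {"queries": [state["research_topic"]]}


def research(state: State) -> dict:
    # Stages 3-5: parallel search, summarize long docs, gather sources.
    return {"sources": []}


def write_content(state: State) -> dict:
    # Stages 6-7: synthesize content and add citations.
    loops = state.get("reflection_loops", 0)
    return {"result": "...", "reflection_loops": loops + 1}


def reflect(state: State) -> str:
    # Stage 8: loop back to research while quality gaps remain.
    return END if state.get("reflection_loops", 0) >= 2 else "research"


builder = StateGraph(State)
builder.add_node("generate_queries", generate_queries)
builder.add_node("research", research)
builder.add_node("write_content", write_content)
builder.add_edge(START, "generate_queries")
builder.add_edge("generate_queries", "research")
builder.add_edge("research", "write_content")
builder.add_conditional_edges("write_content", reflect)
graph = builder.compile()
```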
Key capabilities:

- Adaptive Research: Automatically identifies knowledge gaps and conducts additional research
- Quality Control: Self-evaluates output quality before finalizing
- Configurable: Flexible configuration for models, search depth, and reflection loops
- Production-Ready: Built following LangGraph best practices for deployment
To get started, you will need:

- Python 3.11 or higher
- API keys for:
  - OpenAI (required)
  - Tavily (required for web search)
  - LangSmith (optional, recommended for tracing)
- Clone the repository:

```bash
git clone https://github.com/yourusername/deep-researcher.git
cd deep-researcher
```

- Install dependencies, using pip:

```bash
pip install -r requirements.txt
```

  or using Poetry:

```bash
poetry install
```

- Set up environment variables. Copy the example environment file:

```bash
cp .env.example .env
```

  then edit `.env` and add your API keys:

```bash
OPENAI_API_KEY=your_openai_api_key_here
TAVILY_API_KEY=your_tavily_api_key_here
LANGCHAIN_API_KEY=your_langsmith_api_key_here  # Optional
```

- Install LangGraph Studio:
  - Open the `deep-researcher` directory in LangGraph Studio
  - The graph will be loaded automatically
  - Start researching!
Basic usage:

```python
from deep_researcher.graph import graph

# Basic research
result = await graph.ainvoke({
    "research_topic": "What are the latest developments in quantum computing?",
    "messages": []
})
print(result["result"])
```
With a custom configuration:

```python
from deep_researcher.graph import graph

config = {
    "configurable": {
        "llm_writer": "anthropic/claude-3-5-sonnet-20241022",
        "llm_reviewer": "openai/gpt-4o",
        "max_reflection_loops": 3,
        "max_results": 10
    }
}

result = await graph.ainvoke({
    "research_topic": "Explain the impact of CRISPR gene editing on medicine",
    "messages": []
}, config)
print(result["result"])
```
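`graph.ainvoke` is a coroutine, so the `await` examples above work as-is in a notebook or in LangGraph Studio; in a plain script, wrap the call with `asyncio.run`:

```python
import asyncio

from deep_researcher.graph import graph


async def main() -> None:
    result = await graph.ainvoke({
        "research_topic": "What are the latest developments in quantum computing?",
        "messages": [],
    })
    print(result["result"])


asyncio.run(main())
```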
Try these example topics to see the agent in action:

```python
# Technology
"What are the latest breakthroughs in artificial intelligence?"

# Science
"How does climate change affect ocean ecosystems?"

# Business
"What are the emerging trends in remote work technologies?"

# Health
"What are the most promising treatments for Alzheimer's disease?"
```

The agent reads the following environment variables:

| Variable | Required | Description |
|---|---|---|
| `OPENAI_API_KEY` | Yes | OpenAI API key for LLMs |
| `TAVILY_API_KEY` | Yes | Tavily API key for web search |
| `LANGCHAIN_API_KEY` | No | LangSmith API key for tracing |
| `ANTHROPIC_API_KEY` | No | Anthropic API key (if using Claude) |
| `GROQ_API_KEY` | No | Groq API key (if using Groq) |
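If you run the examples outside LangGraph Studio, one way to load these variables from `.env` is `python-dotenv` (an assumption; use whatever environment mechanism you prefer):

```python
from dotenv import load_dotenv

# Reads .env from the project root and exports its keys
# (OPENAI_API_KEY, TAVILY_API_KEY, ...) into os.environ.
load_dotenv()
```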
Configure the agent's behavior at runtime:

```python
config = {
    "configurable": {
        # Model selection
        "llm_writer": "openai/gpt-4o",           # Model for writing content
        "llm_reviewer": "openai/gpt-4o-mini",    # Model for reviewing
        "llm_summarizer": "openai/gpt-4o-mini",  # Model for summarization
        "llm_fallback": "openai/gpt-4o-mini",    # Fallback model

        # Research parameters
        "max_reflection_loops": 2,    # Max self-improvement iterations
        "max_results": 5,             # Max search results per query
        "max_parallel_searches": 3,   # Number of parallel searches

        # Summarization parameters
        "summarize_chunk_size": 6000,         # Chunk size for summarization
        "summarize_max_source_length": 8000,  # Max source length
    }
}
```

The project is laid out as follows:

```
deep-researcher/
├── deep_researcher/              # Main package
│   ├── __init__.py
│   ├── graph.py                  # Main graph definition
│   ├── state.py                  # State schemas
│   ├── configuration.py          # Configuration dataclass
│   ├── prompts.py                # Prompt templates
│   ├── utils/                    # Utility modules
│   │   ├── schemas.py            # Pydantic schemas
│   │   ├── search.py             # Search utilities
│   │   ├── llm_config.py         # LLM configuration
│   │   └── model_config.py       # Model provider configs
│   └── agents/                   # Sub-agents
│       └── summarize_refine/     # Summarization sub-graph
├── tests/                        # Test files
├── examples/                     # Example scripts
├── langgraph.json                # LangGraph config
├── requirements.txt              # Dependencies
├── pyproject.toml                # Poetry config
├── .env.example                  # Example environment
└── README.md                     # This file
```
How it works, stage by stage:

Query Generation: The agent analyzes your research topic and generates an optimized search query.

Multi-Query Expansion: Multiple related queries are created to cover different aspects of the topic.

Parallel Web Search: Multiple web searches run simultaneously using Tavily, as in the sketch below.
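A minimal sketch of that fan-out, assuming the `tavily-python` client; the project's actual helpers live in `deep_researcher/utils/search.py`:

```python
# Run several Tavily searches concurrently and flatten the hits.
import asyncio
import os

from tavily import TavilyClient

client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])


async def search_all(queries: list[str], max_results: int = 5) -> list[dict]:
    """Run one Tavily search per query concurrently and flatten the results."""
    responses = await asyncio.gather(
        *(asyncio.to_thread(client.search, q, max_results=max_results) for q in queries)
    )
    return [hit for resp in responses for hit in resp["results"]]
```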
Summarization: Long documents are automatically summarized using a refine strategy (sketched below), which:
- Splits content into manageable chunks
- Creates an initial summary
- Iteratively refines it with additional chunks
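Sketched here with assumed prompts and LangChain's text splitter; the actual sub-graph lives in `deep_researcher/agents/summarize_refine/`:

```python
from langchain.chat_models import init_chat_model
from langchain_text_splitters import RecursiveCharacterTextSplitter

llm = init_chat_model("openai:gpt-4o-mini")
splitter = RecursiveCharacterTextSplitter(chunk_size=6000, chunk_overlap=200)


def refine_summarize(text: str) -> str:
    """Summarize the first chunk, then fold each later chunk into the summary."""
    chunks = splitter.split_text(text)
    summary = llm.invoke(f"Summarize this document excerpt:\n\n{chunks[0]}").content
    for chunk in chunks[1:]:
        summary = llm.invoke(
            f"Current summary:\n{summary}\n\n"
            f"Refine it with this additional excerpt:\n{chunk}"
        ).content
    return summary
```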
Content Creation: Synthesizes research findings into coherent, well-structured content with:
- Clear explanations
- Practical insights
- Proper structure and formatting

Citation Management: Automatically adds footnotes and references based on source materials; a naive version is sketched below.
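For illustration only, a simple way to turn gathered sources into numbered references appended to the draft (the `title` and `url` field names are assumptions about the search-hit shape):

```python
def add_references(draft: str, sources: list[dict]) -> str:
    """Append a numbered reference list built from search hits."""
    lines = [f"[{i}] {s['title']} - {s['url']}" for i, s in enumerate(sources, start=1)]
    return draft + "\n\n## References\n" + "\n".join(lines)
```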
Self-Reflection: Evaluates the content quality based on:
- Usefulness and actionability
- Completeness
- Clarity
- User focus

Iterative Improvement: If quality standards aren't met (see the critique sketch below), the agent:
- Identifies knowledge gaps
- Conducts additional research
- Rewrites the content
- Re-evaluates (up to `max_reflection_loops` times)
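A hypothetical sketch of that critique step using structured output; the project's real schemas live in `deep_researcher/utils/schemas.py`:

```python
from pydantic import BaseModel, Field
from langchain.chat_models import init_chat_model


class Critique(BaseModel):
    meets_quality_bar: bool = Field(
        description="Whether the draft is useful, complete, clear, and user-focused"
    )
    knowledge_gaps: list[str] = Field(
        default_factory=list, description="Follow-up questions to research"
    )


reviewer = init_chat_model("openai:gpt-4o-mini").with_structured_output(Critique)


def reflect(draft: str, topic: str, loops_used: int, max_loops: int = 2) -> Critique | None:
    """Critique the draft; return None when no further loop should run."""
    if loops_used >= max_loops:
        return None  # budget exhausted: accept the current draft
    critique = reviewer.invoke(
        f"Research topic: {topic}\n\nDraft:\n{draft}\n\n"
        "Judge usefulness, completeness, clarity, and user focus. "
        "List concrete knowledge gaps if the draft falls short."
    )
    return None if critique.meets_quality_bar else critique
```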
Supported model providers:

- OpenAI: GPT-4o, GPT-4o-mini, GPT-4-turbo
- Anthropic: Claude 3.5 Sonnet, Claude 3 Opus/Haiku
- Google: Gemini 2.0 Flash, Gemini Pro
- Groq: Llama 3.3 70B, Mixtral 8x7B
- Together AI: Various open-source models
- OpenRouter: Access to multiple providers
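Model identifiers follow the `provider/model-name` convention. One way to resolve such a string is LangChain's `init_chat_model` (a sketch; it assumes the provider's integration package is installed and that the prefix matches a provider identifier `init_chat_model` recognizes):

```python
from langchain.chat_models import init_chat_model
from langchain_core.language_models import BaseChatModel


def load_model(model_string: str) -> BaseChatModel:
    """Resolve a 'provider/model-name' string to a chat model instance."""
    provider, _, model = model_string.partition("/")
    return init_chat_model(model, model_provider=provider)


writer = load_model("anthropic/claude-3-5-sonnet-20241022")
reviewer = load_model("openai/gpt-4o")
```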
Run the test suite:

```bash
pytest tests/
```

To develop a new feature:

- Create a feature branch
- Implement changes
- Add tests
- Update documentation
- Submit a pull request
Enable LangSmith tracing for detailed execution logs:

```python
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your_langsmith_api_key_here"  # if not already set via .env
```

Deploy to LangGraph Cloud for production:
```bash
langgraph deploy
```

Build and run with Docker:

```bash
docker build -t deep-researcher .
docker run -p 8000:8000 --env-file .env deep-researcher
```

Performance tips:

- Use a faster model for reviewing: set `llm_reviewer` to a faster model like GPT-4o-mini
- Adjust search depth: reduce `max_results` and `max_parallel_searches` for faster results
- Limit reflection loops: set `max_reflection_loops=1` for quicker research
- Enable caching: use LangSmith to cache LLM calls
"API key not found"
- Ensure all required API keys are set in
.env - Check that
.envfile is in the project root
"Search failed"
- Verify your Tavily API key is valid
- Check your internet connection
"Model not found"
- Ensure the model string format is correct:
provider/model-name - Verify you have API access to the specified model
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
Licensed under the MIT License; see the LICENSE file for details.

Acknowledgments:

- Built with LangGraph
- Search powered by Tavily
- Inspired by the Ollama Deep Researcher
If you use Deep Researcher in your research, please cite:
```bibtex
@software{deep_researcher,
  title  = {Deep Researcher: Advanced LangGraph Research Agent},
  author = {Your Name},
  year   = {2025},
  url    = {https://github.com/yourusername/deep-researcher}
}
```