
foundry-research


An open-source deep research plugin for Claude Code featuring academic database search, structured source management, and multi-agent synthesis. Designed for the Claude Code harness, but compatible with any model it supports — including GLM models from Z.ai.

Give it a topic and it produces a cited, structured literature synthesis: it tracks hundreds of sources across 20+ academic and web providers, deeply reads the most relevant 20-30 full-text PDFs, and delivers a report with key findings, contradiction analysis, confidence ratings, and a full reference list. The example report on the uncanny valley was generated with GLM-5.1 models.

Motivation

As a Research Scientist, I built foundry-research because existing "Deep Research" tools consistently failed to meet my standards for source quality, depth, and transparency.

My first attempt was a factory — a multi-phase pipeline implemented as an MCP server with 10,000+ lines of Python code and dozens of configuration parameters. The agent submitted a query and waited while the pipeline ran through hard-coded phases. Every time it did something I didn't like, my instinct was to add more complexity. It produced pretty good results, but never great ones. I was treating symptoms, not rethinking the approach.

Then models got smarter. By early 2026, it was clear that capable agents could orchestrate their own research workflows — they didn't need to be put on rails. Over a weekend in March, I rebuilt the tool as a set of composable CLI tools with an orchestrating prompt. It performed just as well as the months-long factory. I had learned the bitter lesson myself: build tools that extend what an agent can do, not systems that replace what an agent can think.

That philosophy shapes everything about this tool. Instead of a rigid pipeline, foundry-research gives the agent composable tools — search, download, track, enrich — and leaves it in the driver's seat. The agent decides what to search, when to read deeply, and how to synthesize. The prompt teaches why good research methodology matters, not just what steps to follow.

The research methodology itself addresses what I found lacking in general-purpose research agents:

  • Strategic discovery over keyword scraping. General agents grab whatever ranks highest on Google. Real researchers target domain-specific archives (arXiv, PubMed, PsyArXiv) and prioritize by citation count.
  • Full-text depth over abstract skimming. The open-access cascade (Unpaywall, OSF, preprints) downloads actual PDFs, not just metadata.
  • Citation snowballing over one-shot search. Traversing citation graphs finds foundational papers and follow-up work that keyword searches miss.
  • Traceability over opacity. Every intermediate artifact is saved: raw PDFs in sources/, structured reading notes in notes/, and machine-readable evidence records in evidence/ (a layout sketch follows this list). This means you can verify claims, dig deeper into cited sources, or use the outputs as a launch point for your own literature review: quickly identifying major ideas, prominent figures, controversies, and research findings worth following up on.
  • Fact-checking over trust. A two-phase verification pipeline extracts load-bearing claims, then checks each one against structured evidence units with source provenance — tracing every quantitative claim back to the exact passage in the original source.
  • Clarity and accessibility over academese and marketing speak. A dedicated style review agent audits the draft for jargon, filler phrases, passive voice, and information density — then a reviser rewrites flagged passages in plain language without weakening scientific accuracy or removing hedging language.
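
A typical session workspace, based on the artifact folders described above (the per-file naming here is illustrative, not taken from the repo):

sources/    raw PDFs downloaded from providers
notes/      structured reading notes, one markdown file per source
evidence/   machine-readable evidence units with source provenance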

Why Foundry Research?

While tools like Google's Deep Research have popularized agentic research, they are primarily optimized for general web search. foundry-research offers an alternative tailored for rigorous, evidence-backed synthesis:

  • Academic-First: Integrates with 20+ specialized providers including PubMed, arXiv, Semantic Scholar, Crossref, and OSF, rather than relying solely on web search. It also uses grey sources like Anna's Archive and Sci-Hub to access paywalled papers that standard generalist tools miss.
  • Full-Text PDFs: It leverages an open-access cascade (Unpaywall, OSF, preprints) and grey sources to download full PDFs.
  • Fact-Checked: A two-phase verification pipeline extracts load-bearing claims from the report, cross-references them against structured evidence units with source provenance, and checks each claim against the original source text.
  • Structured Orchestration: Instead of a single autonomous agent that might get stuck in loops, it uses a 5-phase pipeline (Acquire → Read → Synthesize → Verify → Revise) with 10 specialized subagents. This scaffolding allows the subagents enough agency to be creative while keeping the process on track. This structured approach also means it can run successfully on less capable (and less expensive) models like GLM-5.1. Subagents are assigned models proportional to their reasoning needs: complex synthesis and verification run on the most capable models, while high-volume tasks like reading and logging use smaller, cheaper ones.
  • Terminal Integration: It runs as a plugin inside Claude Code, the best-in-class general-purpose agentic harness. This brings deep research directly into your IDE/terminal, so you can immediately use the findings in your development workflow.

Source Coverage by Domain

  • Computer Science & Math: arXiv, DBLP, Semantic Scholar
  • Biomedical & Life Sciences: PubMed, bioRxiv/medRxiv, PubMed Central
  • Cross-Disciplinary Academic: Semantic Scholar, OpenAlex, Crossref, CORE, OpenCitations
  • Preprints & Open Access: OSF (PsyArXiv, SocArXiv), arXiv, bioRxiv/medRxiv, Unpaywall
  • Finance & Regulatory: SEC EDGAR, Yahoo Finance
  • Community & Discussion: Reddit, Hacker News, GitHub
  • General Web: Tavily, Perplexity, Linkup, Exa, GenSee

See docs/providers.md for the full provider reference (API keys, rate limits, capabilities). Google Scholar is not included because it blocks automated access — the academic providers above offer comparable coverage through proper APIs.
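
If you want to narrow coverage for a given run, the DEEP_RESEARCH_DISABLED_SOURCES variable documented under Grey Sources below may accept other provider identifiers as well; the identifiers here are illustrative guesses, so confirm the exact names in docs/providers.md:

export DEEP_RESEARCH_DISABLED_SOURCES="reddit,hacker_news"   # hypothetical identifiers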

Example Use Cases

  • Academic Literature Reviews: Synthesize the state of the art in a field, resolve competing claims against the evidence, and identify where research gaps remain.
  • Technical & Architectural Deep Dives: Compare competing approaches — architectures, algorithms, engineering practices — by weighing evidence from benchmarks, whitepapers, and implementation case studies.
  • Market & Competitive Intelligence: Synthesize SEC filings (via EDGAR integration), industry reports, and discussion forums (Reddit/HN) into competitive positioning, emerging trends, and market gaps.
  • Personal Medical Research: Investigate conditions, treatments, or supplements by deeply analyzing clinical trials and biomedical literature — comparing efficacy, side-effect profiles, and strength of evidence across studies. (Disclaimer: AI models can hallucinate and this tool does not provide medical advice. Always consult a qualified healthcare professional.)

When NOT to use this: If you just need a quick summary of a single web page or a factual answer, use out-of-the-box web search tools (like standard Claude or Perplexity). foundry-research is for multi-source synthesis that takes 30-60 minutes to run and produces comprehensive, verified reports.

Where it struggles: The tool relies on open-access sources, preprints, and grey source databases to obtain full-text PDFs. In domains where most research is locked behind paywalls and hasn't been liberated by preprint sharing practices (e.g., certain humanities disciplines, proprietary market research, paywalled legal databases), it will find metadata and abstracts but may have limited full-text coverage.

Pricing / Cost to Run

Running this tool requires two types of APIs: search providers and language models.

  • Search Providers (Free): You can run the entire pipeline without paying for search APIs. It integrates with 8 providers out-of-the-box (arXiv, PubMed, etc.) that don't require keys. For the mandatory web search provider, services like Tavily (1,000 free searches/month), Linkup (€5 free credits/month), and GenSee (currently free in Beta) offer generous free tiers.
  • Language Models (Paid): The only cost is the API usage for the underlying LLMs (e.g., Anthropic's Claude Opus/Sonnet/Haiku, or Z.ai's GLM models) that power the subagents.

Because the system downloads and reads 20-30 full-text PDFs, a single report consumes a significant number of tokens. To keep costs manageable, the pipeline assigns models by task complexity: Opus for high-reasoning work (acquisition, synthesis, verification, revision), Sonnet for structured reviews, and Haiku for high-volume repetitive tasks (reading sources, logging findings). I personally run the full pipeline on Z.ai's GLM-5.1 for all subagent roles without hitting rate limits — making the entire pipeline effectively free beyond search provider quotas. When using Claude models, the tiered model assignment keeps subscription usage and API costs proportional to task complexity. I have not yet benchmarked exact costs per report and will update this section when I do.

Quick Start

Start a Claude Code session and install the plugin from the marketplace:

/plugin marketplace add foundry-works/foundry-research
/plugin install foundry-research

Then run your first research session:

/foundry-research:deep-research What evidence exists for retrieval-augmented generation improving factual accuracy in LLMs?

You'll be prompted for a Tavily API key on first use. Get one at tavily.com, or see Configuration for other supported web search providers (like Perplexity, Exa, Linkup).

Tip for streamlined execution: foundry-research executes many tool calls during its 30-60 minute run. To avoid constant permission prompts, consider running Claude Code with the --dangerously-skip-permissions flag. Warning: this gives the agent full autonomy to execute commands and modify files without your approval. For security, we highly recommend running it in an isolated environment, such as a Docker Sandbox (sbx run claude), to protect your host machine.
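
For example, to launch a session with permission prompts disabled (only do this inside a sandbox):

claude --dangerously-skip-permissions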

Install

From marketplace (recommended)

Inside a Claude Code session, run:

/plugin marketplace add foundry-works/foundry-research
/plugin install foundry-research

Local / development

git clone https://github.com/foundry-works/foundry-research.git
cd foundry-research
claude --plugin-dir .

Use /reload-plugins in the session to pick up changes.

Configure

On first enable, you'll be prompted for a Tavily API key. Additional API keys can be set as environment variables — see .env.example or docs/configuration.md for the full reference.
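
For example, exporting keys in your shell before launching Claude Code (the variable names below are illustrative; the authoritative list is in .env.example):

export TAVILY_API_KEY="tvly-..."   # web search key prompted on first use
export NCBI_API_KEY="..."          # illustrative: an optional key for higher PubMed rate limits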

Skills

Research Workflow

Skills for producing and refining research reports.

Skill | Invocation | Description
deep-research | /foundry-research:deep-research | Search academic databases, download sources, produce a structured evidence-backed report
deep-research-revision | /foundry-research:deep-research-revision | Run a review-then-revise cycle on an existing report to fix accuracy, clarity, and completeness

Pipeline Improvement

Skills for evaluating and improving the research system itself.

Skill | Invocation | Description
reflect | /foundry-research:reflect | Score a completed session's quality with evidence-grounded assessments
improve | /foundry-research:improve | Identify cross-session patterns and produce an actionable improvement plan

Pipeline

A research session moves through five phases, each handled by specialized subagents:

Acquire → Read → Synthesize → Verify → Revise

Acquire — The orchestrator fans out searches across 20+ providers, triages results by relevance, and downloads the most promising sources.

Read — Each source is read, summarized, and indexed against the research questions. Readers extract structured evidence units with source provenance alongside markdown notes.

Synthesize — Findings are extracted per question and linked to evidence units, then woven into a theme-based report with citations and contradiction analysis.

Verify — Load-bearing claims are extracted and verified against evidence units with source provenance. Synthesis and style reviewers audit the draft for gaps, contradictions, and clarity.

Revise — Targeted edits are applied to the report based on reviewer feedback, preserving clean sections.

Agents

10 specialized subagents handle the work:

Agent | Model | Phase | Role
brief-writer | opus | Acquire | Generate research briefs with evaluative questions
source-acquisition | opus | Acquire | Run search, triage, and download pipeline
research-reader | haiku | Read | Read, summarize, and extract evidence units from source files
findings-logger | haiku | Read | Extract and log findings per research question
synthesis-writer | opus | Synthesize | Draft theme-based research reports
claim-extractor | sonnet | Verify | Identify load-bearing claims for verification
claim-verifier | sonnet | Verify | Verify claims against evidence units and reader notes
synthesis-reviewer | sonnet | Verify | Audit drafts for contradictions and gaps
style-reviewer | sonnet | Verify | Audit for clarity and plain-language style
report-reviser | opus | Revise | Make targeted edits based on review issues
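
For orientation, Claude Code subagents are conventionally defined as markdown files with YAML frontmatter that includes a model assignment. A minimal sketch of inspecting one such definition (the file path and exact fields are assumptions following the general Claude Code convention; this repo's actual layout may differ):

cat agents/research-reader.md   # hypothetical path
---
name: research-reader
description: Read, summarize, and extract evidence units from source files
model: haiku
---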

Requirements

  • Python 3.10+
  • Python venv module (on Debian/Ubuntu: sudo apt-get install python3-venv)

Dependencies are auto-installed on first use via a virtual environment.

Documentation

  • docs/providers.md: full provider reference (API keys, rate limits, capabilities)
  • docs/configuration.md: configuration and API key reference
  • docs/grey-sources.md: grey source details and how to disable them

Grey Sources

The PDF download cascade includes Anna's Archive and Sci-Hub, which provide access to paywalled papers. Using them is a personal choice. To disable:

export DEEP_RESEARCH_DISABLED_SOURCES="annas_archive,scihub"

See docs/grey-sources.md for details.

License

MIT
