Verify what AI writes about science — citations, claims, and cross-file consistency.
A README linked to PMID 12078741 as the foundational paper on Restricted Focus Viewers in vision science. The actual paper at that ID? "Determination of true ileal amino acid digestibility... in barley samples for growing-finishing pigs." The correct PMID was 12723780 — off by 645,039.
```sh
npx github:andyed/science-agent audit ./docs --bibtex=./refs.bib
```

Or verify a single DOI against CrossRef:

```sh
npx github:andyed/science-agent verify 10.1038/nn.2889
```

No install, no clone — runs straight from this repo.
Catches AI-confabulated academic citations before they ship. Verifies inline references against BibTeX and CrossRef:
| Pattern | How |
|---|---|
| Wrong title | Fuzzy title matching against BibTeX + CrossRef |
| Fabricated co-authors | CrossRef author list verification |
| Wrong DOI | CrossRef DOI resolution — checks that the DOI points to the claimed paper |
| Compound confabulation | CrossRef + title search detects merged citations |
| Ambiguous citation | Surname+year collision detection across BibTeX entries |
| Orphan citation | Inline reference with no BibTeX entry |
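The fuzzy title matching in the table above can be sketched as a normalized word-overlap score. This is a hypothetical illustration, not science-agent's actual implementation — `normalize` and `titleSimilarity` are made-up helper names:

```javascript
// Hypothetical sketch of fuzzy title matching — not the tool's actual code.
// Lowercases and strips punctuation, then scores the overlap of the two
// titles' word sets (Jaccard similarity).
function normalize(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ") // strip punctuation
    .split(/\s+/)
    .filter(Boolean);
}

function titleSimilarity(a, b) {
  const setA = new Set(normalize(a));
  const setB = new Set(normalize(b));
  let shared = 0;
  for (const word of setA) if (setB.has(word)) shared++;
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : shared / union;
}
```

A score near 1.0 means the BibTeX title matches the claimed title; a low score on an otherwise-matching surname+year is exactly the 95%-right, 5%-fabricated pattern described below.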
Audits [NB##:K##] claim references in research prose — the convention where quantitative findings are traced to specific rows in notebook Key Claims blocks:
```sh
# Generate aggregate from notebooks (one-time setup)
science-agent aggregate ./notebooks/ -o docs/notebook-key-claims.md

# Audit all claim references in prose
science-agent notebook-audit ./docs \
  --aggregate=./docs/notebook-key-claims.md \
  --notebooks=./notebooks/ \
  --cross-repo=../downstream-repo
```

Detects:
- Dangling references — `[NB14:K3]` cited in prose but K3 doesn't exist in NB14
- Missing Key Claims blocks — notebook is cited but has no auditable claims table
- Stale cross-repo values — downstream repo quotes pre-fix numbers from upstream
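The dangling-reference check above can be sketched as a scan for `[NB##:K##]` tokens against a set of known claim IDs. Hypothetical helper names; the real tool parses the aggregate file rather than taking a pre-built set:

```javascript
// Hypothetical sketch of dangling-reference detection — assumes the
// Key Claims aggregate has already been parsed into a Set of claim IDs
// like "NB14:K1". Returns every inline reference with no matching claim.
function findDanglingRefs(prose, knownClaimIds) {
  const pattern = /\[NB(\d+):K(\d+)\]/g;
  const dangling = [];
  for (const match of prose.matchAll(pattern)) {
    const id = `NB${match[1]}:K${match[2]}`;
    if (!knownClaimIds.has(id)) dangling.push(id);
  }
  return dangling;
}
```

For example, `findDanglingRefs("Latency fell 40% [NB14:K3].", new Set(["NB14:K1"]))` flags `NB14:K3` as dangling.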
See `docs/notebook-conventions.md` for the full notebook contract.
Drop `agent.md` into your project's `.claude/agents/` directory:
```sh
git clone https://github.com/andyed/science-agent.git
mkdir -p .claude/agents
cp science-agent/agent.md .claude/agents/science-agent.md
```

Then in Claude Code:
- Ask "check my citations" or "audit references in docs/"
- The agent activates automatically when it detects citation patterns
- Uses WebFetch to verify DOIs against CrossRef — no install needed
```sh
# Install
git clone https://github.com/andyed/science-agent.git
cd science-agent && npm install

# Audit citations in a directory against a BibTeX file
node cli.js audit ./docs/specs --bibtex=./refs.bib

# Audit notebook claim references
node cli.js notebook-audit ./docs --aggregate=./docs/notebook-key-claims.md

# Generate a Key Claims aggregate from notebooks
node cli.js aggregate ./notebooks/ -o docs/notebook-key-claims.md

# Audit recent arXiv papers (baseline check)
node cli.js arxiv 10 --cat=cs.AI

# Verify a single DOI against CrossRef
node cli.js verify 10.1038/nn.2889

# Search CrossRef by title
node cli.js search "Metamers of the ventral stream"
```

Sample audit output:

```
═══ Science Agent Audit ═══
Directory: ./docs/specs
BibTeX: ./refs.bib

Citations: 62
In BibTeX: 33
Orphans: 29
With DOI: 29
Ambiguous: 1
Issues: 30

── Issues ──
⚠ [ambiguous] Pelli & Tillman (2008)
  wave3_crowding_validation.md
  matches 2 BibTeX entries — disambiguate with DOI or journal
ℹ [orphan] Schwartz (1980)
  cmf_mip_derivation.md
  has no BibTeX entry
```
AI coding assistants confabulate academic citations at a measurable rate. The model gets 95% right — correct author surname, approximate title, right journal, right year — then fabricates the remaining 5%. This is dangerous because it passes casual review.
| Source | Error rate |
|---|---|
| Human-authored arXiv papers (our spot-check) | 0% |
| Human-written project docs (our audit) | 0–7% |
| AI-assisted project docs (our audit) | 12–24% |
| AI-generated citations across 13 LLMs (GhostCite) | 14–95% |
The fix: require a DOI for every citation. Verify it against CrossRef. In our corpus, DOI presence had a 0% confabulation rate. Science Agent automates this.
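The DOI check amounts to fetching `https://api.crossref.org/works/<doi>` and comparing the returned record against the citation as written. A minimal sketch of the comparison step, assuming a parsed CrossRef response (whose `message.title` field is an array) — `recordMatchesClaim` is a hypothetical helper, not the tool's API:

```javascript
// Hypothetical sketch — checks a parsed CrossRef work record against the
// citation as claimed in prose. CrossRef returns the title as an array
// and authors as objects with a "family" (surname) field.
function recordMatchesClaim(record, claim) {
  const title = (record.message.title || [])[0] || "";
  const surnames = (record.message.author || []).map((a) =>
    (a.family || "").toLowerCase()
  );
  const titleOk = title.toLowerCase().includes(claim.title.toLowerCase());
  const authorOk = surnames.includes(claim.firstAuthor.toLowerCase());
  return titleOk && authorOk;
}
```

A substring match is deliberately crude here; a real check would use fuzzy title similarity so that subtitle and punctuation differences don't cause false negatives.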
See `FINDINGS.md` for the complete audit with methodology, data, patterns, and the 95/5 confabulation taxonomy.
- Claim content verification — detect wrong numbers attributed to real papers, cross-file consistency for shared parameters
- Research corpus index — catalog local PDFs with extracted metadata, track what's been read
- MCP server — tools for any MCP client
| Project | What it does | How science-agent differs |
|---|---|---|
| GhostCite / CiteVerifier | DBLP-based citation title verification | We use CrossRef (broader coverage), verify DOIs + authors + titles, and work as a Claude Code agent |
| CiteAudit | Multi-agent verification pipeline + web service | Not open source. Science-agent is local-first, CLI, and embeds in your dev workflow |
| CiteME | Benchmark: can LLMs identify source papers from excerpts? | Benchmark, not a tool. Different task (retrieval vs. verification) |
| Context Rot | Measures general LLM degradation with context length | Methodology foundation for understanding why hallucination worsens under load |
| Claude Scholar | Full research lifecycle config for Claude Code | Workflow orchestrator with prompt-based citation checking. Science-agent could serve as its verification backend via MCP |
The arms are AI. The microscope is yours.
`andre-inter-collab-llc/research-workflow-assistant` — Andre Nogueira's open-source Research Workflow Assistant: a VS Code + GitHub Copilot stack of custom agents and MCP servers (PubMed, OpenAlex, Semantic Scholar, Europe PMC, CrossRef, Zotero) for systematic reviews and academic writing. Different domain (biomedical research workflows vs. citation verification), same underlying bet: researchers already have VS Code, git, and Markdown — give them an LLM with the right agent scaffolding and they can assemble their own compliant research assistants.
Built after discovering AI-confabulated citations in Scrutinizer, an open-source peripheral vision simulator. A collaborator checked the arXiv reference to his own paper during a meeting and found the title was wrong. The full audit revealed systematic patterns that replicated across a second project in a different domain.
MIT
