An Agent-orchestrated academic paper discovery pipeline built on the OpenClaw platform. Paper-Agent automates the full lifecycle of research paper tracking: search → relevance scoring → human review → deep reading → code evaluation → knowledge synthesis & idea generation.
- Fully automated pipeline: From arXiv search to structured knowledge cards, run with a single command
- Agent-as-Brain architecture: LLM Agent handles scoring, deep reading, and idea generation; deterministic scripts handle data I/O
- Human-in-the-loop: Interactive review of borderline papers with accept/reject decisions
- Breakpoint resume: Pipeline state persisted to JSON, resume from any step after interruption
- Cross-run dedup: Never see the same paper twice across multiple pipeline runs
- Idea generation: Cross-paper insight synthesis produces actionable research ideas
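The cross-run dedup can be pictured as a persistent ledger of already-seen arXiv IDs. The sketch below is illustrative only — the ledger's real file name, schema, and location live in `source_scraper.py`, and the `seen_file` parameter here is a hypothetical stand-in:

```python
import json
from pathlib import Path

def dedup(candidates: list[dict], seen_file: Path) -> list[dict]:
    """Drop papers whose arXiv ID appeared in any earlier run,
    then record the newly seen IDs back into the ledger file."""
    # Load the ledger of IDs from previous runs (empty on first run).
    seen = set(json.loads(seen_file.read_text())) if seen_file.exists() else set()
    # Keep only papers never seen before.
    fresh = [p for p in candidates if p["arxiv_id"] not in seen]
    # Persist the updated ledger so the next run skips these too.
    seen.update(p["arxiv_id"] for p in fresh)
    seen_file.parent.mkdir(parents=True, exist_ok=True)
    seen_file.write_text(json.dumps(sorted(seen)))
    return fresh
```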
Paper-Agent uses a two-layer architecture designed for seamless OpenClaw integration:
```
┌─────────────────────────────────────────────────────┐
│                  OpenClaw Platform                  │
│                                                     │
│  ┌───────────────────────────────────────────────┐  │
│  │     Layer 1: Skill Definitions (SKILL.md)     │  │
│  │  ┌──────────────┐  ┌──────────────┐  ...      │  │
│  │  │paper-pipeline│  │paper-seed-   │           │  │
│  │  │ (orchestr.)  │  │     init     │           │  │
│  │  └──────┬───────┘  └──────┬───────┘           │  │
│  └─────────┼─────────────────┼───────────────────┘  │
│            │ invokes         │ invokes              │
│  ┌─────────┼─────────────────┼───────────────────┐  │
│  │            Layer 2: Python Scripts            │  │
│  │  ┌──────┴───────┐  ┌──────┴───────┐  ...      │  │
│  │  │  pipeline_   │  │ seed_init.py │           │  │
│  │  │  runner.py   │  │              │           │  │
│  │  └──────────────┘  └──────────────┘           │  │
│  └───────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────┘
```
- Layer 1 (Skill Definitions, `skills/`): SKILL.md files that tell the OpenClaw Agent what to do at each pipeline step
- Layer 2 (Python Scripts, `scripts/`): Deterministic tools that handle data fetching, dedup, parsing, and file I/O
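To make the two layers concrete, a Layer 1 skill definition might look roughly like the sketch below. The frontmatter fields and wording are assumptions for illustration — the shipped SKILL.md files in `skills/` are authoritative:

```markdown
---
name: paper-source-scraper        # hypothetical frontmatter; check the real SKILL.md
description: Search arXiv for new papers matching the user's research profile
---

When asked to find new papers, run:

    python "$PAPER_AGENT_ROOT/scripts/source_scraper.py" --run-id <run_id>

Report how many papers survived dedup, then hand off to paper-relevance-scorer.
```

The Agent reads the instructions (Layer 1) and shells out to the deterministic script (Layer 2), so all LLM judgment stays in the skill layer while I/O stays reproducible.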
Paper-Agent runs on OpenClaw, so a working OpenClaw deployment is required.
Install these two skills in your OpenClaw instance:
| Skill | Purpose | Installation |
|---|---|---|
| `arxiv-paper-search` | arXiv API search wrapper | Follow its README to register in OpenClaw |
| `read-arxiv-paper` | Full paper reading & card.md generation | Follow its README to register in OpenClaw |
```bash
pip install -r requirements.txt
cp profile.yaml.example profile.yaml
```

Edit `profile.yaml` to set your:
- Research description: What you're working on
- Seed papers: arXiv IDs of your core reference papers
- Keywords: Search terms for paper discovery
- Whitelist authors: Researchers whose papers get a relevance bonus
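A filled-in profile might look like the sketch below. The field names follow the bullets above but are illustrative — `profile.yaml.example` is the authoritative template:

```yaml
# Illustrative profile sketch; see profile.yaml.example for the real field names
research_description: >
  Efficient long-context inference for large language models,
  with a focus on KV-cache compression.
seed_papers:            # arXiv IDs of core reference papers
  - "2309.06180"
  - "2306.14048"
keywords:               # search terms for paper discovery
  - "KV cache compression"
  - "speculative decoding"
whitelist_authors:      # papers by these authors get a relevance bonus
  - "Jane Doe"
```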
Copy each skill directory from skills/ into your OpenClaw skills directory:
```bash
# Example: copy all paper-agent skills to OpenClaw
cp -r skills/paper-* /path/to/your/.openclaw/skills/
```

**Important:** Set the `PAPER_AGENT_ROOT` environment variable to point to this project's root directory, so that SKILL.md scripts can find the Python files:

```bash
export PAPER_AGENT_ROOT=/path/to/paper-agent
```
Tell the Agent:
"Initialize core papers"
Or run directly:
```bash
python scripts/seed_init.py
```

Tell the Agent:
"Start daily paper patrol"
Or trigger specific steps:
```bash
python scripts/pipeline_runner.py --step init
python scripts/pipeline_runner.py --step seed+search --run-id {run_id}
# ... see Pipeline Steps below
```

| Step | Skill | Description |
|---|---|---|
| Step 0 | paper-pipeline | Initialize a new pipeline run, get run_id |
| Step 1 | paper-source-scraper | Search arXiv by keywords & authors, two-level dedup |
| Step 2 | paper-relevance-scorer | Prepare scoring context (few-shot examples from seed papers) |
| Step 3 | paper-relevance-scorer | Agent LLM scores each paper 0-10 on relevance |
| Step 4 | paper-relevance-scorer | Post-process: apply bonuses, sort into high/edge/low zones |
| Step 5 | paper-human-review | Interactive review of borderline papers (accept/reject) |
| Step 6 | paper-deep-parser | Deep read via read-arxiv-paper, extract structured fields |
| Step 6.5 | paper-repo-evaluator | Evaluate associated GitHub repositories |
| Step 6.8 | paper-knowledge-sync | Sync to knowledge base, generate research ideas |
| Step 7 | paper-pipeline | Generate run summary with statistics |
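Step 4's post-processing can be sketched as a small scoring-and-triage function. The bonus size and zone thresholds below are invented for illustration — the real values live in `scorer_utils.py` and the profile, not here:

```python
def triage(score: float, whitelist_author: bool,
           bonus: float = 1.0, high: float = 7.0, low: float = 4.0) -> tuple[float, str]:
    """Apply the whitelist-author bonus, cap at 10, and sort into zones.
    Thresholds and bonus size are illustrative, not the shipped defaults."""
    final = min(10.0, score + (bonus if whitelist_author else 0.0))
    if final >= high:
        zone = "high"   # auto-accepted, proceeds to deep reading (Step 6)
    elif final >= low:
        zone = "edge"   # borderline, queued for human review (Step 5)
    else:
        zone = "low"    # dropped
    return final, zone
```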
| Variable | Required | Description |
|---|---|---|
| `PAPER_AGENT_ROOT` | Recommended | Project root directory. Auto-detected if not set. |
| `ARXIV_SKILL_PATH` | Optional | Path to arxiv-paper-search skill scripts (defaults to OpenClaw convention) |
| `GITHUB_TOKEN` | Optional | GitHub personal access token for repo evaluation (raises the API rate limit) |
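The "auto-detected if not set" behavior could work roughly as below. This is a hypothetical sketch — `path_manager.py`'s actual logic and marker file may differ:

```python
import os
from pathlib import Path

def project_root() -> Path:
    """Resolve the project root: honor PAPER_AGENT_ROOT if set, otherwise
    walk up from this file looking for a marker (profile.yaml.example here,
    chosen for illustration)."""
    env = os.environ.get("PAPER_AGENT_ROOT")
    if env:
        return Path(env).resolve()
    for parent in Path(__file__).resolve().parents:
        if (parent / "profile.yaml.example").exists():
            return parent
    raise RuntimeError("Cannot locate project root; set PAPER_AGENT_ROOT explicitly.")
```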
```
paper-agent/
├── scripts/                        # Python tool scripts
│   ├── pipeline_runner.py          # Main orchestrator (--step subcommands)
│   ├── seed_init.py                # Seed paper initialization
│   ├── source_scraper.py           # arXiv search + two-level dedup
│   ├── scorer_utils.py             # Scoring context prep + post-processing
│   ├── human_review.py             # Interactive/async human review
│   ├── card_parser.py              # Knowledge card structured extraction
│   ├── repo_evaluator.py           # GitHub repo assessment
│   ├── knowledge_sync.py           # Knowledge base sync + idea generation
│   ├── common/                     # Shared utilities
│   │   ├── config_loader.py        # Profile YAML loader
│   │   ├── path_manager.py         # Centralized path management
│   │   ├── state_manager.py        # Pipeline state persistence
│   │   └── json_extractor.py       # Fault-tolerant JSON extraction
│   └── tests/                      # Unit tests
├── skills/                         # OpenClaw Skill definitions
│   ├── paper-pipeline/SKILL.md     # Main orchestration skill
│   ├── paper-seed-init/SKILL.md
│   ├── paper-source-scraper/SKILL.md
│   ├── paper-relevance-scorer/SKILL.md
│   ├── paper-human-review/SKILL.md
│   ├── paper-deep-parser/SKILL.md
│   ├── paper-repo-evaluator/SKILL.md
│   └── paper-knowledge-sync/SKILL.md
├── profile.yaml.example            # Configuration template
├── requirements.txt                # Python dependencies
├── LICENSE                         # MIT License
└── .gitignore
```
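As an example of what `common/json_extractor.py`'s fault tolerance means in practice, a minimal sketch (not the shipped code) might try progressively looser strategies to pull JSON out of free-form LLM output:

```python
import json
import re

def extract_json(text: str):
    """Return the first parseable JSON object found in LLM output, or None.
    Tries: the whole string, any fenced code block, then the outermost braces."""
    candidates = [text]
    # Code-fenced blocks, e.g. a ```json ... ``` fence around the payload.
    candidates += re.findall(r"`{3}(?:json)?\s*(.*?)`{3}", text, re.DOTALL)
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            pass
    # Last resort: slice from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            pass
    return None
```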
```bash
cd /path/to/paper-agent
python -m pytest scripts/tests/ -v
```

Or run individual test files:

```bash
python -m unittest scripts/tests/test_pipeline_runner.py
```

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Before publishing, verify no sensitive data leaked:
```bash
# Scan for potential secrets
grep -ri "api_key\|token\|secret\|password\|webhook" --include="*.py" --include="*.md" --include="*.yaml"

# Scan for hardcoded paths
grep -ri "/projects/\|/data/\|/home/" --include="*.py" --include="*.md" --include="*.yaml"

# (Optional) Use trufflehog for deeper scanning
# pip install trufflehog
# trufflehog filesystem --directory .
```

This project is licensed under the MIT License; see the LICENSE file for details.