AkaliKong/PaperClaw

📚 Paper-Agent

An Agent-orchestrated academic paper discovery pipeline built on the OpenClaw platform. Paper-Agent automates the full lifecycle of research paper tracking: search → relevance scoring → human review → deep reading → code evaluation → knowledge synthesis & idea generation.

✨ Key Features

  • Fully automated pipeline: From arXiv search to structured knowledge cards, run with a single command
  • Agent-as-Brain architecture: LLM Agent handles scoring, deep reading, and idea generation; deterministic scripts handle data I/O
  • Human-in-the-loop: Interactive review of borderline papers with accept/reject decisions
  • Breakpoint resume: Pipeline state is persisted to JSON, so a run can resume from any step after an interruption
  • Cross-run dedup: Never see the same paper twice across multiple pipeline runs
  • Idea generation: Cross-paper insight synthesis produces actionable research ideas
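
The breakpoint-resume behavior can be illustrated with a minimal sketch. It is not the project's state_manager.py: the step names, the state file layout, and the function names below are all hypothetical, chosen only to show how persisting state to JSON after each step makes a rerun skip completed work.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical step names; the real pipeline defines its own.
STEPS = ["init", "search", "score", "review", "parse", "summary"]


def load_state(path: Path) -> dict:
    """Return the saved state, or a fresh one if no state file exists yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"completed": []}


def save_state(path: Path, state: dict) -> None:
    """Write via a temp file and rename so an interruption cannot leave a torn file."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp_name, path)


def run_pipeline(path: Path, handlers: dict) -> list:
    """Run every step not yet marked complete; return the steps executed now."""
    state = load_state(path)
    executed = []
    for step in STEPS:
        if step in state["completed"]:
            continue  # finished in an earlier run, skip it
        handlers[step]()  # may raise; earlier steps are already saved
        state["completed"].append(step)
        save_state(path, state)
        executed.append(step)
    return executed
```

If a handler raises partway through, the next call to `run_pipeline` with the same state file starts at the step that failed rather than at the beginning.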

🏗️ Architecture

Paper-Agent uses a two-layer architecture designed for seamless OpenClaw integration:

┌───────────────────────────────────────────────────────┐
│                  OpenClaw Platform                    │
│                                                       │
│  ┌─────────────────────────────────────────────────┐  │
│  │  Layer 1: Skill Definitions (SKILL.md)          │  │
│  │  ┌──────────────┐  ┌─────────────┐              │  │
│  │  │paper-pipeline│  │paper-seed-  │  ...         │  │
│  │  │  (orchestr.) │  │  init       │              │  │
│  │  └──────┬───────┘  └──────┬──────┘              │  │
│  └─────────┼─────────────────┼─────────────────────┘  │
│            │ invokes         │ invokes                │
│  ┌─────────┼─────────────────┼─────────────────────┐  │
│  │  Layer 2: Python Scripts                        │  │
│  │  ┌──────┴───────┐  ┌──────┴──────┐              │  │
│  │  │pipeline_     │  │seed_init.py │  ...         │  │
│  │  │  runner.py   │  │             │              │  │
│  │  └──────────────┘  └─────────────┘              │  │
│  └─────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────┘

  • Layer 1 - Skill Definitions (skills/): SKILL.md files that tell the OpenClaw Agent what to do at each pipeline step
  • Layer 2 - Python Scripts (scripts/): Deterministic tools that handle data fetching, dedup, parsing, and file I/O

📋 Prerequisites

1. OpenClaw Platform

Paper-Agent runs on OpenClaw. You need a working OpenClaw deployment.

2. Third-Party OpenClaw Skills

Install these two skills in your OpenClaw instance:

| Skill | Purpose | Installation |
| --- | --- | --- |
| arxiv-paper-search | arXiv API search wrapper | Follow its README to register in OpenClaw |
| read-arxiv-paper | Full paper reading & card.md generation | Follow its README to register in OpenClaw |

3. Python Dependencies

pip install -r requirements.txt

🚀 Quick Start

Step 1: Configure Your Profile

cp profile.yaml.example profile.yaml

Edit profile.yaml to set your:

  • Research description: What you're working on
  • Seed papers: arXiv IDs of your core reference papers
  • Keywords: Search terms for paper discovery
  • Whitelist authors: Researchers whose papers get a relevance bonus

Step 2: Register Skills in OpenClaw

Copy each skill directory from skills/ into your OpenClaw skills directory:

# Example: copy all paper-agent skills to OpenClaw
cp -r skills/paper-* /path/to/your/.openclaw/skills/

Important: Set the PAPER_AGENT_ROOT environment variable to this project's root directory so that the commands in each SKILL.md can locate the Python scripts:

export PAPER_AGENT_ROOT=/path/to/paper-agent
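
For illustration, here is one way a script could resolve the project root: prefer PAPER_AGENT_ROOT, and otherwise walk upward from a starting directory. This is a sketch under assumptions, not the project's path_manager.py. The function name and the choice of requirements.txt as the root marker are hypothetical; the real auto-detection may work differently.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

MARKER = "requirements.txt"  # assumption: a file that sits in the project root


def resolve_project_root(start: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """Prefer PAPER_AGENT_ROOT; otherwise walk upward from `start` looking for MARKER."""
    env = os.environ if env is None else env
    configured = env.get("PAPER_AGENT_ROOT")
    if configured:
        return Path(configured).expanduser().resolve()
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / MARKER).is_file():
            return candidate  # nearest ancestor that contains the marker
    raise FileNotFoundError(f"no {MARKER} found at or above {start}")
```

A script under scripts/ could call `resolve_project_root(Path(__file__).parent)` so that it behaves the same whether or not the variable is exported.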

Step 3: Initialize Seed Papers

Tell the Agent:

"Initialize core papers"

Or run directly:

python scripts/seed_init.py

Step 4: Run the Pipeline

Tell the Agent:

"Start daily paper patrol"

Or trigger specific steps:

python scripts/pipeline_runner.py --step init
python scripts/pipeline_runner.py --step seed+search --run-id {run_id}
# ... see Pipeline Steps below

📖 Pipeline Steps

| Step | Skill | Description |
| --- | --- | --- |
| Step 0 | paper-pipeline | Initialize a new pipeline run, get run_id |
| Step 1 | paper-source-scraper | Search arXiv by keywords & authors, two-level dedup |
| Step 2 | paper-relevance-scorer | Prepare scoring context (few-shot examples from seed papers) |
| Step 3 | paper-relevance-scorer | Agent LLM scores each paper 0-10 on relevance |
| Step 4 | paper-relevance-scorer | Post-process: apply bonuses, sort into high/edge/low zones |
| Step 5 | paper-human-review | Interactive review of borderline papers (accept/reject) |
| Step 6 | paper-deep-parser | Deep read via read-arxiv-paper, extract structured fields |
| Step 6.5 | paper-repo-evaluator | Evaluate associated GitHub repositories |
| Step 6.8 | paper-knowledge-sync | Sync to knowledge base, generate research ideas |
| Step 7 | paper-pipeline | Generate run summary with statistics |
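
The Step 4 post-processing (apply bonuses, then sort into high/edge/low zones) might look roughly like the sketch below. The bonus size, the zone cutoffs, and every name in the code are assumptions for illustration; the values actually used by scorer_utils.py are not stated here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class ScoredPaper:
    paper_id: str
    authors: List[str]
    score: float  # raw 0-10 relevance score assigned by the Agent


def assign_zones(
    papers: Iterable[ScoredPaper],
    whitelist: Iterable[str],
    bonus: float = 1.0,        # assumed bonus for a whitelisted author
    high_cutoff: float = 7.0,  # assumed lower bound of the "high" zone
    low_cutoff: float = 4.0,   # assumed lower bound of the "edge" zone
) -> Dict[str, List[ScoredPaper]]:
    """Apply the author bonus, cap at 10, and bucket papers into three zones."""
    wanted = {name.casefold() for name in whitelist}
    zones: Dict[str, List[ScoredPaper]] = {"high": [], "edge": [], "low": []}
    for paper in papers:
        final = paper.score
        if any(author.casefold() in wanted for author in paper.authors):
            final = min(10.0, final + bonus)
        adjusted = ScoredPaper(paper.paper_id, paper.authors, final)
        if final >= high_cutoff:
            zones["high"].append(adjusted)
        elif final >= low_cutoff:
            zones["edge"].append(adjusted)  # borderline: goes to human review
        else:
            zones["low"].append(adjusted)
    for bucket in zones.values():
        bucket.sort(key=lambda p: p.score, reverse=True)  # best first
    return zones
```

Under this reading, only the "edge" bucket would need attention in Step 5, which keeps the interactive review short.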

⚙️ Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| PAPER_AGENT_ROOT | Recommended | Project root directory. Auto-detected if not set. |
| ARXIV_SKILL_PATH | Optional | Path to arxiv-paper-search skill scripts (defaults to OpenClaw convention) |
| GITHUB_TOKEN | Optional | GitHub personal access token for repo evaluation (increases API rate limit) |
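
As a sketch of how an optional GITHUB_TOKEN can be applied, the helper below adds an Authorization header only when the variable is set, so unauthenticated use still works at GitHub's lower rate limit. The function names and the User-Agent string are hypothetical, and this is not necessarily how repo_evaluator.py builds its requests.

```python
import os
import urllib.request
from typing import Dict, Mapping, Optional


def github_headers(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build GitHub REST API headers, authenticating only if GITHUB_TOKEN is set."""
    env = os.environ if env is None else env
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "paper-agent-example",  # hypothetical client name
    }
    token = env.get("GITHUB_TOKEN", "").strip()
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def repo_request(
    owner: str, repo: str, env: Optional[Mapping[str, str]] = None
) -> urllib.request.Request:
    """Prepare (but do not send) a request for a repository's metadata."""
    url = f"https://api.github.com/repos/{owner}/{repo}"
    return urllib.request.Request(url, headers=github_headers(env))
```

Passing the result to `urllib.request.urlopen` would perform the call; the sketch stops short of that so it can be read and tested without network access.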

📁 Directory Structure

paper-agent/
├── scripts/                        # Python tool scripts
│   ├── pipeline_runner.py          # Main orchestrator (--step subcommands)
│   ├── seed_init.py                # Seed paper initialization
│   ├── source_scraper.py           # arXiv search + two-level dedup
│   ├── scorer_utils.py             # Scoring context prep + post-processing
│   ├── human_review.py             # Interactive/async human review
│   ├── card_parser.py              # Knowledge card structured extraction
│   ├── repo_evaluator.py           # GitHub repo assessment
│   ├── knowledge_sync.py           # Knowledge base sync + idea generation
│   ├── common/                     # Shared utilities
│   │   ├── config_loader.py        # Profile YAML loader
│   │   ├── path_manager.py         # Centralized path management
│   │   ├── state_manager.py        # Pipeline state persistence
│   │   └── json_extractor.py       # Fault-tolerant JSON extraction
│   └── tests/                      # Unit tests
├── skills/                         # OpenClaw Skill definitions
│   ├── paper-pipeline/SKILL.md     # Main orchestration skill
│   ├── paper-seed-init/SKILL.md
│   ├── paper-source-scraper/SKILL.md
│   ├── paper-relevance-scorer/SKILL.md
│   ├── paper-human-review/SKILL.md
│   ├── paper-deep-parser/SKILL.md
│   ├── paper-repo-evaluator/SKILL.md
│   └── paper-knowledge-sync/SKILL.md
├── profile.yaml.example            # Configuration template
├── requirements.txt                # Python dependencies
├── LICENSE                         # MIT License
└── .gitignore
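
To show what "fault-tolerant JSON extraction" can mean in practice, here is a minimal sketch that pulls a JSON value out of free-form LLM output. It is an assumption about the general technique, not a copy of json_extractor.py, and `extract_json` is a hypothetical name. It tries the whole text first, then any fenced block, then scans for the first position where a JSON object or array decodes.

```python
import json
import re
from typing import Any, Optional

# Matches a fenced block, optionally tagged "json"; `{3} avoids a literal fence here.
_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL | re.IGNORECASE)


def extract_json(text: str) -> Optional[Any]:
    """Return the first JSON value that parses from `text`, or None."""
    candidates = [text.strip()]
    candidates += [block.strip() for block in _FENCE.findall(text)]
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            pass  # fall through to the next, more lenient strategy
    decoder = json.JSONDecoder()
    for match in re.finditer(r"[\[{]", text):
        try:
            # raw_decode parses one value starting at the index, ignoring trailing text
            value, _end = decoder.raw_decode(text, match.start())
            return value
        except json.JSONDecodeError:
            continue
    return None
```

Returning None instead of raising lets the caller decide whether to re-prompt the Agent or skip the paper.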

🧪 Running Tests

cd /path/to/paper-agent
python -m pytest scripts/tests/ -v

Or run individual test files:

python -m unittest scripts/tests/test_pipeline_runner.py

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

🔒 Pre-Release Security Checklist

Before publishing, verify no sensitive data leaked:

# Scan for potential secrets
grep -ri "api_key\|token\|secret\|password\|webhook" --include="*.py" --include="*.md" --include="*.yaml" .

# Scan for hardcoded paths
grep -ri "/projects/\|/data/\|/home/" --include="*.py" --include="*.md" --include="*.yaml" .

# (Optional) Use trufflehog for deeper scanning
# pip install trufflehog
# trufflehog filesystem --directory .

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
