Shuaike Shen*, Wenduo Cheng*, Mingqian Ma, Alistair Turcan, Martin Jinye Zhang, Jian Ma†
Ray & Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University
[*Equal contribution · †Correspondence: jianma@cs.cmu.edu]
Modern scientific ecosystems are rich in procedural knowledge — repositories, APIs, scripts, notebooks, documentation, databases, and papers — yet much of this knowledge remains fragmented and difficult for agents to operationalize. SkillFoundry bridges this gap with a self-evolving framework that converts heterogeneous scientific resources into validated, reusable agent skills.
| Result | Details |
|---|---|
| 267+ skills | mined across 28 scientific domains and 254 subdomains |
| 71.1% novelty | vs. existing skill libraries (SkillHub, SkillSMP) |
| 5/6 datasets improved | on MoSciBench benchmark |
| Genomics boost | substantial gains on two challenging genomics tasks |
SkillFoundry uses a domain knowledge tree as both a search prior and the evolving structure being updated, turning open-ended skill collection into a closed-loop acquisition process:
| Step | Stage | Description |
|---|---|---|
| 1 | Tree Construction | Build a rooted tree where internal nodes are domains/subdomains and leaves are actionable skill targets |
| 2 | Resource Mining | Select focus branches and retrieve relevant resources (repos, APIs, papers, notebooks, databases) |
| 3 | Skill Compilation | Extract operational contracts and compile into reusable skill packages with metadata, dependencies, and tests |
| 4 | Multi-Level Validation | Apply execution testing, system testing, and synthetic-data testing |
| 5 | Tree Expansion | Insert validated skills as new leaves, expanding domain coverage |
| 6 | Refinement & Loop | Revise, merge, or prune failing/redundant skills; repeat from step 2 |
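The loop in the table above can be sketched in a few lines of Python. This is an illustrative sketch only, not the SkillFoundry implementation: `Node`, `frontier`, and `run_loop` are hypothetical names, the `build` callable stands in for resource mining plus skill compilation, and `validate` stands in for multi-level validation. As a simplification of step 5, a successful skill marks its target leaf as covered instead of inserting a new leaf.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class Node:
    """One node of the domain tree: a domain, a subdomain, or a skill target."""

    name: str
    children: list[Node] = field(default_factory=list)
    validated: bool = False  # only meaningful for leaves (skill targets)

    def is_leaf(self) -> bool:
        return not self.children


def frontier(root: Node) -> Iterator[Node]:
    """Yield leaves without a validated skill, in depth-first order."""
    if root.is_leaf():
        if not root.validated:
            yield root
        return
    for child in root.children:
        yield from frontier(child)


def run_loop(
    root: Node,
    build: Callable[[str], str],
    validate: Callable[[str], bool],
    focus_limit: int = 10,
) -> list[str]:
    """Run one pass of steps 2-6 and return the names of accepted targets."""
    accepted: list[str] = []
    for leaf in list(frontier(root))[:focus_limit]:
        skill = build(leaf.name)   # hypothetical: mining + compilation
        if validate(skill):        # hypothetical: multi-level validation
            leaf.validated = True  # simplified tree update
            accepted.append(leaf.name)
    return accepted
```

Calling `run_loop` repeatedly shrinks the frontier: targets that pass validation drop out, while failing targets stay on the frontier and are retried on the next pass, which loosely mirrors the "repeat from step 2" behavior.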
SkillFoundry/
├── skillfoundry/ # Core automation framework (Python package)
│ ├── cli.py # CLI entry point
│ ├── orchestrator.py # Skill automation orchestrator
│ ├── campaign.py # Long-running campaign runner
│ ├── evaluation.py # Hierarchical skill evaluation
│ └── ...
├── scripts/ # Utility & validation scripts
├── registry/ # Taxonomy, resource registry, skill index
├── skills/ # Reusable skill folders grouped by domain (27 domains)
├── tests/ # Test suites (smoke, integration, regression)
├── site/ # Generated project page (static HTML/JS/CSS)
├── ref/ # Reference materials
└── Makefile # Build, validate, test, and smoke targets
- Python 3.10+
git clone https://github.com/ma-compbio-lab/SkillFoundry.git
cd SkillFoundry
pip install -e . # Install the skillfoundry package
make validate # Validate repository structure
make build-site # Build the project page
make test # Run unit tests
The skillfoundry package provides a CLI for automated skill discovery, compilation, and evaluation. It orchestrates the closed-loop tree_check -> resource_search -> skill_build -> skill_test -> refresh pipeline.
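To make the stage ordering concrete, here is a minimal Python sketch of a stage runner. It is not the package's actual orchestrator: `parse_stages`, `run_pipeline`, and the handler signature are hypothetical, and the choice to always run selected stages in the canonical order is an assumption. Only the five stage names come from the pipeline described above.

```python
from typing import Callable

# Stage names as listed in the pipeline description.
STAGE_ORDER = ["tree_check", "resource_search", "skill_build", "skill_test", "refresh"]

State = dict[str, object]
Handler = Callable[[State], State]


def parse_stages(spec: str) -> list[str]:
    """Parse a comma-separated stage list and return it in canonical order."""
    requested = {s.strip() for s in spec.split(",") if s.strip()}
    unknown = requested - set(STAGE_ORDER)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    # Assumption: a subset of stages still runs in pipeline order.
    return [s for s in STAGE_ORDER if s in requested]


def run_pipeline(stages: list[str], handlers: dict[str, Handler], state: State) -> State:
    """Thread a shared state dict through each selected stage handler."""
    for name in stages:
        state = handlers[name](state)
    return state
```

Passing state from one handler to the next keeps each stage independently testable, and a subset such as `parse_stages("skill_build,skill_test")` can be rerun without touching the earlier stages.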
Inspect the current repository summary and identify high-value frontier leaves:
python3 scripts/sciskill_framework.py --json status --focus-limit 10
Run one or more automation loops to discover and build new skills:
# Single loop
python3 scripts/sciskill_framework.py cycle --loops 1 --verification-mode standard
# Parallel workers with custom focus
python3 scripts/sciskill_framework.py cycle \
--loops 2 --focus-limit 12 --stage-workers 4 \
--stages tree_check,resource_search,skill_build,skill_test,refresh \
--extra-context "Prioritize uncovered leaves in robotics and physics."
Design a skill from a specific task description:
python3 scripts/sciskill_framework.py design-skill \
--prompt "Design a skill for literature-backed pathway enrichment benchmarking." \
--verification-mode validate
Run hierarchical evaluation (correctness repair, benchmarking, novelty checking):
# Single skill
python3 scripts/sciskill_framework.py evaluate-skills \
--skill-slug openalex-literature-search \
--verification-mode validate
# Full library
python3 scripts/sciskill_framework.py evaluate-skills --all --verification-mode none
Run a long checkpointable campaign targeting specific domains:
python3 scripts/sciskill_framework.py campaign \
--focus-term genomics --focus-term proteomics \
--max-iterations 100 --max-runtime-minutes 450 \
--stage-workers 6 --evaluation-workers 6
Citation information will be available once the paper is published. Check back later.
This project is licensed under the Apache License 2.0 and developed at Ma Lab, Carnegie Mellon University.
