
SkillFoundry

Building Self-Evolving Agent Skill Libraries from Heterogeneous Scientific Resources

Shuaike Shen*, Wenduo Cheng*, Mingqian Ma, Alistair Turcan, Martin Jinye Zhang, Jian Ma†

Ray & Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University

[*Equal contribution · †Correspondence: jianma@cs.cmu.edu]

Project Page  arXiv  Paper  GitHub


Overview

Modern scientific ecosystems are rich in procedural knowledge — repositories, APIs, scripts, notebooks, documentation, databases, and papers — yet much of this knowledge remains fragmented and difficult for agents to operationalize. SkillFoundry bridges this gap with a self-evolving framework that converts heterogeneous scientific resources into validated, reusable agent skills.

Figure 1. SkillFoundry framework overview: from domain knowledge tree to validated skill library.

Key Results

267+ skills mined across 28 scientific domains and 254 subdomains
71.1% novelty vs. existing skill libraries (SkillHub, SkillSMP)
5/6 datasets improved on MoSciBench benchmark
Substantial gains on two challenging genomics tasks

How It Works

SkillFoundry uses a domain knowledge tree as both a search prior and the evolving structure being updated, turning open-ended skill collection into a closed-loop acquisition process:

1. Tree Construction: Build a rooted tree where internal nodes are domains/subdomains and leaves are actionable skill targets.
2. Resource Mining: Select focus branches and retrieve relevant resources (repos, APIs, papers, notebooks, databases).
3. Skill Compilation: Extract operational contracts and compile them into reusable skill packages with metadata, dependencies, and tests.
4. Multi-Level Validation: Apply execution testing, system testing, and synthetic-data testing.
5. Tree Expansion: Insert validated skills as new leaves, expanding domain coverage.
6. Refinement & Loop: Revise, merge, or prune failing/redundant skills; repeat from step 2.
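The loop above can be sketched in a few lines of Python. This is an illustrative toy, not the actual skillfoundry implementation: the class, function names, and simulated resources here are all hypothetical stand-ins for the real mining, compilation, and validation stages.

```python
# Toy sketch of SkillFoundry's closed acquisition loop (steps 2-6 above).
# All names and data below are illustrative, not the real skillfoundry API.

class KnowledgeTree:
    def __init__(self):
        self.leaves = []  # validated skills inserted as new leaves

    def insert_leaf(self, skill):
        self.leaves.append(skill)  # Step 5: tree expansion


def mine_resources(tree):
    # Step 2: stand-in for retrieving repos/APIs/papers on focus branches.
    return ["repo:pathway-enrichment", "api:openalex"]


def compile_skill(resource):
    # Step 3: wrap a resource as a skill package with metadata and tests.
    return {"name": resource, "tests_pass": True}


def validate(skill):
    # Step 4: execution / system / synthetic-data testing (simulated).
    return skill["tests_pass"]


def run_loop(tree, loops=1):
    for _ in range(loops):
        for resource in mine_resources(tree):   # Step 2
            skill = compile_skill(resource)     # Step 3
            if validate(skill):                 # Step 4
                tree.insert_leaf(skill)         # Step 5
        # Step 6: prune redundant leaves before the next iteration.
        tree.leaves = list({s["name"]: s for s in tree.leaves}.values())
    return tree


tree = run_loop(KnowledgeTree(), loops=2)
print(len(tree.leaves))  # prints 2: duplicates from the second pass are pruned
```

The key structural point is that the tree is both input and output of each pass: it steers mining (search prior) and absorbs validated skills (evolving structure).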

Repository Structure

SkillFoundry/
├── skillfoundry/             # Core automation framework (Python package)
│   ├── cli.py                #   CLI entry point
│   ├── orchestrator.py       #   Skill automation orchestrator
│   ├── campaign.py           #   Long-running campaign runner
│   ├── evaluation.py         #   Hierarchical skill evaluation
│   └── ...
├── scripts/                  # Utility & validation scripts
├── registry/                 # Taxonomy, resource registry, skill index
├── skills/                   # Reusable skill folders grouped by domain (27 domains)
├── tests/                    # Test suites (smoke, integration, regression)
├── site/                     # Generated project page (static HTML/JS/CSS)
├── ref/                      # Reference materials
└── Makefile                  # Build, validate, test, and smoke targets

Getting Started

Prerequisites

  • Python 3.10+

Installation

git clone https://github.com/ma-compbio-lab/SkillFoundry.git
cd SkillFoundry
pip install -e .       # Install the skillfoundry package

Quick Validation

make validate        # Validate repository structure
make build-site      # Build the project page
make test            # Run unit tests

Framework Usage

The skillfoundry package provides a CLI for automated skill discovery, compilation, and evaluation. It orchestrates the closed-loop tree_check -> resource_search -> skill_build -> skill_test -> refresh pipeline.
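To make the staged structure concrete, here is a minimal sketch of how ordered stages with parallel per-leaf workers could be driven (mirroring the `--stage-workers` flag). This is an assumption-laden illustration, not the orchestrator in skillfoundry/orchestrator.py; `run_stage` and `run_pipeline` are hypothetical names.

```python
# Illustrative sketch: stages execute in order, but within each stage the
# focus leaves are processed by a pool of workers. Not the real orchestrator.
from concurrent.futures import ThreadPoolExecutor

STAGES = ["tree_check", "resource_search", "skill_build", "skill_test", "refresh"]


def run_stage(stage, leaf):
    # Placeholder for real stage work on one focus leaf.
    return f"{stage}:{leaf}"


def run_pipeline(leaves, workers=4):
    results = {}
    for stage in STAGES:  # stages are sequential: each depends on the last
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results[stage] = list(pool.map(lambda leaf: run_stage(stage, leaf), leaves))
    return results


out = run_pipeline(["genomics", "proteomics"], workers=2)
print(out["refresh"])  # prints ['refresh:genomics', 'refresh:proteomics']
```

Running stages sequentially while parallelizing within a stage keeps the dependency order intact (a skill must be built before it is tested) while still using multiple workers.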

Status

Inspect the current repository summary and identify high-value frontier leaves:

python3 scripts/sciskill_framework.py --json status --focus-limit 10

Cycle

Run one or more automation loops to discover and build new skills:

# Single loop
python3 scripts/sciskill_framework.py cycle --loops 1 --verification-mode standard

# Parallel workers with custom focus
python3 scripts/sciskill_framework.py cycle \
  --loops 2 --focus-limit 12 --stage-workers 4 \
  --stages tree_check,resource_search,skill_build,skill_test,refresh \
  --extra-context "Prioritize uncovered leaves in robotics and physics."

Design Skill

Design a skill from a specific task description:

python3 scripts/sciskill_framework.py design-skill \
  --prompt "Design a skill for literature-backed pathway enrichment benchmarking." \
  --verification-mode validate

Evaluate Skills

Run hierarchical evaluation (correctness repair, benchmarking, novelty checking):

# Single skill
python3 scripts/sciskill_framework.py evaluate-skills \
  --skill-slug openalex-literature-search \
  --verification-mode validate

# Full library
python3 scripts/sciskill_framework.py evaluate-skills --all --verification-mode none

Campaign

Run a long checkpointable campaign targeting specific domains:

python3 scripts/sciskill_framework.py campaign \
  --focus-term genomics --focus-term proteomics \
  --max-iterations 100 --max-runtime-minutes 450 \
  --stage-workers 6 --evaluation-workers 6

Citation

Citation information will be added once the paper is published.


License

This project is licensed under the Apache License 2.0 and developed at Ma Lab, Carnegie Mellon University.
