A Claude Code skill that builds, validates, and submits RL training environments on Prime Intellect Lab — from any HuggingFace dataset to a live GRPO training run, fully automated.
You describe the task. Claude writes the environment, validates it, pushes it to the Hub, and submits a hosted GRPO training run.

Given a prompt like:
"Build an environment for cais/mmlu abstract algebra, use Qwen/Qwen3-30B-Instruct-2507, 1000 steps, and submit the training job."
Claude autonomously:
- Inspects the dataset schema and answer format
- Writes the reward function and environment package
- Unit tests the reward function against auto-generated cases
- Runs a real evaluation via Prime Inference
- Validates the reward distribution is learnable
- Pushes the environment to the Prime Hub
- Submits a hosted GRPO training run
No GPU setup. No infrastructure config. One prompt.
- Prime Intellect account
- Claude Code (`npm install -g @anthropic-ai/claude-code`)

```bash
pip install prime verifiers datasets
prime login   # opens browser → authenticates everything
```

`prime login` covers inference, the environment hub, and training submission.
HuggingFace auth is not required for public datasets. It is only needed if your dataset is private on HF, or if you want to publish trained model checkpoints to the HF Hub after training.
```bash
git clone https://github.com/abideen/prime-lab-trainer
cd prime-lab-trainer
pip install prime verifiers datasets
prime login
claude
```

Then in Claude:

> Build an RL training environment for the pubmed_qa dataset and submit training.
This project is a Claude Code skill — a structured instruction set in SKILL.md that Claude reads automatically when you open the project. Claude follows a strict 7-step workflow, validating at every stage before proceeding.
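A Claude Code skill is an ordinary Markdown file with YAML frontmatter that Claude reads on startup. As a rough sketch of the shape such a SKILL.md takes (the field values and instruction text below are illustrative, not the repo's actual file):

```markdown
---
name: prime-lab-trainer
description: Build, validate, and submit RL training environments on Prime Intellect Lab
---

Follow the workflow Steps 0–7 in order.
Never skip a step; if a validation step fails, stop and fix it before continuing.
```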
| Step | Name | Purpose |
|---|---|---|
| 0 | Preflight Check | Verify environment, credentials, tools |
| 1 | Inspect Dataset | Learn schema, columns, answer format |
| 2 | Write Environment | Reward function + 3-file package |
| 3 | Validate Reward | Unit tests against auto-generated cases |
| 4 | Local Eval | Real model completions via Prime Inference |
| 5 | Distribution Check | Statistical validation of reward signal |
| 6 | Push to Hub | Publish environment to Prime Hub |
| 7 | Submit Training | GRPO training run on hosted GPUs |
Claude never skips a step. If any step fails, it stops and fixes the issue before continuing.
**Step 0: Preflight Check**

Verifies all prerequisites before touching any code.

```bash
python scripts/preflight.py
# or auto-install missing packages:
python scripts/preflight.py --install
```

Checks: Python ≥ 3.10, `datasets`, `verifiers`, the `prime` CLI, and Prime login.
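In pure Python, the local checks boil down to a version test and a few lookups. This is an illustrative re-implementation, not the script's actual interface; the Prime login check needs a live CLI call and is omitted here:

```python
import importlib.util
import shutil
import sys

def preflight() -> dict[str, bool]:
    """Report which local prerequisites are satisfied.

    Illustrative sketch of the checks preflight.py performs
    (login verification, which requires the prime CLI, is omitted).
    """
    return {
        "python": sys.version_info >= (3, 10),           # Python ≥ 3.10
        "datasets": importlib.util.find_spec("datasets") is not None,
        "verifiers": importlib.util.find_spec("verifiers") is not None,
        "prime_cli": shutil.which("prime") is not None,  # CLI on PATH
    }
```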
**Step 1: Inspect Dataset**

Loads the dataset and reports its exact schema before writing any code.

```bash
python scripts/inspect_dataset.py pubmed_qa
python scripts/inspect_dataset.py openai/gsm8k main train
```

Output includes column names, types, sample rows, and an auto-detected answer format (GSM8K `####`, numeric, boolean, multi-choice, or free text).
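The answer-format auto-detection can be sketched as a simple classifier over a handful of gold answers. This is a hypothetical simplification; the labels and heuristics are illustrative, not the script's exact logic:

```python
import re

def detect_answer_format(samples: list[str]) -> str:
    """Guess the answer format from sample gold answers by majority vote."""
    def classify(ans: str) -> str:
        ans = ans.strip()
        if "####" in ans:                        # GSM8K-style final-answer marker
            return "gsm8k"
        if ans.lower() in {"yes", "no", "true", "false", "maybe"}:
            return "boolean"
        if re.fullmatch(r"[A-E]", ans):          # single letter → multiple choice
            return "multi_choice"
        if re.fullmatch(r"-?\d+(\.\d+)?", ans):
            return "numeric"
        return "free_text"

    votes = [classify(s) for s in samples]
    return max(set(votes), key=votes.count)      # most common label wins
```

For example, `detect_answer_format(["4", "12", "-3"])` returns `"numeric"`.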
**Step 2: Write Environment**

Creates three files in `environments/my_env/`:

```
environments/my_env/
├── my_env.py        ← reward function + load_environment()
├── pyproject.toml   ← package metadata and dependencies
└── README.md        ← required for Hub display
```

Environments use the verifiers library. See `references/environment_guide.md` for complete patterns: classification, multi-choice, judge-scored free text, multi-turn, and more.
**Step 3: Validate Reward**

Auto-generates test cases from the environment's own dataset and runs sanity checks.

```bash
python scripts/test_reward.py environments/my_env/my_env.py
python scripts/test_reward.py environments/my_env/my_env.py --verbose
```

Sanity checks:

- Gold completions score higher than wrong completions
- At least one gold reward > 0.5
- At least one wrong reward < 0.5
- Not all rewards identical

```
✓ All sanity checks passed. Reward function looks correct.
Proceed to: prime eval run <env-name> -m <model> -n 50
```
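The four sanity checks above can be sketched in a few lines (an illustrative re-implementation, not the script's actual interface):

```python
from statistics import mean

def sanity_check(gold_rewards: list[float], wrong_rewards: list[float]) -> list[str]:
    """Return the list of failed checks; an empty list means all passed."""
    failures = []
    if mean(gold_rewards) <= mean(wrong_rewards):
        failures.append("gold completions do not score higher than wrong ones")
    if not any(r > 0.5 for r in gold_rewards):
        failures.append("no gold reward above 0.5")
    if not any(r < 0.5 for r in wrong_rewards):
        failures.append("no wrong reward below 0.5")
    if len(set(gold_rewards + wrong_rewards)) == 1:
        failures.append("all rewards identical")
    return failures
```

A constant reward function fails every check at once, which is exactly the degenerate case these tests exist to catch.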
**Step 4: Local Eval**

Runs real model completions against the environment using Prime Inference.

```bash
prime env install my-env --with pip
prime eval run my-env -m qwen/qwen3-vl-30b-a3b-instruct -n 50
```

Uses your `prime login` credentials — no additional API key needed. See `prime inference models` for the full list of available models.
**Step 5: Distribution Check**

Statistically validates that the reward signal is learnable before committing to a training run.

```bash
python scripts/check_reward_distribution.py
```

| Check | Pass Condition | Failure Means |
|---|---|---|
| Sample count | n ≥ 20 | Too few samples |
| Not all zero | mean > 0.05 | Reward function broken |
| Not all ones | mean < 0.95 | Trivially easy / reward hacked |
| Variance | std > 0.05 | Training signal too weak |
| Format reward | > 0.3 | Model can't follow format |

All 5 must pass before pushing to the Hub.
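In pure Python, the five checks in the table might look like this (the thresholds mirror the table; the function name and return shape are illustrative, not the script's actual interface):

```python
import statistics

def check_distribution(rewards: list[float],
                       format_rewards: list[float]) -> dict[str, bool]:
    """Evaluate the five learnability checks on an eval run's reward samples."""
    m = statistics.mean(rewards) if rewards else 0.0
    s = statistics.pstdev(rewards) if rewards else 0.0
    return {
        "sample_count": len(rewards) >= 20,   # enough samples to trust the stats
        "not_all_zero": m > 0.05,             # reward function not broken
        "not_all_ones": m < 0.95,             # task not trivial / reward-hacked
        "variance": s > 0.05,                 # signal strong enough to train on
        "format_reward": bool(format_rewards) and statistics.mean(format_rewards) > 0.3,
    }
```

A half-right, half-wrong reward distribution (mean 0.5, high variance) passes cleanly; an all-zero one fails three checks at once.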
**Step 6: Push to Hub**

Publishes the environment to the Prime Environments Hub.

```bash
prime env push my-env
```

Verify at: https://app.primeintellect.ai/dashboard/environments
**Step 7: Submit Training**

Generates a TOML config and submits a hosted GRPO training run.

```bash
python scripts/submit_training.py --env your-username/my-env
# or with flags:
python scripts/submit_training.py --env your-username/my-env \
  --model Qwen/Qwen3-30B-Instruct-2507 \
  --steps 1000 \
  --yes
```

GPU allocation is fully managed — no instance config needed.

Monitor your run:

```bash
prime rl list
prime rl logs <run-id> -f
prime rl metrics <run-id>
```

Dashboard: https://app.primeintellect.ai/dashboard/training
**Fast Path: Existing Hub Environments**

If the environment already exists on the Prime Hub, skip Steps 1–6.

First, disambiguate — both HF datasets and Prime environments use `owner/name` format:

```bash
prime env info owner/name
# exit 0 → Prime environment → use fast path
# exit 1 → HF dataset or not found → follow Steps 1–7
```

Then submit directly:

```bash
python scripts/submit_training.py --env zain/OpenMed_PubMedQA
```

**Repository Layout**

```
prime-lab-trainer/
│
├── SKILL.md                           ← Claude Code skill definition (agent instructions)
│
├── scripts/
│   ├── preflight.py                   ← Step 0: verify setup
│   ├── inspect_dataset.py             ← Step 1: inspect HF dataset schema
│   ├── test_reward.py                 ← Step 3: unit test reward functions
│   ├── check_reward_distribution.py   ← Step 5: validate reward distribution
│   └── submit_training.py             ← Step 7: generate TOML + submit
│
├── references/
│   ├── environment_guide.md           ← Full environment authoring reference
│   └── training_guide.md              ← Full training config reference
│
├── environments/                      ← Your environments go here
│   └── my_env/
│       ├── my_env.py
│       ├── pyproject.toml
│       └── README.md
│
└── configs/
    └── rl/                            ← Auto-generated TOML configs
        └── my-env.toml
```