unattended harness for coding agents. planner-worker-judge on the command line.
uv tool install git+https://github.com/kronael/ship

or from source:
make install # uv tool install + skill

requires claude code CLI, authenticated. codex CLI optional
(only for -x refiner).
install the /ship skill so Claude Code can plan and
execute autonomously:
mkdir -p ~/.claude/skills/ship
cp skill/SKILL.md skill/prompt.md ~/.claude/skills/ship/

make install does this automatically. after installing,
use /ship <goal> in Claude Code.
ship <file> # ship from design file
ship specs/ # ship from specs directory
ship "add auth" # inline goal text
ship -f # wipe state and start fresh
ship -k # validate spec only (exit 0/1)
ship -n 8 # 8 workers (default: 4)
ship -t 1200 # 20min timeout per task (default: 2400s)
ship -m 25 # 25 agentic turns per task (default: 50)
ship -p "use stdlib only" # inject override into all LLM calls
ship -v # verbose (show prompts/responses)
ship -x # enable codex refiner

continuation is automatic: if state exists and spec is unchanged,
ship resumes from where it left off. if spec changed, an LLM call
decides whether to keep completed tasks or replan from scratch.
use -f to force a fresh run.
-x enables the codex refiner. without it, ship runs workers +
replan only. with -x, codex critiques completed work and generates
follow-up tasks between cycles.
specs/*.md -> validator -> planner -> workers -> judge -> verifier -> done
- validator checks design quality. rejects to .ship/REJECTION.md or writes PROJECT.md. caches spec SHA256 in .ship/validated; subsequent runs skip re-validation if spec unchanged.
- planner breaks deliverables into tasks, writes PLAN.md
- workers execute tasks via claude CLI, each in its own session. streams NDJSON events via --output-format stream-json, parses <progress> tags for live status, tracks git diff stats per task. parses <summary> from output for TUI.
- judge monitors completion, judges each task, triggers refinement cycles. retries failed tasks up to 10 times, then cascades failure to dependent tasks.
- refiner (requires -x) analyzes results via codex CLI, creates follow-up tasks
- replanner runs if refiner finds nothing (or -x not set), catches missed work
- verifier runs adversarial challenges (up to 3 rounds) to prove the objective is met before marking complete
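the judge's retry-then-cascade rule can be sketched as below. this is a minimal illustration, not ship's actual code: the `deps` mapping shape and the function names are hypothetical; only the limit of 10 retries comes from the description above.

```python
from collections import deque

MAX_RETRIES = 10  # retry limit stated in the pipeline description


def should_retry(retries: int) -> bool:
    """True while a failed task still has retries left."""
    return retries < MAX_RETRIES


def cascade_failure(failed: str, deps: dict) -> set:
    """Return every task that transitively depends on `failed`.

    `deps` maps a task id to the list of ids it depends on
    (hypothetical shape, chosen for illustration).
    """
    # invert the graph: task -> tasks that depend on it
    dependents = {}
    for task, needs in deps.items():
        for need in needs:
            dependents.setdefault(need, []).append(task)

    doomed = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in doomed:
                doomed.add(child)
                queue.append(child)
    return doomed
```

with `deps = {"b": ["a"], "c": ["b"]}`, a permanent failure of `a` marks both `b` and `c` as failed, since `c` depends on `a` through `b`.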
on spec change (hash differs from saved state): LLM decides whether to keep completed tasks and add new ones, or replan from scratch.
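a rough sketch of the spec-change check, assuming the hash covers every spec file in sorted order. how ship actually combines multiple files into one SHA256 is not stated here, so `spec_hash` and `next_action` are hypothetical names for illustration.

```python
import hashlib
from pathlib import Path
from typing import Optional


def spec_hash(spec_path: Path) -> str:
    """Hash one spec file, or every *.md in a directory in sorted order."""
    files = [spec_path] if spec_path.is_file() else sorted(spec_path.glob("*.md"))
    digest = hashlib.sha256()
    for f in files:
        digest.update(f.name.encode())  # include the name so renames change the hash
        digest.update(b"\0")
        digest.update(f.read_bytes())
    return digest.hexdigest()


def next_action(spec_path: Path, saved_hash: Optional[str], force: bool = False) -> str:
    """Return 'fresh', 'resume', or 'ask_llm' per the rules described above."""
    if force or saved_hash is None:
        return "fresh"  # -f, or no saved state yet
    if spec_hash(spec_path) == saved_hash:
        return "resume"  # spec unchanged: continue where the last run stopped
    return "ask_llm"  # spec changed: an LLM call decides keep vs. replan
```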
on error: worker resumes session for a progress summary, or falls
back to last <progress> tags seen. if output is missing XML tags,
worker calls claude.reformat() to retry formatting.
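the fallback order on error could look like this. a sketch only: it assumes the tags are paired as `<progress>...</progress>`, and `last_progress` and `failure_summary` are hypothetical helpers, not part of ship.

```python
import re
from typing import Optional

# assumption: progress tags are paired, e.g. <progress>...</progress>
PROGRESS_RE = re.compile(r"<progress>(.*?)</progress>", re.DOTALL)


def last_progress(output: str) -> Optional[str]:
    """Return the text of the last <progress> tag, or None if there is none."""
    matches = PROGRESS_RE.findall(output)
    return matches[-1].strip() if matches else None


def failure_summary(resumed: Optional[str], output: str) -> Optional[str]:
    """Prefer the summary from a resumed session, else the last progress seen."""
    if resumed and resumed.strip():
        return resumed.strip()
    return last_progress(output)
```

a `None` result corresponds to the case where no tags were found at all, which is where a reformat retry would come in.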
ctrl+c kills child processes and exits cleanly (SIGINT/SIGTERM both handled). a lock file prevents concurrent runs on the same state dir.
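one common way to build such a lock is exclusive file creation. the sketch below assumes a file named `lock` inside the state dir (a hypothetical name) and does not handle stale locks left behind by a crashed run; ship's own mechanism may differ.

```python
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def state_lock(state_dir: Path):
    """Hold an exclusive lock file for the duration of a run."""
    state_dir.mkdir(parents=True, exist_ok=True)
    lock_path = state_dir / "lock"  # hypothetical file name
    try:
        # O_EXCL makes creation fail if the file already exists
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(f"another run holds {lock_path}") from None
    try:
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))  # record the owner for debugging
        yield lock_path
    finally:
        lock_path.unlink(missing_ok=True)  # release on normal exit or error
```

wrapping the whole run in `with state_lock(Path(".ship")):` means a second invocation on the same state dir fails fast instead of corrupting state.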
the /ship Claude Code skill (~/.claude/skills/ship/)
plans a project inside Claude, writes specs/*.md, then calls
ship to execute. use /ship <goal> in Claude Code.
works incrementally: detects existing specs and shipped work, only plans and ships the delta.
ship reads the file or directory passed on the command line. each spec file should list deliverables with concrete acceptance criteria:
# Component Name
## Goal
what this component delivers
## Deliverables
### 1. Feature name
- **Files**: src/foo.rs, tests/foo_test.rs
- **Accept**: testable criteria
- **Notes**: patterns to follow
## Constraints
- conventions, boundaries
## Verification
- [ ] how to know it works

.ship/ directory: tasks.json, work.json, log/
single .md arg gets its own slug dir: ship foo.md → .ship/foo/.
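the mapping from argument to state dir might be sketched as follows. only `foo.md → .ship/foo/` is given above; the slug rule (lowercase, non-alphanumerics to `-`) and the choice that other arguments use the top-level `.ship/` are assumptions made for illustration.

```python
import re
from pathlib import Path


def state_dir(arg: str, root: Path = Path(".ship")) -> Path:
    """Pick the state dir for a ship argument (hypothetical helper)."""
    p = Path(arg)
    if p.suffix == ".md":
        # assumed slug rule: lowercase the stem, collapse other chars to '-'
        slug = re.sub(r"[^a-z0-9]+", "-", p.stem.lower()).strip("-")
        return root / slug
    # assumption: directories and inline goals share the top-level dir
    return root
```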
optional .env in project root:
NUM_WORKERS=4
TASK_TIMEOUT=2400
MAX_TURNS=50
precedence: CLI args override env vars, which override the .env file.
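the precedence rule can be illustrated with a small resolver. this is a sketch, not ship's implementation: `read_dotenv` and `resolve` are hypothetical names, the .env parser handles only plain KEY=VALUE lines, and the defaults are the ones listed in the usage section.

```python
from pathlib import Path
from typing import Mapping

DEFAULTS = {"NUM_WORKERS": 4, "TASK_TIMEOUT": 2400, "MAX_TURNS": 50}


def read_dotenv(path: Path) -> dict:
    """Parse KEY=VALUE lines; skip blanks, # comments, and malformed lines."""
    values = {}
    if not path.is_file():
        return values
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve(cli: Mapping, environ: Mapping, dotenv: Mapping) -> dict:
    """Later sources win: defaults < .env < environment < CLI."""
    config = dict(DEFAULTS)
    for source in (dotenv, environ):
        for key in DEFAULTS:
            if key in source:
                config[key] = int(source[key])
    for key, value in cli.items():
        if value is not None:  # None means the flag was not given
            config[key] = value
    return config
```

a caller would pass something like `resolve(cli_args, os.environ, read_dotenv(Path(".env")))`, where `cli_args` holds the parsed `-n`, `-t`, and `-m` values keyed by the matching variable name.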
make build # uv sync
make test # unit tests (<5s, skips smoke)
make smoke # smoke tests (real CLI calls)
make lint # pre-commit run -a
make right # pyright only
make clean # rm cache + state

dev deps (pytest, pyright, ruff, pre-commit) are in [dependency-groups] dev
in pyproject.toml. install with uv sync --group dev.
MIT