An operating system for coding agents — one disciplined, verifiable loop.
Frame → Plan → Build → Review → QA → Secure → Ship → Learn
Built for Claude Code; portable to any skill-aware agent.
Coding agents are powerful but undisciplined. Left alone they skip review, invent a plan halfway through, lose the thread across sessions, and announce "done" with nothing to back it up.
SuperStack is an opinionated framework that turns your agent into a disciplined engineering team. It runs one mandatory, gated loop on every non-trivial task, records what it actually did, and can prove the loop ran before anything ships.
It is its own framework — its own loop, its own gates, its own proof-of-process ledger and self-evolution, its own code. It's informed by years of great open-source thinking on agent workflows (see Credits), but everything in the /ss-* toolkit is SuperStack's own design.
- One gated loop, not a grab-bag of commands. Eight phases, each with a gate it must clear before the next begins. The skills are mandatory workflows, not polite suggestions — so quality isn't left to whether the agent "felt like" reviewing.
- Proof of process. Most agent setups can't tell you whether the agent actually followed the process. SuperStack records every gate to a Loop Ledger,
/ss-auditverifies the mandatory phases ran, and/ss-shipattaches aFramed ✓ Planned ✓ Built ✓ Reviewed ✓ Secured ✓attestation to the PR. Trust, but verify. - It improves itself.
/ss-evolvemines your own usage — recurring skips, gates that keep failing — and auto-applies low-risk fixes (revertable commits) or drafts new skills for your review. The framework gets sharper the more you use it. - Always-on guardrails. Karpathy's four anti-mistake laws run on every task: think before coding, simplicity first, surgical changes, goal-driven execution.
- Context-rot resistant. Fresh-context subagents plus durable
STATE.md/CONTEXT.mdkeep output quality high on long, multi-session work — a cold session can pick up exactly where you left off. - Autonomy when you want it.
/ss-ralphruns the entire loop unattended against a PRD until it's done. - Plays nice with everything. Every command is namespaced
/ss-*and works across Claude Code, Codex, Cursor, OpenCode, Factory, and Kiro — it coexists with whatever you already run, no collisions.
SuperStack stands on the shoulders of excellent frameworks (see Credits) — it makes a different bet than most: one enforced, verifiable loop, not a library of commands you assemble yourself. Here's how it relates to tools you may already know:
| Project | What it is | What SuperStack does differently |
|---|---|---|
| Superpowers (obra) | A full methodology built on composable skills that self-sequence — the agent reaches for brainstorming → plans → TDD → subagent-driven dev as it recognizes each phase. | Runs the same disciplined instincts as one enforced, ordered loop — a gate per stage and a proof-of-process ledger (/ss-audit) you can verify, plus /ss-evolve self-improvement. A different shape, same DNA. |
| GSD (Get Shit Done) | Phase-based context engineering — discuss → plan → execute, each in a fresh sub-agent window, with a large specialist-agent roster. | Keeps one durable STATE.md / CONTEXT.md across every stage, and adds a verifiable ledger + /ss-evolve that learns from your own usage. |
| gstack (Garry Tan) | A role-based "software factory" — 20+ specialist slash commands (CEO, designer, QA, security, release) you call à la carte, with real-browser QA. | Is a single spine you walk in order, gated and logged. It composes with gstack — run those specialists alongside the loop (the /ss-* namespace never collides). |
| Ralph (Geoffrey Huntley) | The bare autonomous-loop technique — a fresh agent each iteration against a PRD, with tests/lints as "backpressure." | /ss-ralph wraps that engine in named stages, a ledger, and durable context — Ralph is the engine, SuperStack the instrumented track. |
SuperStack's specific bet: a coherent, verifiable, self-improving loop — proof the process actually ran (ledger + /ss-audit), memory that survives sessions, and a framework that sharpens itself from your usage (/ss-evolve). If one of the above fits your need better, use it — they're built to coexist.
- Ship a feature you can trust — the full loop: spec → TDD → review → QA → security, with the proof attached to the PR.
- Fix a bug the right way — start at QA: reproduce → fix → add a regression test, so it can't silently come back.
- Refactor without fear — Plan → Build with the test suite green before and after.
- Long or multi-session work — context engineering + the ledger keep a brand-new session resumable and on-track.
- Unattended grind — point
/ss-ralphat a PRD and let it work through the backlog with real feedback loops (typecheck, tests, CI). - Process you can audit — every change carries a verifiable record of how it was built, not just what changed.
- Fewer "looked done, actually broke" moments — gates and evidence replace optimistic claims.
- Less context rot on big tasks — the heavy lifting happens in fresh subagent contexts.
- A process that tightens itself over time, from your real usage.
- No lock-in — MIT licensed, cross-agent, namespaced. Adopt it incrementally; fork it freely.
┌───────────────────── context engineering ─────────────────────┐
│ fresh-context subagents · STATE.md · CONTEXT.md │
└─────────────────────────────────────────────────────────────────┘
FRAME ──▶ PLAN ──▶ BUILD ──▶ REVIEW ──▶ QA ──▶ SECURE ──▶ SHIP ──▶ LEARN
spec tasks TDD bugs app OWASP PR memory
▲ │
└────────── /ss-ralph (autonomous) ───┘
every phase records its gate to the Loop Ledger → /ss-audit verifies it before /ss-ship
Each phase has a gate it must clear before the next begins. Re-enter anywhere: a bug report starts at QA, a refactor at Plan, "what should we build?" at Frame. Trivial one-liners skip the ceremony — the loop scales to the work.
/plugin marketplace add Mrshahidali420/superstack
/plugin install superstack@superstack
# macOS / Linux
git clone https://github.com/Mrshahidali420/superstack ~/.superstack && ~/.superstack/install.sh# Windows
git clone https://github.com/Mrshahidali420/superstack "$HOME\.superstack"; & "$HOME\.superstack\install.ps1"Installs the /ss-* skills and agents into ~/.claude/ by default. Pass --host codex|cursor|opencode|factory|kiro (or -Agent on Windows), or --all, to target other agents. Then merge CLAUDE.md into your global or project config to adopt the loop.
Then run /ss-init once in your project to set up .superstack/, and /ss-frame to start the loop.
The loop
| Command | Phase | Does |
|---|---|---|
/ss-frame |
Frame | Interrogate intent, push back, write a spec you sign off on |
/ss-plan |
Plan | Break the spec into small, individually verifiable tasks |
/ss-build |
Build | TDD execution, one task per fresh subagent |
/ss-review |
Review | Staff-eng review, severity-graded, auto-fix the trivial |
/ss-qa |
QA | Run the app, find and fix bugs, add regression tests |
/ss-secure |
Secure | OWASP + STRIDE pass + secret scan |
/ss-ship |
Ship | Coverage gate, conventional commit, PR, attestation, optional deploy |
/ss-learn |
Learn | Persist learnings so the next session starts smart |
Proof, autonomy & insight
| Command | Does |
|---|---|
/ss-audit |
Verify the mandatory phases actually ran (reads the Loop Ledger) |
/ss-report |
Generate a shareable Markdown summary of how a change was built |
/ss-replay |
Replay a run as a chronological timeline (the story leg); --save for a shareable Markdown file |
/ss-evolve |
Learn from your ledger; auto-apply low-risk fixes, draft new skills for review. Now supports --since <window> (time-windowed detection) and --explore (deterministic draft-skill proposals into .superstack/proposals/, never auto-committed). |
/ss-ralph |
Run the loop unattended until a PRD is fully done |
Supporting skills: /ss-debug /ss-guard /ss-respond /ss-worktree /ss-pause /ss-resume /ss-retro /ss-docs /ss-init /ss-doctor /ss-drift /ss-stats /ss-trace /ss-context /ss-ctx — run /ss-help for the full index (30 skills, 4 subagents, 2 hooks, 1 MCP server).
The latest additions (full detail in the Changelog):
- Context engineering, built in — a new "context all-rounder" that keeps the agent's window lean:
/ss-context— a standing-context budget cockpit. It measures your always-loaded footprint (CLAUDE.md, skill descriptions,STATE.md/CONTEXT.md) against a budget and warns you before you blow it — automatically, at session start.ss-ctxruntime-output sandbox — aPostToolUsehook that transparently shrinks oversized command output (saving the full text to a retrievable store), plus a dependency-free MCP server exposingctx_execute/ctx_batch_execute/ctx_search/ctx_show/ctx_fetch_and_index, so verbose command runs and fetched pages never flood the context window.
- Cross-run insight —
/ss-stats(DORA-style analytics across runs: gate-fail rate, skips, trend) and/ss-trace(a change's full provenance — spec → ledger → commits — in one chronological lineage). - Earlier —
/ss-drift(file-drift detection),/ss-doctor(health checks),/ss-init(project setup),/ss-replay(run timelines).
SuperStack is built with its own loop — every feature is framed, planned, reviewed, and proven before it ships.
Shipped
- The eight-phase gated loop + Loop Ledger +
/ss-auditproof-of-process. - Self-evolution (
/ss-evolve) and unattended autonomy (/ss-ralph). - Insight & provenance:
/ss-report,/ss-replay,/ss-stats,/ss-trace. - The context all-rounder, Fronts 1–2: the
/ss-contextcockpit and thess-ctxruntime-output sandbox (hook + MCP server).
Next
- Front 3 —
ss-munch: symbol-level code retrieval (a tree-sitter AST index) so the agent reads the function it needs, not the whole file. - Front 4 — integration: a routing doctrine that wires the context tools into the loop, with the cockpit reporting the whole stack.
/ss-panel: a unified dashboard over the ledger (report + replay + trace in one view).
Direction: go deeper on the two things that make long, autonomous agent work trustworthy — verifiable process and context engineering. Ideas and PRs welcome.
- Loop Ledger +
/ss-audit— every phase records its gate to.superstack/ledger.jsonl; the audit checks the mandatory phases (defaultreview,secure) each passed or carry an explicit skip-with-reason. An opt-inPreToolUsehook (SUPERSTACK_AUDIT=1) can block a push when the loop is incomplete. Seedocs/ledger.md. /ss-report— turns the ledger + git into a copy-pasteable run summary (phases, timing, change size) for a PR or status update. Read-only./ss-evolve— detects recurring patterns in the ledger and auto-applies low-riskCONTEXT.mdinsights as revertablechore(evolve):commits, routing brand-new skill drafts to.superstack/proposals/for your review (never auto-committed).- Hooks — a SessionStart hook activates the loop from the first message (and after
/clear/ compaction); an opt-in guard (SUPERSTACK_GUARD=1,SUPERSTACK_FREEZE_DIR=<dir>) blocks destructive commands / edits outside a directory. Stack-specific format/lint/test hooks are intentionally not bundled — seedocs/hooks.md. - Autonomy —
/ss-ralphconverts a spec to aprd.jsonand runs a fresh agent per iteration, with--dry-run, per-iteration logs, and archive-on-completion.
Everything ships bash + PowerShell twins and is covered by a self-test (tests/run.sh) and CI.
- Think before coding — surface assumptions and alternatives; ask when unclear.
- Simplicity first — minimum code, nothing speculative.
- Surgical changes — touch only what the request requires.
- Goal-driven execution — turn tasks into verifiable goals and loop until they pass.
Full operating system: CLAUDE.md · design notes: docs/workflow.md.
You: Build me a URL-shortener API.
/ss-frame Pushes back — "single-user or multi-tenant? custom slugs?"
→ writes specs/url-shortener.md; you approve.
/ss-plan → 4 tasks, each with its own test, in PLAN.md
/ss-build → TDD per task: failing test → minimal handler → green
/ss-review → flags a missing slug-collision check; auto-fixes it
/ss-qa → hits the running API, catches a 500 on duplicate slug,
fixes it, adds a regression test
/ss-secure → confirms input validation, no secrets in the diff
/ss-ship → conventional commit, PR opened with a process attestation, CI green
/ss-report → "built in 22m · 4 phases · 12 tests added · 2 bugs caught at review"
SuperStack is deliberately lean. It does not bundle a headless-browser server, a full standalone CLI, or an eval harness. Where you need that depth, the /ss-* namespace is chosen so you can run a specialized tool alongside SuperStack without collision — use the right tool for the job, keep the loop as your spine.
SuperStack is original work, but it stands on the shoulders of excellent MIT-licensed projects that shaped how the community thinks about agent workflows — and on Andrej Karpathy's notes on LLM coding pitfalls. See CREDITS.md for the full acknowledgment. If any of them fit your needs better, use them — and if you want a verifiable, self-improving loop as your backbone, that's what SuperStack is here for.
MIT licensed. Fork it, make it yours. Built by @Mrshahidali420.