SuperStack

An operating system for coding agents — one disciplined, verifiable loop.

Frame → Plan → Build → Review → QA → Secure → Ship → Learn

Built for Claude Code; portable to any skill-aware agent.

What is SuperStack?

Coding agents are powerful but undisciplined. Left alone they skip review, invent a plan halfway through, lose the thread across sessions, and announce "done" with nothing to back it up.

SuperStack is an opinionated framework that turns your agent into a disciplined engineering team. It runs one mandatory, gated loop on every non-trivial task, records what it actually did, and can prove the loop ran before anything ships.

It is its own framework — its own loop, its own gates, its own proof-of-process ledger and self-evolution, its own code. It's informed by years of great open-source thinking on agent workflows (see Credits), but everything in the /ss-* toolkit is SuperStack's own design.

Why SuperStack

One gated loop, not a grab-bag of commands. Eight phases, each with a gate it must clear before the next begins. The skills are mandatory workflows, not polite suggestions — so quality isn't left to whether the agent "felt like" reviewing.
Proof of process. Most agent setups can't tell you whether the agent actually followed the process. SuperStack records every gate to a Loop Ledger, /ss-audit verifies the mandatory phases ran, and /ss-ship attaches a Framed ✓ Planned ✓ Built ✓ Reviewed ✓ Secured ✓ attestation to the PR. Trust, but verify.
It improves itself. /ss-evolve mines your own usage — recurring skips, gates that keep failing — and auto-applies low-risk fixes (revertable commits) or drafts new skills for your review. The framework gets sharper the more you use it.
Always-on guardrails. Karpathy's four anti-mistake laws run on every task: think before coding, simplicity first, surgical changes, goal-driven execution.
Context-rot resistant. Fresh-context subagents plus durable STATE.md / CONTEXT.md keep output quality high on long, multi-session work — a cold session can pick up exactly where you left off.
Autonomy when you want it. /ss-ralph runs the entire loop unattended against a PRD until it's done.
Plays nice with everything. Every command is namespaced /ss-* and works across Claude Code, Codex, Cursor, OpenCode, Factory, and Kiro — it coexists with whatever you already run, no collisions.

How it compares

SuperStack stands on the shoulders of excellent frameworks (see Credits), but it makes a different bet than most: one enforced, verifiable loop rather than a library of commands you assemble yourself. Here's how it relates to tools you may already know:

Project	What it is	What SuperStack does differently
Superpowers (obra)	A full methodology built on composable skills that self-sequence — the agent reaches for brainstorming → plans → TDD → subagent-driven dev as it recognizes each phase.	Runs the same disciplined instincts as one enforced, ordered loop — a gate per stage and a proof-of-process ledger (`/ss-audit`) you can verify, plus `/ss-evolve` self-improvement. A different shape, same DNA.
GSD (Get Shit Done)	Phase-based context engineering — discuss → plan → execute, each in a fresh sub-agent window, with a large specialist-agent roster.	Keeps one durable `STATE.md` / `CONTEXT.md` across every stage, and adds a verifiable ledger + `/ss-evolve` that learns from your own usage.
gstack (Garry Tan)	A role-based "software factory" — 20+ specialist slash commands (CEO, designer, QA, security, release) you call à la carte, with real-browser QA.	Is a single spine you walk in order, gated and logged. It composes with gstack — run those specialists alongside the loop (the `/ss-*` namespace never collides).
Ralph (Geoffrey Huntley)	The bare autonomous-loop technique — a fresh agent each iteration against a PRD, with tests/lints as "backpressure."	`/ss-ralph` wraps that engine in named stages, a ledger, and durable context — Ralph is the engine, SuperStack the instrumented track.

SuperStack's specific bet: a coherent, verifiable, self-improving loop. That means proof the process actually ran (ledger + /ss-audit), memory that survives sessions, and a framework that sharpens itself from your usage (/ss-evolve). If one of the above fits your need better, use it; SuperStack is built to coexist with all of them.

Use cases

Ship a feature you can trust — the full loop: spec → TDD → review → QA → security, with the proof attached to the PR.
Fix a bug the right way — start at QA: reproduce → fix → add a regression test, so it can't silently come back.
Refactor without fear — Plan → Build with the test suite green before and after.
Long or multi-session work — context engineering + the ledger keep a brand-new session resumable and on-track.
Unattended grind — point /ss-ralph at a PRD and let it work through the backlog with real feedback loops (typecheck, tests, CI).
Process you can audit — every change carries a verifiable record of how it was built, not just what changed.

Benefits

Fewer "looked done, actually broke" moments — gates and evidence replace optimistic claims.
Less context rot on big tasks — the heavy lifting happens in fresh subagent contexts.
A process that tightens itself over time, from your real usage.
No lock-in — MIT licensed, cross-agent, namespaced. Adopt it incrementally; fork it freely.

The loop

        ┌─────────────────────  context engineering  ─────────────────────┐
        │     fresh-context subagents  ·  STATE.md  ·  CONTEXT.md          │
        └─────────────────────────────────────────────────────────────────┘

   FRAME ──▶ PLAN ──▶ BUILD ──▶ REVIEW ──▶ QA ──▶ SECURE ──▶ SHIP ──▶ LEARN
    spec     tasks     TDD       bugs      app    OWASP       PR       memory
                         ▲                                      │
                         └──────────  /ss-ralph (autonomous) ───┘

   every phase records its gate to the Loop Ledger → /ss-audit verifies it before /ss-ship

Each phase has a gate it must clear before the next begins. Re-enter anywhere: a bug report starts at QA, a refactor at Plan, "what should we build?" at Frame. Trivial one-liners skip the ceremony — the loop scales to the work.
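To make the gating and re-entry rules concrete, here is a minimal Python sketch of an ordered loop that starts at any phase and stops at the first gate that fails. It is an illustration only, not SuperStack's implementation: `run_loop`, the `gates` mapping, and the `phase` / `status` record fields are hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List

# Phase order as given in the README.
PHASES = ["frame", "plan", "build", "review", "qa", "secure", "ship", "learn"]


def run_loop(gates: Dict[str, Callable[[], bool]], start: str = "frame") -> List[dict]:
    """Run phases in order from `start`; stop at the first gate that fails.

    `gates` maps a phase name to a zero-argument check (hypothetical shape).
    Returns the records that would be appended to a ledger.
    """
    ledger: List[dict] = []
    for phase in PHASES[PHASES.index(start):]:
        passed = gates[phase]()
        # Record field names are assumptions, not the real ledger schema.
        ledger.append({"phase": phase, "status": "pass" if passed else "fail"})
        if not passed:
            break  # a failed gate blocks every later phase
    return ledger


if __name__ == "__main__":
    gates = {p: (lambda: True) for p in PHASES}
    gates["secure"] = lambda: False  # simulate a failing security gate
    # A bug report re-enters the loop at QA.
    for record in run_loop(gates, start="qa"):
        print(record)
```

In this sketch a bug report that enters at QA never reaches Ship, because the Secure gate fails first.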

Install

Claude Code (recommended)

/plugin marketplace add Mrshahidali420/superstack
/plugin install superstack@superstack

Manual (any agent, or to merge the CLAUDE.md)

# macOS / Linux
git clone https://github.com/Mrshahidali420/superstack ~/.superstack && ~/.superstack/install.sh

# Windows
git clone https://github.com/Mrshahidali420/superstack "$HOME\.superstack"; & "$HOME\.superstack\install.ps1"

Installs the /ss-* skills and agents into ~/.claude/ by default. Pass --host codex|cursor|opencode|factory|kiro (or -Agent on Windows), or --all, to target other agents. Then merge CLAUDE.md into your global or project config to adopt the loop.

Then run /ss-init once in your project to set up .superstack/, and /ss-frame to start the loop.

Commands

The loop

Command	Phase	Does
`/ss-frame`	Frame	Interrogate intent, push back, write a spec you sign off on
`/ss-plan`	Plan	Break the spec into small, individually verifiable tasks
`/ss-build`	Build	TDD execution, one task per fresh subagent
`/ss-review`	Review	Staff-eng review, severity-graded, auto-fix the trivial
`/ss-qa`	QA	Run the app, find and fix bugs, add regression tests
`/ss-secure`	Secure	OWASP + STRIDE pass + secret scan
`/ss-ship`	Ship	Coverage gate, conventional commit, PR, attestation, optional deploy
`/ss-learn`	Learn	Persist learnings so the next session starts smart

Proof, autonomy & insight

Command	Does
`/ss-audit`	Verify the mandatory phases actually ran (reads the Loop Ledger)
`/ss-report`	Generate a shareable Markdown summary of how a change was built
`/ss-replay`	Replay a run as a chronological timeline (the story leg); `--save` for a shareable Markdown file
`/ss-evolve`	Learn from your ledger; auto-apply low-risk fixes and draft new skills for review. Supports `--since <window>` (time-windowed detection) and `--explore` (deterministic draft-skill proposals written to `.superstack/proposals/`, never auto-committed).
`/ss-ralph`	Run the loop unattended until a PRD is fully done
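The unattended mode in the last row can be pictured as a loop that picks the next unfinished item, hands it to a fresh agent, and marks it done only when the checks pass. The Python sketch below shows that general technique under stated assumptions: the story shape (`id`, `passes`), the `run_agent` and `checks_pass` callbacks, and the iteration cap are all hypothetical and do not reflect the real prd.json schema or /ss-ralph internals.

```python
from typing import Callable, Dict, List

Story = Dict[str, object]


def ralph_loop(
    stories: List[Story],
    run_agent: Callable[[Story], None],
    checks_pass: Callable[[Story], bool],
    max_iterations: int = 10,
) -> int:
    """Iterate until every story passes or the iteration budget runs out.

    Returns the number of iterations used. All names are illustrative.
    """
    iterations = 0
    while iterations < max_iterations:
        pending = [s for s in stories if not s["passes"]]
        if not pending:
            break  # the PRD is fully done
        iterations += 1
        story = pending[0]
        run_agent(story)  # stands in for one fresh-context agent run
        if checks_pass(story):  # backpressure: tests, lints, typecheck
            story["passes"] = True
    return iterations


if __name__ == "__main__":
    stories = [{"id": "a", "passes": False}, {"id": "b", "passes": False}]
    attempts = {"a": 0, "b": 0}
    needed = {"a": 1, "b": 2}  # story "b" needs a second attempt

    def fake_agent(story: Story) -> None:
        attempts[story["id"]] += 1

    def fake_checks(story: Story) -> bool:
        return attempts[story["id"]] >= needed[story["id"]]

    used = ralph_loop(stories, fake_agent, fake_checks)
    print(used, all(s["passes"] for s in stories))  # prints: 3 True
```

The iteration cap is a design choice of this sketch: it keeps a story whose checks never pass from running forever.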

Supporting skills: /ss-debug /ss-guard /ss-respond /ss-worktree /ss-pause /ss-resume /ss-retro /ss-docs /ss-init /ss-doctor /ss-drift /ss-stats /ss-trace /ss-context /ss-ctx — run /ss-help for the full index (30 skills, 4 subagents, 2 hooks, 1 MCP server).

What's new

The latest additions (full detail in the Changelog):

Context engineering, built in — a new "context all-rounder" that keeps the agent's window lean:
- /ss-context — a standing-context budget cockpit. It measures your always-loaded footprint (CLAUDE.md, skill descriptions, STATE.md / CONTEXT.md) against a budget and warns you before you blow it — automatically, at session start.
- ss-ctx runtime-output sandbox — a PostToolUse hook that transparently shrinks oversized command output (saving the full text to a retrievable store), plus a dependency-free MCP server exposing ctx_execute / ctx_batch_execute / ctx_search / ctx_show / ctx_fetch_and_index, so verbose command runs and fetched pages never flood the context window.
Cross-run insight — /ss-stats (DORA-style analytics across runs: gate-fail rate, skips, trend) and /ss-trace (a change's full provenance — spec → ledger → commits — in one chronological lineage).
Earlier — /ss-drift (file-drift detection), /ss-doctor (health checks), /ss-init (project setup), /ss-replay (run timelines).
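As an illustration of the budget idea behind /ss-context, the Python sketch below totals an estimated token count for a set of always-loaded files and flags when the total exceeds a budget. It is a rough sketch, not the tool's logic: the four-characters-per-token heuristic, the function names, and the budget value are all assumptions made for this example.

```python
from pathlib import Path
from typing import Dict, Iterable, Tuple

CHARS_PER_TOKEN = 4  # crude heuristic (assumption); real tokenizers differ


def estimate_tokens(text: str) -> int:
    """Approximate a token count from character length."""
    return len(text) // CHARS_PER_TOKEN


def measure_footprint(paths: Iterable[str]) -> Dict[str, int]:
    """Estimate tokens per file, skipping paths that do not exist."""
    sizes: Dict[str, int] = {}
    for p in paths:
        path = Path(p)
        if path.is_file():
            sizes[str(path)] = estimate_tokens(path.read_text(encoding="utf-8"))
    return sizes


def check_budget(sizes: Dict[str, int], budget_tokens: int) -> Tuple[int, bool]:
    """Return (total estimated tokens, whether the budget is exceeded)."""
    total = sum(sizes.values())
    return total, total > budget_tokens


if __name__ == "__main__":
    # File names come from the README; the 8000-token budget is hypothetical.
    sizes = measure_footprint(["CLAUDE.md", "STATE.md", "CONTEXT.md"])
    total, over = check_budget(sizes, budget_tokens=8000)
    print(f"standing context ~{total} tokens; over budget: {over}")
```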

Roadmap

SuperStack is built with its own loop — every feature is framed, planned, reviewed, and proven before it ships.

Shipped

The eight-phase gated loop + Loop Ledger + /ss-audit proof-of-process.
Self-evolution (/ss-evolve) and unattended autonomy (/ss-ralph).
Insight & provenance: /ss-report, /ss-replay, /ss-stats, /ss-trace.
The context all-rounder, Fronts 1–2: the /ss-context cockpit and the ss-ctx runtime-output sandbox (hook + MCP server).
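The runtime-output sandbox named in the last item rests on a simple idea: when command output is too large, keep the full text somewhere retrievable and pass on only a short excerpt plus a reference. The Python sketch below shows one way to do that. It is not the ss-ctx hook itself; the function name, the thresholds, the head-and-tail excerpt, and the hash-based key are all assumptions for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def shrink_output(output: str, store_dir: Path, max_chars: int = 2000, keep: int = 500) -> str:
    """Return `output` unchanged if small; otherwise store it and return an excerpt.

    The excerpt keeps the first and last `keep` characters and names the key
    under which the full text was saved. All thresholds are hypothetical.
    """
    if len(output) <= max_chars:
        return output
    store_dir.mkdir(parents=True, exist_ok=True)
    # A content hash gives a stable, short retrieval key.
    key = hashlib.sha256(output.encode("utf-8")).hexdigest()[:12]
    (store_dir / f"{key}.txt").write_text(output, encoding="utf-8")
    omitted = len(output) - 2 * keep
    return (
        f"{output[:keep]}\n"
        f"[... {omitted} chars omitted; full output stored as {key} ...]\n"
        f"{output[-keep:]}"
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        noisy = "log line\n" * 1000  # 9000 characters of verbose output
        short = shrink_output(noisy, Path(tmp))
        print(len(noisy), "->", len(short))
```

Keeping both the head and the tail is a choice made for this sketch, since errors often appear at the end of a long log.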

Next

Front 3 — ss-munch: symbol-level code retrieval (a tree-sitter AST index) so the agent reads the function it needs, not the whole file.
Front 4 — integration: a routing doctrine that wires the context tools into the loop, with the cockpit reporting the whole stack.
/ss-panel: a unified dashboard over the ledger (report + replay + trace in one view).

Direction: go deeper on the two things that make long, autonomous agent work trustworthy — verifiable process and context engineering. Ideas and PRs welcome.

Under the hood

Loop Ledger + /ss-audit — every phase records its gate to .superstack/ledger.jsonl; the audit checks that each mandatory phase (default: review, secure) either passed or carries an explicit skip with a reason. An opt-in PreToolUse hook (SUPERSTACK_AUDIT=1) can block a push when the loop is incomplete. See docs/ledger.md.
/ss-report — turns the ledger + git into a copy-pasteable run summary (phases, timing, change size) for a PR or status update. Read-only.
/ss-evolve — detects recurring patterns in the ledger and auto-applies low-risk CONTEXT.md insights as revertable chore(evolve): commits, routing brand-new skill drafts to .superstack/proposals/ for your review (never auto-committed).
Hooks — a SessionStart hook activates the loop from the first message (and after /clear / compaction); an opt-in guard (SUPERSTACK_GUARD=1, SUPERSTACK_FREEZE_DIR=<dir>) blocks destructive commands / edits outside a directory. Stack-specific format/lint/test hooks are intentionally not bundled — see docs/hooks.md.
Autonomy — /ss-ralph converts a spec to a prd.json and runs a fresh agent per iteration, with --dry-run, per-iteration logs, and archive-on-completion.
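The audit rule in the first item above (each mandatory phase either passed or carries a skip with a reason) can be sketched in a few lines of Python. The real schema is documented in docs/ledger.md and is not reproduced here, so the `phase`, `status`, and `reason` fields and the `audit` function are hypothetical stand-ins.

```python
import json
from typing import Dict, Iterable, List

MANDATORY = ("review", "secure")  # the default stated in the README


def audit(ledger_lines: Iterable[str], mandatory: Iterable[str] = MANDATORY) -> List[str]:
    """Return a list of problems; an empty list means the audit passes.

    Each line is one JSON object. Field names are assumptions for this sketch.
    """
    latest: Dict[str, dict] = {}
    for line in ledger_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        latest[entry["phase"]] = entry  # a later record for a phase wins

    problems: List[str] = []
    for phase in mandatory:
        entry = latest.get(phase)
        if entry is None:
            problems.append(f"{phase}: no ledger entry")
        elif entry.get("status") == "pass":
            continue
        elif entry.get("status") == "skip" and entry.get("reason"):
            continue  # an explicit skip with a reason is acceptable
        else:
            problems.append(f"{phase}: not passed and no skip reason")
    return problems


if __name__ == "__main__":
    lines = [
        '{"phase": "review", "status": "pass"}',
        '{"phase": "secure", "status": "skip", "reason": "docs-only change"}',
    ]
    print(audit(lines) or "audit ok")
```

Because the ledger is append-only JSONL in this sketch, a phase that failed and was later re-run is judged by its most recent record.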

Everything ships bash + PowerShell twins and is covered by a self-test (tests/run.sh) and CI.

Karpathy's four laws (always on)

Think before coding — surface assumptions and alternatives; ask when unclear.
Simplicity first — minimum code, nothing speculative.
Surgical changes — touch only what the request requires.
Goal-driven execution — turn tasks into verifiable goals and loop until they pass.

Full operating system: CLAUDE.md · design notes: docs/workflow.md.

See it work

You:        Build me a URL-shortener API.
/ss-frame   Pushes back — "single-user or multi-tenant? custom slugs?"
            → writes specs/url-shortener.md; you approve.
/ss-plan    → 4 tasks, each with its own test, in PLAN.md
/ss-build   → TDD per task: failing test → minimal handler → green
/ss-review  → flags a missing slug-collision check; auto-fixes it
/ss-qa      → hits the running API, catches a 500 on duplicate slug,
              fixes it, adds a regression test
/ss-secure  → confirms input validation, no secrets in the diff
/ss-ship    → conventional commit, PR opened with a process attestation, CI green
/ss-report  → "built in 22m · 7 phases · 12 tests added · 2 bugs caught before ship"

Focused, not bloated

SuperStack is deliberately lean. It does not bundle a headless-browser server, a full standalone CLI, or an eval harness. Where you need that depth, the /ss-* namespace is chosen so you can run a specialized tool alongside SuperStack without collision — use the right tool for the job, keep the loop as your spine.

Credits & inspiration

SuperStack is original work, but it stands on the shoulders of excellent MIT-licensed projects that shaped how the community thinks about agent workflows — and on Andrej Karpathy's notes on LLM coding pitfalls. See CREDITS.md for the full acknowledgment. If any of them fit your needs better, use them — and if you want a verifiable, self-improving loop as your backbone, that's what SuperStack is here for.

Changelog · Releases

MIT licensed. Fork it, make it yours. Built by @Mrshahidali420.
