A multi-agent prompt system for software development. Specialized agents with explicit output contracts, structured handoffs, and session memory — orchestrated by the user, not by automation.
Status: Pre-release. Tested locally, not yet public.
A collection of agent prompts, skill specifications, coordination protocols, and session lifecycle hooks that together form a structured development workflow. Each specialist agent has a defined scope, an explicit output contract, a memory policy, and handoff rules.
The user is the orchestrator. You activate an agent, give it a task, read its output, and decide what happens next. There is no autonomous runtime, no inter-agent messaging, and no middleware. The system is the prompts and the LLM that interprets them.
Designed for LLM-based chat environments that support agent/persona switching. Tested with the Kiro CLI; adaptable to other environments.
Most multi-agent setups are either too loose (a bag of prompts with no contracts) or too rigid (a framework that assumes full automation). This suite sits in between:
- Specialist agents have explicit output contracts — section names, order, and required fields are defined in the prompt itself, not in external docs the LLM may never see.
- Every specialist output ends with a standardized self-review — quality score with evidence, error classification, and a suggested next step.
- Handoffs are user-mediated — the suite doesn't pretend agents can reliably coordinate without human judgment.
- Session memory is protocol-based — bash hooks handle the infrastructure (project detection, checkpoints, session index), while the agent follows a protocol for summaries and memory writes. The protocol is documented; enforcement is not automated.
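To make the "project detection" part of the hook infrastructure concrete, here is a minimal sketch of how a bootstrap hook might resolve a project root and derive a stable identifier from it. The function names `find_project_root` and `project_id`, and the rule "the nearest directory containing `.git` is the project", are assumptions for illustration; the suite's actual logic lives in `lib/` and may differ.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: resolve a project root and a filesystem-safe project id.
# Neither function name comes from the suite; both are illustrative.

# Walk upward from a starting directory until a .git entry is found.
# Prints the root and returns 0, or returns 1 if no repository is found.
find_project_root() {
  local dir
  dir="$(cd "${1:-.}" 2>/dev/null && pwd -P)" || return 1
  while [ -n "$dir" ]; do
    if [ -e "$dir/.git" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    [ "$dir" = "/" ] && break
    dir="$(dirname "$dir")"
  done
  return 1
}

# Derive an id from the root's basename.
# Spaces and other unusual characters become underscores.
project_id() {
  local root
  root="$(find_project_root "${1:-.}")" || return 1
  basename "$root" | tr -c 'A-Za-z0-9._\n-' '_'
}
```

Quoting every expansion keeps the sketch safe for paths that contain spaces, which is one of the edge cases the session-layer tests are described as covering.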
Most "multi-agent" setups fall into two traps. Prompt collections with no structure: agents that don't know their boundaries, produce inconsistent output, and can't hand off work reliably. Or fully automated frameworks that pretend agents can orchestrate themselves — until they can't, and you have no way to intervene.
Agentwork rejects both.
Every agent has a contract — not a suggestion, not a "best practice," but a numbered list of sections it must produce every time. Every handoff goes through the user — because the user understands context that no routing algorithm can. Every session has a protocol — not enforced by middleware, but documented, testable, and auditable.
The result is a system you can actually trust to behave consistently, because you can verify that it did.
1. Start a session → bootstrap.sh runs, prints project context
2. Choose an agent → /agent swap kiro-builder (or ctrl+N)
3. Give it a task → "Add a healthcheck endpoint"
4. Read the output → structured delivery with self-review
5. Decide next step → pass to kiro-qa, or stop
6. End the session → stop.sh runs, agent writes summary
The suite defines the contracts and protocols. The user drives the workflow.
| Agent | Level | Role |
|---|---|---|
| kiro-strategist | L0 — Strategy | Product/technical strategy, prioritization |
| kiro-architect | L1 — Design | System architecture, ADRs, technology decisions |
| kiro-builder | L2 — Execution | Production code implementation |
| kiro-qa | L2 — Gate | Quality review, verdicts |
| kiro-debugger | L2 — Repair | Bug diagnosis, root cause analysis, minimal fixes |
| kiro-security-review | L2 — Security | Defensive security review, threat classification |
| kiro-orchestrator | Router | Task classification, agent recommendation |
| kiro-memory | Cross-cutting | Knowledge curation, promotion, archival |
| kiro-docs | Cross-cutting | Documentation creation and maintenance |
Specialist agents (all except kiro-orchestrator) have self-sufficient prompts: an LLM can produce a contract-compliant output from the prompt alone, without loading additional files.
~/.kiro/agent-suite/
├── prompts/ # Agent system prompts
├── skills/ # Full specs + domain skill packages
├── hooks/ # Session lifecycle hooks (bootstrap.sh, stop.sh)
├── lib/ # Shared shell libraries (project-identity, session-index)
├── memory/ # Memory subsystem policies
├── templates/ # Session briefing and summary schemas
├── examples/ # Example session data
├── tests/ # Test suites + real output fixtures
├── AGENTS.md # Agent registry, hierarchy, governance
├── ARCHITECTURE.md # Full architectural description
├── CANONICAL-V1.md # Version baseline, frozen components
├── coordination.md # Handoff protocol, anti-overlap rules
├── memory-policy.md # Per-agent storage rules
├── session-memory-protocol.md # Session lifecycle protocol
└── workflow-bootstrap.md # Agent selection, session procedures
See ARCHITECTURE.md for the full breakdown.
All tests are bash scripts with no dependencies beyond jq (used by the session index tests).
```bash
# Run all test suites
for t in tests/test-*.sh; do bash "$t"; done

# Run a specific suite
bash tests/test-session-layer.sh
```

| Suite | What it verifies |
|---|---|
| test-session-layer | Project detection, session index operations, checkpoint lifecycle, orphan detection, edge cases (spaces, corruption) |
| test-suite-structure | Prompt files exist with required structural markers. Template and example files intact. No dev-repo path coupling. |
| test-lifecycle | Full bootstrap → session-index → stop lifecycle in isolated temp repos. Output fields, checkpoint create/delete, history carry-forward. |
| test-contracts | Agent names consistent across AGENTS.md, prompts/, coordination.md, memory-policy.md, and CANONICAL-V1.md. Referenced files exist. |
| test-builder-output-contract | Builder output contract: section presence, order, None values, self-review subfields. Validated against a real fixture. |
| test-qa-output-contract | QA output contract: section presence, order, verdict, self-review subfields. Validated against a real fixture. |
| test-debugger-output-contract | Debugger output contract: section presence, order, escalation, fix-spec variant, self-review subfields. Validated against a real fixture. |
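The one-line loop for running all suites does not stop on a failing suite or report which one failed. A small wrapper along the following lines would surface that; `run_suites` is an illustrative name, not something the suite ships, and the sketch assumes each suite signals failure through a non-zero exit code.

```bash
#!/usr/bin/env bash
# Sketch: run every test-*.sh under a directory and summarize the results.
# run_suites is a hypothetical helper, not part of the suite.
run_suites() {
  local dir="${1:-tests}" failed=0 ran=0 t
  for t in "$dir"/test-*.sh; do
    [ -e "$t" ] || continue   # the glob matched nothing
    ran=$((ran + 1))
    if bash "$t"; then
      echo "PASS $(basename "$t")"
    else
      echo "FAIL $(basename "$t")"
      failed=$((failed + 1))
    fi
  done
  echo "$ran suites run, $failed failed"
  # Return non-zero if any suite failed, so callers can react to it.
  [ "$failed" -eq 0 ]
}
```

Calling `run_suites tests` from the suite root would then return a non-zero status whenever at least one suite fails.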
Some agents have contract validators anchored to real outputs (not synthetic samples). These outputs are stored as fixtures in tests/fixtures/ and serve as regression anchors.
| Fixture | Scenario |
|---|---|
| builder-output-real-quicktask.md | Quick Task: create a test fixtures directory |
| qa-output-real-fixture-review.md | Review of the builder fixture integration |
| debugger-output-real-path-resolution.md | Debug: unresolved relative path in stop.sh output |
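As a rough illustration of what "section presence, order" validation against a fixture can look like, the sketch below checks that a list of headings appears in a file in the given order. The helper name `check_section_order`, the `## Name` heading format, and the section names in the usage line are all assumptions; the real contracts are defined in the agent prompts and the real validators in tests/.

```bash
#!/usr/bin/env bash
# Sketch: verify that required headings appear in a file, in the given order.
# Usage: check_section_order FILE "Heading 1" "Heading 2" ...
# The "## <name>" heading format is an assumption, not the suite's actual contract.
check_section_order() {
  local file="$1" last=0 name line
  shift
  for name in "$@"; do
    # First line that is exactly "## <name>"; -F matches the name literally.
    line="$(grep -n -x -F -- "## $name" "$file" | head -n 1 | cut -d: -f1)"
    if [ -z "$line" ]; then
      echo "missing section: $name" >&2
      return 1
    fi
    if [ "$line" -le "$last" ]; then
      echo "section out of order: $name" >&2
      return 1
    fi
    last="$line"
  done
}
```

A call such as `check_section_order tests/fixtures/builder-output-real-quicktask.md "Summary" "Changes" "Self-Review"` (with hypothetical section names) would return non-zero on the first missing or misplaced section.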
For a complete end-to-end walkthrough showing how contracts and handoffs work across agents, see:
Full Session Example — Builder implements an Express endpoint, QA finds a subtle validation bug, debugger diagnoses the root cause and fixes it in 3 lines. Shows contract-compliant outputs and user-mediated handoffs across three agents.
- Output contracts are explicit. Specialist prompts list their sections by name and number. The LLM doesn't need to guess the output format.
- Session infrastructure works. Project detection, stack detection, checkpoint lifecycle, orphan recovery, and session indexing are tested with edge cases.
- Structural regression is covered. Prompt drift, missing files, broken cross-references, and naming inconsistencies are caught by automated tests.
- Document consistency is enforced. Agent names are verified across governance documents by the contract consistency test.
- No runtime enforcement. The suite relies on LLM compliance with prompt instructions. There is no output parser or contract validator at runtime.
- No CI pipeline. Tests run manually. No GitHub Actions, no pre-commit hooks.
- Not all agents have contract validators. Builder, QA, and debugger are covered. Architect, docs, memory, and strategist have explicit contracts in their prompts but no test suites yet.
- Manual validation plan not executed. VALIDATION-TEST-PLAN.md defines structured tests with real agent sessions. None have been completed.
- kiro-security-review has no spec file. It has a prompt and memory policy, but no full specification.
- kiro-orchestrator is undocumented in governance. It exists as a prompt but is absent from AGENTS.md and coordination.md.
See the "Current Limitations" section of ARCHITECTURE.md for the full list.
- ARCHITECTURE.md — Full architectural description, testing surface, limitations, and roadmap
- CANONICAL-V1.md — Version baseline, frozen components, compatibility rules
- AGENTS.md — Agent registry, hierarchy, governance rules