Agentwork

A multi-agent prompt system for software development. Specialized agents with explicit output contracts, structured handoffs, and session memory — orchestrated by the user, not by automation.

Status: Pre-release. Tested locally, not yet public.

What this is

A collection of agent prompts, skill specifications, coordination protocols, and session lifecycle hooks that together form a structured development workflow. Each specialist agent has a defined scope, an explicit output contract, a memory policy, and handoff rules.

The user is the orchestrator. You activate an agent, give it a task, read its output, and decide what happens next. There is no autonomous runtime, no inter-agent messaging, and no middleware. The system is the prompts and the LLM that interprets them.

Agentwork is meant for LLM-based chat environments that support agent/persona switching. It has been tested with the Kiro CLI and is adaptable to other environments.

Why this exists

Most multi-agent setups are either too loose (a bag of prompts with no contracts) or too rigid (a framework that assumes full automation). This suite sits in between:

Specialist agents have explicit output contracts — section names, order, and required fields are defined in the prompt itself, not in external docs the LLM may never see.
Every specialist output ends with a standardized self-review — quality score with evidence, error classification, and a suggested next step.
Handoffs are user-mediated — the suite doesn't pretend agents can reliably coordinate without human judgment.
Session memory is protocol-based — bash hooks handle the infrastructure (project detection, checkpoints, session index), while the agent follows a protocol for summaries and memory writes. The protocol is documented; enforcement is not automated.
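
The hook scripts themselves are not reproduced in this README. As a rough illustration of what project detection can look like in bash, the sketch below walks upward from a starting directory until it finds a `.git` entry. The function name `detect_project_root` and the use of `.git` as the marker are assumptions made for this example, not the suite's actual logic.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the suite): print the nearest ancestor
# directory that contains a .git entry, or fail if none is found.
detect_project_root() {
  local dir
  # Resolve the start directory to an absolute physical path.
  dir="$(cd "${1:-.}" 2>/dev/null && pwd -P)" || return 1
  while :; do
    if [ -e "$dir/.git" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    # Reached the filesystem root without finding a marker.
    [ "$dir" = "/" ] && return 1
    dir="$(dirname "$dir")"
  done
}
```

Called as `detect_project_root "$PWD"`, it prints the project root on success and returns non-zero otherwise, so a hook could fall back to a default context when no project is found.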

Manifesto

Most "multi-agent" setups fall into two traps. Prompt collections with no structure: agents that don't know their boundaries, produce inconsistent output, and can't hand off work reliably. Or fully automated frameworks that pretend agents can orchestrate themselves — until they can't, and you have no way to intervene.

Agentwork rejects both.

Every agent has a contract — not a suggestion, not a "best practice," but a numbered list of sections it must produce every time. Every handoff goes through the user — because the user understands context that no routing algorithm can. Every session has a protocol — not enforced by middleware, but documented, testable, and auditable.

The result is a system you can actually trust to behave consistently, because you can verify that it did.

Typical usage

1. Start a session          → bootstrap.sh runs, prints project context
2. Choose an agent          → /agent swap kiro-builder (or ctrl+N)
3. Give it a task           → "Add a healthcheck endpoint"
4. Read the output          → structured delivery with self-review
5. Decide next step         → pass to kiro-qa, or stop
6. End the session          → stop.sh runs, agent writes summary

The suite defines the contracts and protocols. The user drives the workflow.

Agents

Agent	Level	Role
kiro-strategist	L0 — Strategy	Product/technical strategy, prioritization
kiro-architect	L1 — Design	System architecture, ADRs, technology decisions
kiro-builder	L2 — Execution	Production code implementation
kiro-qa	L2 — Gate	Quality review, verdicts
kiro-debugger	L2 — Repair	Bug diagnosis, root cause analysis, minimal fixes
kiro-security-review	L2 — Security	Defensive security review, threat classification
kiro-orchestrator	Router	Task classification, agent recommendation
kiro-memory	Cross-cutting	Knowledge curation, promotion, archival
kiro-docs	Cross-cutting	Documentation creation and maintenance

Specialist agents (all except kiro-orchestrator) have self-sufficient prompts: an LLM can produce a contract-compliant output from the prompt alone, without loading additional files.

Directory Structure

~/.kiro/agent-suite/
├── prompts/              # Agent system prompts
├── skills/               # Full specs + domain skill packages
├── hooks/                # Session lifecycle hooks (bootstrap.sh, stop.sh)
├── lib/                  # Shared shell libraries (project-identity, session-index)
├── memory/               # Memory subsystem policies
├── templates/            # Session briefing and summary schemas
├── examples/             # Example session data
├── tests/                # Test suites + real output fixtures
├── AGENTS.md             # Agent registry, hierarchy, governance
├── ARCHITECTURE.md       # Full architectural description
├── CANONICAL-V1.md       # Version baseline, frozen components
├── coordination.md       # Handoff protocol, anti-overlap rules
├── memory-policy.md      # Per-agent storage rules
├── session-memory-protocol.md  # Session lifecycle protocol
└── workflow-bootstrap.md # Agent selection, session procedures

See ARCHITECTURE.md for the full breakdown.

Running the Tests

All tests are bash scripts with no dependencies beyond jq (used by the session index tests).

# Run all test suites
for t in tests/test-*.sh; do bash "$t"; done

# Run a specific suite
bash tests/test-session-layer.sh
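
The one-line loop above runs every suite but does not summarize failures. If a pass/fail count is useful, a small wrapper along these lines could be used. `run_suites` is a hypothetical name and is not shipped with the suite; the sketch only assumes that each test script exits non-zero on failure.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run every test-*.sh in a directory, keep going after
# a failure, and print a summary. Returns non-zero if any suite failed.
run_suites() {
  local dir="${1:-tests}" failed=0 total=0 t
  for t in "$dir"/test-*.sh; do
    [ -e "$t" ] || continue          # the glob matched nothing
    total=$((total + 1))
    if bash "$t"; then
      echo "PASS $t"
    else
      echo "FAIL $t"
      failed=$((failed + 1))
    fi
  done
  echo "$((total - failed))/$total suites passed"
  [ "$failed" -eq 0 ]
}
```

Running `run_suites tests` from the suite root gives one PASS or FAIL line per suite followed by the summary line.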

What the tests cover

Suite	What it verifies
`test-session-layer`	Project detection, session index operations, checkpoint lifecycle, orphan detection, edge cases (spaces, corruption)
`test-suite-structure`	Prompt files exist with required structural markers. Template and example files intact. No dev-repo path coupling.
`test-lifecycle`	Full bootstrap → session-index → stop lifecycle in isolated temp repos. Output fields, checkpoint create/delete, history carry-forward.
`test-contracts`	Agent names consistent across AGENTS.md, prompts/, coordination.md, memory-policy.md, and CANONICAL-V1.md. Referenced files exist.
`test-builder-output-contract`	Builder output contract: section presence, order, None values, self-review subfields. Validated against a real fixture.
`test-qa-output-contract`	QA output contract: section presence, order, verdict, self-review subfields. Validated against a real fixture.
`test-debugger-output-contract`	Debugger output contract: section presence, order, escalation, fix-spec variant, self-review subfields. Validated against a real fixture.
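
To make "section presence, order" concrete, here is a minimal sketch of how such a check might be written in bash. It assumes each contract section appears in the output as a `## <name>` heading on its own line; that heading format, the function name `check_sections`, and the section names in the usage line are illustrative assumptions, not the suite's actual validator or contract.

```shell
#!/usr/bin/env bash
# Hypothetical check: check_sections FILE SECTION...
# Succeeds only if every SECTION occurs as a "## SECTION" line in FILE and
# the sections appear in the order given.
check_sections() {
  local file="$1" last=0 name line
  shift
  for name in "$@"; do
    # Line number of the first exact match for this heading, if any.
    line="$(awk -v h="## $name" '$0 == h { print NR; exit }' "$file")"
    if [ -z "$line" ]; then
      echo "missing section: $name"
      return 1
    fi
    if [ "$line" -le "$last" ]; then
      echo "out of order: $name"
      return 1
    fi
    last="$line"
  done
}
```

A call such as `check_sections output.md "Summary" "Changes" "Self-Review"` would stop at the first missing or misplaced section and name it.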

Real Fixtures

Some agents have contract validators anchored to real outputs rather than synthetic samples. The outputs are stored in tests/fixtures/ and serve as regression anchors.

Fixture	Scenario
`builder-output-real-quicktask.md`	Quick Task: create a test fixtures directory
`qa-output-real-fixture-review.md`	Review of the builder fixture integration
`debugger-output-real-path-resolution.md`	Debug: unresolved relative path in stop.sh output

Full Session Example

For a complete end-to-end walkthrough showing how contracts and handoffs work across agents, see:

Full Session Example — Builder implements an Express endpoint, QA finds a subtle validation bug, debugger diagnoses the root cause and fixes it in 3 lines. Shows contract-compliant outputs and user-mediated handoffs across three agents.

What is solid

Output contracts are explicit. Specialist prompts list their sections by name and number. The LLM doesn't need to guess the output format.
Session infrastructure works. Project detection, stack detection, checkpoint lifecycle, orphan recovery, and session indexing are tested with edge cases.
Structural regression is covered. Prompt drift, missing files, broken cross-references, and naming inconsistencies are caught by automated tests.
Document consistency is enforced. Agent names are verified across governance documents by the contract consistency test.
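
As an illustration of the kind of cross-document name check described above, the sketch below reports every agent name that is absent from a given document. The function name `report_missing_names` and the one-name-per-line input file are assumptions made for this example; the real test may work differently.

```shell
#!/usr/bin/env bash
# Hypothetical check: report_missing_names NAMES_FILE DOC...
# NAMES_FILE holds one agent name per line. Prints "<name> missing from <doc>"
# for each name not found in a DOC and returns non-zero if anything is missing.
report_missing_names() {
  local names_file="$1" status=0 name doc
  shift
  while IFS= read -r name; do
    [ -n "$name" ] || continue       # skip blank lines
    for doc in "$@"; do
      # -F: literal string, -w: whole-word match, -q: no output
      if ! grep -q -F -w -- "$name" "$doc"; then
        echo "$name missing from $doc"
        status=1
      fi
    done
  done < "$names_file"
  return "$status"
}
```

For example, `report_missing_names names.txt AGENTS.md coordination.md` prints one line per missing name and document pair.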

What is still missing

No runtime enforcement. The suite relies on LLM compliance with prompt instructions. There is no output parser or contract validator at runtime.
No CI pipeline. Tests run manually. No GitHub Actions, no pre-commit hooks.
Not all agents have contract validators. Builder, QA, and debugger are covered. Architect, docs, memory, and strategist have explicit contracts in their prompts but no test suites yet.
Manual validation plan not executed. VALIDATION-TEST-PLAN.md defines structured tests with real agent sessions. None have been completed.
kiro-security-review has no spec file. It has a prompt and memory policy, but no full specification.
kiro-orchestrator is undocumented in governance. It exists as a prompt but is absent from AGENTS.md and coordination.md.

See the "Current Limitations" section of ARCHITECTURE.md for the full list.

License

MIT
