Auriti-Labs/agentwork

Agentwork

A multi-agent prompt system for software development. Specialized agents with explicit output contracts, structured handoffs, and session memory — orchestrated by the user, not by automation.

Status: Pre-release. Tested locally, not yet public.


What this is

A collection of agent prompts, skill specifications, coordination protocols, and session lifecycle hooks that together form a structured development workflow. Each specialist agent has a defined scope, an explicit output contract, a memory policy, and handoff rules.

The user is the orchestrator. You activate an agent, give it a task, read its output, and decide what happens next. There is no autonomous runtime, no inter-agent messaging, and no middleware. The system is the prompts and the LLM that interprets them.

Designed for LLM-based chat environments that support agent/persona switching. Tested with the Kiro CLI; adaptable to other environments.

Why this exists

Most multi-agent setups are either too loose (a bag of prompts with no contracts) or too rigid (a framework that assumes full automation). This suite sits in between:

  • Specialist agents have explicit output contracts — section names, order, and required fields are defined in the prompt itself, not in external docs the LLM may never see.
  • Every specialist output ends with a standardized self-review — quality score with evidence, error classification, and a suggested next step.
  • Handoffs are user-mediated — the suite doesn't pretend agents can reliably coordinate without human judgment.
  • Session memory is protocol-based — bash hooks handle the infrastructure (project detection, checkpoints, session index), while the agent follows a protocol for summaries and memory writes. The protocol is documented; enforcement is not automated.
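The hook side of the protocol can be pictured as a small shell routine that records session state to disk. The sketch below is hypothetical: the file name, JSON fields, and directory are illustrative stand-ins, not the suite's actual checkpoint schema.

```shell
#!/usr/bin/env bash
# Hypothetical checkpoint write in the spirit of the session protocol.
# File name and JSON fields are illustrative, not the suite's schema.
set -euo pipefail

session_dir="$(mktemp -d)"   # stand-in for the real session directory

cat > "$session_dir/checkpoint.json" <<EOF
{
  "project": "$(basename "$PWD")",
  "agent": "kiro-builder",
  "timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}
EOF

# A later hook (or an index rebuild) can read the checkpoint back:
grep -o '"agent": "[^"]*"' "$session_dir/checkpoint.json"   # -> "agent": "kiro-builder"
```

The point of the plain-file approach is auditability: every checkpoint is a text file the user can inspect, which is what makes the protocol testable without middleware.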

Manifesto

Most "multi-agent" setups fall into two traps. Prompt collections with no structure: agents that don't know their boundaries, produce inconsistent output, and can't hand off work reliably. Or fully automated frameworks that pretend agents can orchestrate themselves — until they can't, and you have no way to intervene.

Agentwork rejects both.

Every agent has a contract — not a suggestion, not a "best practice," but a numbered list of sections it must produce every time. Every handoff goes through the user — because the user understands context that no routing algorithm can. Every session has a protocol — not enforced by middleware, but documented, testable, and auditable.

The result is a system you can actually trust to behave consistently, because you can verify that it did.

Typical usage

1. Start a session          → bootstrap.sh runs, prints project context
2. Choose an agent          → /agent swap kiro-builder (or ctrl+N)
3. Give it a task           → "Add a healthcheck endpoint"
4. Read the output          → structured delivery with self-review
5. Decide next step         → pass to kiro-qa, or stop
6. End the session          → stop.sh runs, agent writes summary

The suite defines the contracts and protocols. The user drives the workflow.
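The lifecycle above can be sketched as a shell session with stub hooks. The real bootstrap.sh and stop.sh do project detection, indexing, and summaries; these stubs only show the call order the user drives.

```shell
# Stub hooks standing in for the real bootstrap.sh/stop.sh.
suite="$(mktemp -d)"
mkdir -p "$suite/hooks"
printf '%s\n' '#!/usr/bin/env bash' 'echo "project: $(basename "$PWD")"' > "$suite/hooks/bootstrap.sh"
printf '%s\n' '#!/usr/bin/env bash' 'echo "session summary written"' > "$suite/hooks/stop.sh"

bash "$suite/hooks/bootstrap.sh"   # 1. start: prints project context
# 2-5. inside the chat client: /agent swap kiro-builder, task, review, hand off
bash "$suite/hooks/stop.sh"        # 6. end: agent writes summary
```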


Agents

Agent                  Level              Role
kiro-strategist        L0 — Strategy      Product/technical strategy, prioritization
kiro-architect         L1 — Design        System architecture, ADRs, technology decisions
kiro-builder           L2 — Execution     Production code implementation
kiro-qa                L2 — Gate          Quality review, verdicts
kiro-debugger          L2 — Repair        Bug diagnosis, root cause analysis, minimal fixes
kiro-security-review   L2 — Security      Defensive security review, threat classification
kiro-orchestrator      Router             Task classification, agent recommendation
kiro-memory            Cross-cutting      Knowledge curation, promotion, archival
kiro-docs              Cross-cutting      Documentation creation and maintenance

Specialist agents (all except kiro-orchestrator) have self-sufficient prompts: an LLM can produce a contract-compliant output from the prompt alone, without loading additional files.


Directory Structure

~/.kiro/agent-suite/
├── prompts/              # Agent system prompts
├── skills/               # Full specs + domain skill packages
├── hooks/                # Session lifecycle hooks (bootstrap.sh, stop.sh)
├── lib/                  # Shared shell libraries (project-identity, session-index)
├── memory/               # Memory subsystem policies
├── templates/            # Session briefing and summary schemas
├── examples/             # Example session data
├── tests/                # Test suites + real output fixtures
├── AGENTS.md             # Agent registry, hierarchy, governance
├── ARCHITECTURE.md       # Full architectural description
├── CANONICAL-V1.md       # Version baseline, frozen components
├── coordination.md       # Handoff protocol, anti-overlap rules
├── memory-policy.md      # Per-agent storage rules
├── session-memory-protocol.md  # Session lifecycle protocol
└── workflow-bootstrap.md # Agent selection, session procedures

See ARCHITECTURE.md for the full breakdown.


Running the Tests

All tests are bash scripts with no dependencies beyond jq (used by the session index tests).

# Run all test suites
for t in tests/test-*.sh; do bash "$t"; done

# Run a specific suite
bash tests/test-session-layer.sh
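The loop above stops reporting at each suite's own output. A small wrapper can aggregate results into a pass/fail count, assuming each suite exits non-zero on failure (the usual bash-test convention). Stub test scripts stand in here for the real ones in tests/:

```shell
# Pass/fail summary across suites; stub scripts replace tests/test-*.sh.
tdir="$(mktemp -d)"
printf '%s\n' '#!/usr/bin/env bash' 'exit 0' > "$tdir/test-a.sh"
printf '%s\n' '#!/usr/bin/env bash' 'exit 1' > "$tdir/test-b.sh"

pass=0; fail=0
for t in "$tdir"/test-*.sh; do
  if bash "$t" >/dev/null 2>&1; then pass=$((pass+1)); else fail=$((fail+1)); fi
done
echo "passed: $pass  failed: $fail"   # -> passed: 1  failed: 1
```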

What the tests cover

Suite                          What it verifies
test-session-layer             Project detection, session index operations, checkpoint lifecycle, orphan detection, edge cases (spaces, corruption)
test-suite-structure           Prompt files exist with required structural markers; template and example files intact; no dev-repo path coupling
test-lifecycle                 Full bootstrap → session-index → stop lifecycle in isolated temp repos; output fields, checkpoint create/delete, history carry-forward
test-contracts                 Agent names consistent across AGENTS.md, prompts/, coordination.md, memory-policy.md, and CANONICAL-V1.md; referenced files exist
test-builder-output-contract   Builder output contract: section presence, order, None values, self-review subfields; validated against a real fixture
test-qa-output-contract        QA output contract: section presence, order, verdict, self-review subfields; validated against a real fixture
test-debugger-output-contract  Debugger output contract: section presence, order, escalation, fix-spec variant, self-review subfields; validated against a real fixture
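The three contract suites reduce to the same mechanical check: each required section is present, and the sections appear in contract order. A hedged sketch of that check, where the section names are illustrative rather than any agent's actual contract:

```shell
# Verify that required sections exist and appear in order in an output file.
# Section names are examples, not the real builder contract.
out="$(mktemp)"
printf '%s\n' '## Task' '## Changes' '## Self-Review' > "$out"

prev=0
for section in '## Task' '## Changes' '## Self-Review'; do
  line="$(grep -n -F "$section" "$out" | cut -d: -f1 | head -n1)"
  [ -n "$line" ] || { echo "missing: $section"; exit 1; }
  [ "$line" -gt "$prev" ] || { echo "out of order: $section"; exit 1; }
  prev="$line"
done
echo "contract sections present and ordered"
```

Because contracts are plain numbered sections, this kind of check needs only grep: no parser, no runtime hook.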

Real Fixtures

Some agents have contract validators anchored to real outputs (not synthetic samples). These are stored in tests/fixtures/ and serve as regression anchors.

Fixture                                    Scenario
builder-output-real-quicktask.md           Quick Task: create a test fixtures directory
qa-output-real-fixture-review.md           Review of the builder fixture integration
debugger-output-real-path-resolution.md    Debug: unresolved relative path in stop.sh output

Full Session Example

For a complete end-to-end walkthrough showing how contracts and handoffs work across agents, see:

Full Session Example — Builder implements an Express endpoint, QA finds a subtle validation bug, debugger diagnoses the root cause and fixes it in 3 lines. Shows contract-compliant outputs and user-mediated handoffs across three agents.


What is solid

  • Output contracts are explicit. Specialist prompts list their sections by name and number. The LLM doesn't need to guess the output format.
  • Session infrastructure works. Project detection, stack detection, checkpoint lifecycle, orphan recovery, and session indexing are tested with edge cases.
  • Structural regression is covered. Prompt drift, missing files, broken cross-references, and naming inconsistencies are caught by automated tests.
  • Document consistency is enforced. Agent names are verified across governance documents by the contract consistency test.
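A cross-document name check of this kind can be sketched in a few lines of shell. The stub files below stand in for AGENTS.md and the prompts/ directory; the real registry format and paths differ.

```shell
# Check that every agent listed in a registry file has a prompt file.
# Stub layout -- the suite's real AGENTS.md is a governance document,
# not a bare name list.
root="$(mktemp -d)"
mkdir -p "$root/prompts"
printf '%s\n' 'kiro-builder' 'kiro-qa' > "$root/AGENTS.md"
touch "$root/prompts/kiro-builder.md" "$root/prompts/kiro-qa.md"

missing=0
while read -r agent; do
  [ -f "$root/prompts/$agent.md" ] || { echo "no prompt for $agent"; missing=$((missing+1)); }
done < "$root/AGENTS.md"
echo "missing prompts: $missing"   # -> missing prompts: 0
```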

What is still missing

  • No runtime enforcement. The suite relies on LLM compliance with prompt instructions. There is no output parser or contract validator at runtime.
  • No CI pipeline. Tests run manually. No GitHub Actions, no pre-commit hooks.
  • Not all agents have contract validators. Builder, QA, and debugger are covered. Architect, docs, memory, and strategist have explicit contracts in their prompts but no test suites yet.
  • Manual validation plan not executed. VALIDATION-TEST-PLAN.md defines structured tests with real agent sessions. None have been completed.
  • kiro-security-review has no spec file. It has a prompt and memory policy, but no full specification.
  • kiro-orchestrator is undocumented in governance. It exists as a prompt but is absent from AGENTS.md and coordination.md.

See the "Current Limitations" section of ARCHITECTURE.md for the full list.


Related

  • ARCHITECTURE.md — Full architectural description, testing surface, limitations, and roadmap
  • CANONICAL-V1.md — Version baseline, frozen components, compatibility rules
  • AGENTS.md — Agent registry, hierarchy, governance rules

License

MIT
