From 562017df5d044fe1060a28a429ce8fec39a5c765 Mon Sep 17 00:00:00 2001
From: Christopher Bailey <Chris.Bailey@ERP-Access.com>
Date: Tue, 12 May 2026 19:24:57 -0700
Subject: [PATCH] Add Bailey OS swarm build specification

---
 BAILEY_OS_SWARM_BUILD_SPEC.md | 415 ++++++++++++++++++++++++++++++++++
 1 file changed, 415 insertions(+)
 create mode 100644 BAILEY_OS_SWARM_BUILD_SPEC.md

diff --git a/BAILEY_OS_SWARM_BUILD_SPEC.md b/BAILEY_OS_SWARM_BUILD_SPEC.md
new file mode 100644
index 0000000..5f6b2de
--- /dev/null
+++ b/BAILEY_OS_SWARM_BUILD_SPEC.md
@@ -0,0 +1,415 @@
+# Bailey OS Swarm Build Specification
+
+## 1) Product Definition
+
+Bailey OS is a **local, evidence-first agent operating layer** that provides governed, repeatable workflows over repos, prompts, tools, skills, and memories.
+
+It is **not**:
+- a generic chatbot,
+- an ungoverned autonomous coding swarm,
+- a domain plugin bundle before core runtime guarantees exist.
+
+Primary executable:
+- `bailey`
+
+Initial commands:
+- `bailey init`
+- `bailey status`
+- `bailey scan`
+- `bailey audit-repo`
+- `bailey extract-claims`
+- `bailey validate-claims`
+- `bailey critic`
+- `bailey ralph`
+- `bailey report`
+- `bailey run`
+
+Core doctrine:
+1. Deterministic first, LLM second.
+2. No factual claim ships without evidence or explicit inference label.
+3. Every run produces logs.
+4. Every module has a contract.
+5. Every feature has tests.
+6. Every packet has acceptance criteria.
+7. Ralph routes every artifact decision.
+8. PromptSpeak governs workflows only where parser/schema/runtime exist.
+9. Plugins require explicit permission boundaries and must not mutate global state implicitly.
+10. ERP/NetSuite/SAP plugins are out of scope until the spine is complete.
+
+---
+
+## 2) System Architecture
+
+Reference repository shape:
+
+```text
+bailey-os/
+  README.md
+  BAILEY_OS_SPEC.md
+  BAILEY_OS_SWARM_BUILD_SPEC.md
+  ARCHITECTURE.md
+  MODULE_CONTRACTS.md
+  BUILD_ORDER.md
+  TRACEABILITY_MATRIX.md
+  pyproject.toml
+  bailey_os/
+    cli/
+    core/
+    context/
+    promptspeak/
+    skills/
+    plugins/
+    tools/
+    evidence/
+    claims/
+    critic/
+    ralph/
+    memory/
+    runs/
+    reporter/
+    security/
+    integrations/
+  docs/
+    product/
+    architecture/
+    business_process/
+    functional_specs/
+    technical_specs/
+    design_docs/
+    agent_packets/
+    test_plans/
+    decisions/
+  tests/
+    unit/
+    integration/
+    functional/
+    golden/
+    fixtures/
+  scripts/
+    dev/
+    build/
+    test/
+    validate/
+    generate_packets/
+  agent_packets/
+    000-control/
+    001-cli/
+    ...
+```
+
+Architectural backbone:
+- **CLI runtime**: command surface, config, and run orchestration.
+- **Context system**: project model, config/model loading, file inventory, budgeted summarization.
+- **Claim/evidence subsystem**: atomic claims, proof references, integrity checks.
+- **Critic loop**: deterministic and model-assisted validation.
+- **Ralph router**: decision state machine controlling promotion/revision/hold.
+- **Reporter**: standardized outputs with evidence map and decision trace.
+- **Extension surface**: PromptSpeak, skills, plugins, tools.
+
+---
+
+## 3) Module Boundaries
+
+### Core boundaries
+- `bailey_os/cli`: user-facing commands only; no deep business logic.
+- `bailey_os/context`: project discovery, policy loading, and context packaging.
+- `bailey_os/claims`: claim schemas and lifecycle states.
+- `bailey_os/evidence`: evidence schemas, mapping, IDs, integrity.
+- `bailey_os/critic`: worker output critique and risk/correction model.
+- `bailey_os/ralph`: routing state machine and transition policy.
+- `bailey_os/reporter`: final report construction and rendering.
+
+### Boundary rules
+- No module may bypass schema validation.
+- No report renderer may emit unverified factual claims.
+- Plugins/tools can extend inputs and actions but cannot change core decision state semantics.
+- Context and evidence stores are append-only within run scope (mutations require explicit policy path).
+
+---
+
+## 4) Data Schemas (Minimum Required)
+
+### Run schema
+- `run_id`
+- `project_id`
+- `started_at`
+- `ended_at`
+- `status`
+- `events[]`
+
+### Claim schema
+- `claim_id`
+- `run_id`
+- `claim_text`
+- `claim_type`
+- `source`
+- `impact`
+- `evidence_required`
+- `evidence_ids[]`
+- `status`
+- `confidence`
+- `created_at`
+
+Claim status enum:
+- `unverified`
+- `supported`
+- `partially_supported`
+- `unsupported`
+- `contradicted`
+- `inference`
+- `needs_human_review`
+
+### Evidence schema
+- `evidence_id`
+- `run_id`
+- `source_type` (file/url/model/tool)
+- `locator` (path+line, URL, message id, etc.)
+- `content_hash`
+- `captured_at`
+- `metadata`
+
+### Critic result schema
+- `artifact_id`
+- `checks[]`
+- `risks[]`
+- `corrections[]`
+- `decision_recommendation`
+- `confidence`
+
+### Ralph decision schema
+- `artifact_id`
+- `current_state`
+- `next_state`
+- `rationale`
+- `required_actions[]`
+- `timestamp`
+
+---
+
+## 5) Agent Packet Protocol
+
+Each packet must include:
+1. Mission
+2. Context
+3. Inputs
+4. Output paths
+5. Interfaces/contracts touched
+6. Acceptance tests
+7. Non-goals
+8. Failure modes
+9. Critic checks
+10. Handoff contract
+
+Canonical packet template fields:
+- `packet_id`
+- `pod`
+- `role`
+- `dependencies`
+- `estimated_scope`
+- `risk_level`
+- `deliverables`
+- `acceptance_tests`
+- `critic_checks`
+- `ralph_decision_criteria`
+
+Rule: **Work packets are durable assets; agents are disposable executors.**
+
+---
+
+## 6) 100-Agent Pod Map
+
+Use 10 pods of 10 roles each.
+
+- Pod 0: Control / architecture / integration
+- Pod 1: CLI and runtime
+- Pod 2: Context loading and project model
+- Pod 3: Evidence and claim system
+- Pod 4: Critic-loop and Ralph router
+- Pod 5: PromptSpeak and skill system
+- Pod 6: Plugin and tool framework
+- Pod 7: Business process and functional specs
+- Pod 8: Testing / QA / validation
+- Pod 9: Documentation / packaging / demos
+
+Per-pod role set:
+1. lead architect
+2. interface-contract agent
+3. implementation agent
+4. test agent
+5. critic agent
+6. documentation agent
+7. security agent
+8. integration agent
+9. edge-case agent
+10. cleanup/refactor agent
+
+---
+
+## 7) Build Waves (Staged Concurrency)
+
+### Wave 0 — Control documents
+Agents: Pod 0
+
+Outputs:
+- `BAILEY_OS_SPEC.md`
+- `ARCHITECTURE.md`
+- `MODULE_CONTRACTS.md`
+- `BUILD_ORDER.md`
+- `TRACEABILITY_MATRIX.md`
+
+Gate: no downstream coding until Wave 0 artifacts pass critic checks.
+
+### Wave 1 — Spine skeleton
+Agents: Pods 1–3
+
+Outputs:
+- CLI skeleton
+- Context loader
+- Claim/evidence schemas
+- Run logger
+- Baseline tests
+
+### Wave 2 — Validation engine
+Agents: Pod 4 + Pod 8
+
+Outputs:
+- Critic loop
+- Ralph router
+- Test harness / CI / golden tests
+
+### Wave 3 — Extension surfaces
+Agents: Pods 5–6
+
+Outputs:
+- PromptSpeak parser/linter/runtime
+- Skills registry/runner
+- Plugin and tool framework
+- `repo-audit` and `claim-check` starter plugins
+
+### Wave 4 — Productization
+Agents: Pods 7 + 9
+
+Outputs:
+- Business-process specs
+- Functional documentation
+- Demo flow
+- Packaging/handoff checklist
+
+---
+
+## 8) Acceptance Test Standards
+
+Global release gates:
+- Every module change includes tests.
+- All schema objects serialize/deserialize and validate.
+- CLI smoke tests pass for core commands.
+- Traceability from feature → test → docs exists.
+- Final report blocks unverified factual claims.
+
+Minimum required scenario tests:
+1. `bailey init` creates expected files.
+2. `bailey scan` returns stable JSON.
+3. Claims are atomic and schema-valid.
+4. Evidence IDs are unique within a run.
+5. Unsupported claims are blocked from final outputs.
+6. Ralph routes deterministically for canonical cases.
+7. Report includes decision, evidence map, and unresolved risks.
+
+---
+
+## 9) Critic-Loop Standard
+
+Each artifact must produce:
+- `claim`: what is asserted
+- `evidence`: what proves/grounds it
+- `test`: what verifies behavior
+- `risk`: what can fail
+- `correction`: what to do if it fails
+- `ralph_decision`: route outcome
+
+Critic tiers:
+1. deterministic checks (schema/contract/coverage)
+2. heuristic checks (atomicity/completeness)
+3. LLM checks (clarity, contradiction detection)
+
+Any failed deterministic check prevents SHIP.
+
+---
+
+## 10) Ralph Routing Standard
+
+Allowed states:
+- `SHIP`
+- `REVISE`
+- `RETRY_WITH_NARROWER_SCOPE`
+- `HOLD_FOR_EVIDENCE`
+- `HUMAN_REVIEW`
+- `PROMOTE_TO_MEMORY`
+- `ARCHIVE`
+
+Routing principles:
+- Prioritize deterministic evidence sufficiency.
+- Escalate to `HUMAN_REVIEW` for unresolved contradictions or policy conflicts.
+- Allow `PROMOTE_TO_MEMORY` only when stability and repeatability thresholds are met.
+- Use `RETRY_WITH_NARROWER_SCOPE` when failure is caused by over-broad packet scope.
+
+---
+
+## 11) First Milestone: `bailey audit-repo .`
+
+Milestone objective: prove end-to-end spine viability.
+
+Command:
+- `bailey audit-repo .`
+
+Expected report sections:
+1. Project identity
+2. File inventory
+3. README claims
+4. Detected entrypoints
+5. Tests found
+6. Missing tests
+7. Unsupported claims
+8. Evidence map
+9. Critic findings
+10. Ralph decision
+
+Definition of done:
+- Reproducible output format
+- Explicit claim/evidence linkage
+- Deterministic critic checks executed
+- Ralph decision recorded with rationale
+
+---
+
+## 12) Explicit Non-Goals for Initial Build
+
+Out of scope until spine passes all gates:
+- SAP/NetSuite/ERP production plugins
+- autonomous browser automation
+- expansive long-term memory strategies beyond controlled promotion
+- broad MCP integrations not required for first milestone
+
+---
+
+## 13) Master Swarm Brief Package
+
+Prepare this package before launching a large implementation swarm:
+
+```text
+bailey-os-swarm-brief/
+  00_MASTER_BRIEF.md
+  01_ARCHITECTURE.md
+  02_BUILD_ORDER.md
+  03_MODULE_CONTRACTS.md
+  04_AGENT_PACKET_PROTOCOL.md
+  05_ACCEPTANCE_TEST_STANDARD.md
+  06_CRITIC_LOOP_STANDARD.md
+  07_RALPH_ROUTER_STANDARD.md
+  agent_packets/
+    000-system-architecture.md
+    ...
+    100-final-integration-critic.md
+```
+
+This package is the control plane for any agent executor system (Claude Code, Codex, or equivalent).