feat(research): comparative study Pi Agent + pi-subagents + devstack vs RooSync/Claude Code harness #2416

@jsboige

Summary

Comparative study of 3 alternative/open-source agent harness ecosystems against our current stack (RooSync v2.3 + Claude Code + Roo Code + roo-state-manager MCP + 6-machine fleet). Goal: identify migration opportunities, adaptation patterns, and architectural inspiration.

Projects Under Study

1. Pi Agent — 56.8k stars, MIT, v0.76.0

TypeScript mono-repo coding agent toolkit. Key components:

  • pi-ai: Unified multi-provider LLM API (Claude, GPT, Gemini, Bedrock, Vertex, OpenRouter, local models)
  • pi-agent-core: Agent runtime, tool calling, state management
  • pi-coding-agent: CLI coding harness (TUI)
  • pi-tui: Terminal UI

Notable features:

  • Non-blocking UI (model switching, effort, status while running)
  • Advanced session history (/tree, /fork, /clone for time-travel/branching)
  • Hot-reload extensions (/reload without restart)
  • Designed for extreme customizability ("tell pi to change itself")
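The session branching idea (/tree, /fork) can be pictured as a tree of messages with a movable head. The sketch below is only an illustration of that data structure; `SessionTree`, `Node`, and their methods are hypothetical names and do not reflect how Pi stores sessions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One message in a session; `parent` links form the history tree."""
    id: int
    text: str
    parent: Optional[int] = None


@dataclass
class SessionTree:
    nodes: dict[int, Node] = field(default_factory=dict)
    head: Optional[int] = None  # message the active branch currently ends at

    def append(self, text: str) -> int:
        """Add a message after the current head and move the head to it."""
        node_id = len(self.nodes)
        self.nodes[node_id] = Node(node_id, text, self.head)
        self.head = node_id
        return node_id

    def fork(self, node_id: int) -> None:
        """Rewind the head to an earlier message; later appends start a new branch."""
        if node_id not in self.nodes:
            raise KeyError(node_id)
        self.head = node_id

    def branch(self) -> list[str]:
        """Return the messages from the root to the current head."""
        out: list[str] = []
        cur = self.head
        while cur is not None:
            out.append(self.nodes[cur].text)
            cur = self.nodes[cur].parent
        return out[::-1]
```

Forking never deletes anything: the abandoned branch stays in `nodes`, which is what makes time-travel back to it possible.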

2. pi-subagents — 1.6k stars

Extension for Pi with async subagent delegation. Strong parallels to our architecture:

  • 8 built-in agents: scout, researcher, planner, worker, reviewer, oracle, delegate, custom
  • Execution modes: parallel, chain, background (like our executor/meta patterns)
  • Worktree isolation per subagent (we do this via .claude/worktrees/)
  • Intercom bridge for cross-agent communication (like our dashboard workspace)
  • Recursion guards and skill injection
  • Delegation patterns resemble our Roo→Claude coordination
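To make the parallel and chain modes and the recursion guard concrete, here is a minimal asyncio sketch. It is not the pi-subagents API: `run_parallel`, `run_chain`, `make_agent`, and the `MAX_DEPTH` value are all assumptions chosen for illustration.

```python
import asyncio
from typing import Awaitable, Callable

MAX_DEPTH = 2  # hypothetical limit; the issue does not state the real threshold

Agent = Callable[[str], Awaitable[str]]


class RecursionGuardError(RuntimeError):
    """Raised when a delegation would nest deeper than MAX_DEPTH."""


def check_depth(depth: int) -> None:
    if depth > MAX_DEPTH:
        raise RecursionGuardError(f"delegation depth {depth} exceeds {MAX_DEPTH}")


async def run_parallel(agents: list[Agent], task: str, depth: int = 1) -> list[str]:
    """Give every agent the same task and wait for all results (input order kept)."""
    check_depth(depth)
    return list(await asyncio.gather(*(agent(task) for agent in agents)))


async def run_chain(agents: list[Agent], task: str, depth: int = 1) -> str:
    """Feed each agent's output to the next one."""
    check_depth(depth)
    result = task
    for agent in agents:
        result = await agent(result)
    return result


def make_agent(name: str) -> Agent:
    """Stand-in for a real subagent: tags its input with the agent name."""
    async def agent(task: str) -> str:
        await asyncio.sleep(0)  # yield to the event loop like real I/O would
        return f"{name}({task})"
    return agent
```

A background mode would wrap either call in `asyncio.create_task` and collect the result later; it is left out to keep the sketch short.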

3. devstack — 21 stars

"Agentic practices knowledge base + supporting software toolkit" — opinionated guide for agentic programming. Key insights:

  • Karpathy LLM Wiki pattern: structured knowledge base with inbox/sources/wiki/ pipeline (we use MEMORY.md + codebase_search)
  • Extension ecosystem curation: pi-packages.json manifest for fleet-wide consistent setups
  • Context management tools: pi-context-prune (recoverable summarization), pi-vcc (zero-LLM algorithmic compaction), pi-continue-after-compaction (auto-continue post-compaction)
  • pi-multiloop: Battle-tested autoloop for long-running agent sessions (analogous to our executor cycles)
  • pi-multicodex: Automatic account rotation for quota/rate limits
  • Model evaluations: Comparative rankings of frontier + open models for coding tasks
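The account-rotation idea can be sketched as a loop that moves to the next account when one reports quota exhaustion. This is a simplified illustration, not pi-multicodex's implementation; `QuotaExceeded` and `call_with_rotation` are hypothetical names, and a real client would signal quota errors in its own way.

```python
from typing import Callable, Optional


class QuotaExceeded(Exception):
    """Raised by the (hypothetical) client when an account hits its quota."""


def call_with_rotation(accounts: list[str], call: Callable[[str], str]) -> str:
    """Try each account in order, moving on when one reports quota exhaustion."""
    last_error: Optional[Exception] = None
    for account in accounts:
        try:
            return call(account)
        except QuotaExceeded as exc:
            last_error = exc  # remember why this account failed, then rotate
    raise RuntimeError("all accounts exhausted") from last_error
```

Only the quota error triggers rotation; any other exception propagates immediately so that genuine bugs are not masked by retries.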

Comparison Axes

A. Architecture

| Aspect | Our Stack | Pi + Extensions |
|---|---|---|
| Agent runtime | Claude Code (Anthropic) + Roo Code (extension) | pi-agent-core (customizable TypeScript) |
| LLM providers | Claude API + GLM/z.ai + vLLM | Unified pi-ai (Claude, GPT, Gemini, Vertex, Bedrock, OpenRouter, local) |
| Multi-machine | RooSync GDrive + 34-tool MCP | Single-machine focus (subagents local) |
| Coordination | Dashboard workspace + RooSync messages | Intercom bridge (local subagents) |
| State persistence | SQLite + Qdrant + GDrive | Session history with fork/clone |
| Context management | Auto-compaction (90% threshold) | pi-vcc (algorithmic), pi-context-prune (recoverable) |
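As a rough picture of what a unified provider layer buys compared with per-provider config, the sketch below routes a `provider/model` reference to a registered backend. It does not mirror the pi-ai API; `Provider`, `EchoProvider`, `Router`, and the `provider/model` reference format are assumptions made for this example.

```python
from typing import Protocol


class Provider(Protocol):
    """Minimal interface every backend is assumed to implement."""
    def complete(self, model: str, prompt: str) -> str: ...


class EchoProvider:
    """Stand-in backend; a real one would call the vendor's SDK."""
    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, model: str, prompt: str) -> str:
        return f"[{self.name}:{model}] {prompt}"


class Router:
    def __init__(self) -> None:
        self._providers: dict[str, Provider] = {}

    def register(self, name: str, provider: Provider) -> None:
        self._providers[name] = provider

    def complete(self, model_ref: str, prompt: str) -> str:
        """Route 'provider/model' references to the matching backend."""
        name, _, model = model_ref.partition("/")
        if name not in self._providers or not model:
            raise ValueError(f"unknown model reference: {model_ref!r}")
        return self._providers[name].complete(model, prompt)
```

With this shape, switching a machine from one backend to another is a change to a single model reference string rather than to provider-specific configuration.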

B. What They Do Better

  1. LLM provider abstraction: pi-ai unified API vs our ad-hoc per-provider config
  2. Non-blocking UI: Pi allows model/status operations while running
  3. Session branching: /fork, /clone, /tree — we have no equivalent
  4. Zero-LLM compaction: pi-vcc uses deterministic extraction, so no summarizer model is needed (avoids the 400 errors a summarizer call can return on long sessions)
  5. Recoverable pruning: pi-context-prune keeps originals retrievable on demand
  6. Extension hot-reload: /reload without restart vs our VS Code restart requirement
  7. Account rotation: pi-multicodex handles multi-account quota automatically
  8. Auto-continue post-compaction: pi-continue-after-compaction watches for a compaction and then sends a continue message
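Items 4 and 5 combine naturally: compact older messages with a deterministic rule and keep the originals retrievable. The sketch below is one possible reading of that idea, not the pi-vcc or pi-context-prune algorithm; the digest rule (truncated first line) and the `keep_recent` and `digest_chars` defaults are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CompactingLog:
    keep_recent: int = 2      # messages kept verbatim (assumed value)
    digest_chars: int = 40    # length of each deterministic digest (assumed value)
    messages: list[str] = field(default_factory=list)
    archive: dict[int, str] = field(default_factory=dict)  # originals, by position

    def add(self, text: str) -> None:
        self.messages.append(text)

    def compact(self) -> None:
        """Replace older messages with a truncated first line; keep originals."""
        cutoff = max(len(self.messages) - self.keep_recent, 0)
        for i in range(cutoff):
            if i in self.archive:  # already compacted on an earlier pass
                continue
            original = self.messages[i]
            self.archive[i] = original
            first_line = original.splitlines()[0] if original else ""
            self.messages[i] = f"[#{i}] {first_line[:self.digest_chars]}"

    def recover(self, index: int) -> str:
        """Return the original text of a compacted message."""
        return self.archive[index]
```

Because no model call is involved, compaction cannot fail on an oversized request, and running it twice gives the same result as running it once.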

C. What We Do Better

  1. Multi-machine fleet coordination: 6 machines with RooSync GDrive sync, heartbeat, dashboard workspace — Pi is single-machine
  2. Cross-machine task dispatching: Hash-based partitioning (ROO_FLEET_ROSTER), claim discipline, anti-double-claim
  3. MCP ecosystem: 34-tool roo-state-manager for coordination, semantic search, conversation indexing
  4. Scheduler integration: schtasks + worker scripts for autonomous cycles
  5. Agent claim discipline: Verified claims, SHA verification, pre-claim anti-overlap
  6. CI/CD pipeline: check-submodule-pointer, validation scripts, PR mandatory rules
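Hash-based partitioning over a roster, as in item 2, can be sketched as follows. The roster format (a comma-separated string, as might be read from ROO_FLEET_ROSTER) and the helper names are assumptions; the actual RooSync logic may differ.

```python
import hashlib


def parse_roster(raw: str) -> list[str]:
    """Parse a comma-separated roster; sorted so every machine sees the same order."""
    return sorted(name.strip() for name in raw.split(",") if name.strip())


def owner_of(task_id: str, roster: list[str]) -> str:
    """Map a task to one machine with a hash that is stable across processes."""
    if not roster:
        raise ValueError("empty roster")
    digest = hashlib.sha256(task_id.encode("utf-8")).digest()
    return roster[int.from_bytes(digest[:8], "big") % len(roster)]


def may_claim(task_id: str, me: str, roster: list[str]) -> bool:
    """A machine may claim a task only if the hash assigns the task to it."""
    return owner_of(task_id, roster) == me
```

`hashlib` is used instead of the built-in `hash()` because string hashing in Python is randomized per process, which would make machines disagree about ownership.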

D. Migration/Adaptation Opportunities

| Opportunity | Source | Effort | Impact |
|---|---|---|---|
| Unified LLM provider layer | pi-ai pattern | Medium | Simplify multi-provider config across fleet |
| Zero-LLM compaction | pi-vcc pattern | Medium | Eliminate summarizer 400 errors, deterministic |
| Recoverable context pruning | pi-context-prune | Medium | Lossless compaction with on-demand retrieval |
| Session branching/forking | Pi /fork, /clone | High | Enable experimentation without context loss |
| Extension hot-reload | Pi /reload | Low (if Pi adopted) | Eliminate VS Code restart cycle |
| Auto-continue post-compaction | pi-continue-after-compaction | Low | Prevent stalls after auto-compaction |
| Account rotation | pi-multicodex pattern | Low | Multi-key rotation for API quotas |
| Worktree per-subagent | pi-subagents | Medium | Stronger isolation for parallel agents |
| Fleet extension manifest | devstack pi-packages.json | Low | Consistent MCP/extension setup across 6 machines |
| Karpathy wiki pattern | devstack inbox→sources→wiki | Low | Structured knowledge management |
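For the fleet extension manifest row, a drift check against a shared manifest might look like the sketch below. The schema of pi-packages.json is not described in this issue, so the name-to-version mapping and the `manifest_drift` helper are purely hypothetical.

```python
def manifest_drift(
    manifest: dict[str, str], installed: dict[str, str]
) -> dict[str, list[str]]:
    """Compare a machine's installed packages with the shared manifest."""
    common = set(manifest) & set(installed)
    return {
        "missing": sorted(set(manifest) - set(installed)),
        "extra": sorted(set(installed) - set(manifest)),
        "version_mismatch": sorted(
            name for name in common if manifest[name] != installed[name]
        ),
    }
```

Each machine could run such a check at the start of a cycle and report any non-empty list to the dashboard rather than fixing drift silently.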

Proposed Approach

  1. Deep-dive pi-subagents delegation patterns — closest to our multi-agent architecture, evaluate for Roo Code integration
  2. Prototype zero-LLM compaction — pi-vcc algorithmic approach for roo-state-manager
  3. Evaluate pi-multiloop — compare with our executor cycle pattern, identify improvements
  4. Study fleet-wide manifest — pi-packages.json concept for 6-machine config consistency
  5. Assess Pi as alternative harness — for executor machines currently using Claude Code

Context

  • Current stack: RooSync v2.3 + Claude Code + Roo Code + roo-state-manager (34 MCP tools) + 6-machine fleet
  • Our executor cycles: 360-minute intervals, auto-stop after 3 IDLE cycles, wake routing via [WAKE-CLAUDE]
  • Compaction: 200k context with a universal 90% threshold; dashboard auto-condensation at 92%
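The numbers above translate into simple checks. The sketch assumes "200k" means 200,000 tokens and that the three IDLE results must be consecutive; both are readings of the bullets, not confirmed behavior, and the function names are hypothetical.

```python
CONTEXT_WINDOW = 200_000   # assumed: "200k" read as tokens
COMPACT_PERCENT = 90       # universal compaction threshold
DASHBOARD_PERCENT = 92     # dashboard auto-condensation threshold
MAX_IDLE = 3               # executor stops after this many IDLE cycles

# Integer arithmetic avoids floating-point rounding at the boundary.
COMPACT_AT = CONTEXT_WINDOW * COMPACT_PERCENT // 100      # 180,000
DASHBOARD_AT = CONTEXT_WINDOW * DASHBOARD_PERCENT // 100  # 184,000


def should_compact(used_tokens: int) -> bool:
    """True once usage reaches the compaction threshold."""
    return used_tokens >= COMPACT_AT


def should_stop(cycle_results: list[str]) -> bool:
    """True once the last MAX_IDLE cycles were all IDLE (consecutive is assumed)."""
    tail = cycle_results[-MAX_IDLE:]
    return len(tail) == MAX_IDLE and all(r == "IDLE" for r in tail)
```

Under these assumptions the two thresholds sit only 4,000 tokens apart, which is worth keeping in mind when comparing against a compaction scheme that triggers earlier.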

Acceptance Criteria

  • Comparative analysis document created in docs/harness/reference/
  • Top 3 adaptation opportunities identified with concrete implementation plans
  • Decision documented: adopt / adapt / defer for each opportunity
  • If adopting: issues created for implementation tasks

/label enhancement
