Skip to content

rlaope/bestwork-agent

Repository files navigation

bestwork-agent

Work like a corporation team, not just a club. Your AI agent gets the org chart, the quality gates, and the team reviews that real engineering orgs use.

npm version license stars

English · 한국어 · 日本語


The problem

Your AI agent works alone. It hallucinates, loops, misses requirements, and you find out too late. 45% of AI-generated code contains vulnerabilities (Veracode). Vibe-coded apps fail because nobody validated the idea before building.

bestwork-agent organizes your AI agent the way top unicorn companies organize their engineering teams — and bolts on the quality gates those teams rely on, so the work that ships is actually trustworthy.

How the team forms itself

bestwork analyzes every prompt and picks the right team shape before any code is written.

You: "Refactor the auth module to support OAuth2"

bestwork analyzes → large scope, architecture decision, security-sensitive
bestwork selects → Hierarchy: Security Team

┌─────────────────────────────────────────────────────┐
│  CISO                                               │
│  "Attack surface acceptable. Approve with           │
│   condition: rotate existing JWT secrets on deploy." │
│          ▲ final decision                           │
│  Tech Lead                                          │
│  "OAuth2 PKCE flow is correct. Consolidate the      │
│   two token refresh paths into one."                │
│          ▲ architecture review                      │
│  Sr. Security Engineer                              │
│  "Implementation secure. Added CSRF protection.     │
│   Input validation on redirect_uri."                │
│          ▲ implementation + hardening               │
│  Jr. QA Engineer                                    │
│  "Found: /callback doesn't handle expired state     │
│   param. Added test for token replay attack."       │
│          ▲ fresh eyes + edge cases                  │
└─────────────────────────────────────────────────────┘
You: "Add a dark mode toggle to the settings page"

bestwork analyzes → single feature, localized scope, fast feedback needed
bestwork selects → Squad: Feature Squad

┌──────────────────────────────────────────────────────┐
│                  Feature Squad (parallel)             │
│                                                       │
│  Sr. Backend         Sr. Frontend        Product Lead │
│  "API endpoint       "Toggle component   "Matches    │
│   for user prefs     with CSS vars,      user story. │
│   ready. Tests       accessible."        Ship it."   │
│   passing."                                          │
│                          QA Lead                      │
│                    "Tested light/dark                 │
│                     + system pref.                    │
│                     All green."                       │
│                                                       │
│  Verdict: all APPROVE → merged                        │
└──────────────────────────────────────────────────────┘
You: "Why did my last session struggle?"

bestwork analyzes → observability request, not coding
bestwork selects → data analysis

  Session Outcome — b322dc3e  ✗ struggling

  Duration:     45m
  Calls/Prompt: 38 (high — avg is 12)
  Loop detected: Edit → Bash(test fail) → Edit × 6 on auth.ts

  Root cause: missing import caused test failure loop.
  Recommendation: use ./strict to force read-before-edit.

How the team picks its shape

bestwork mirrors how the best engineering organizations operate.

Hierarchy — for decisions that need authority levels

CTO → Tech Lead → Sr. Engineer → Jr. Engineer

Junior implements first (fresh perspective catches obvious issues), seniors refine, leads review architecture, C-level makes final strategic calls. Each level can send work back down.

Squad — for tasks that need speed and collaboration

Backend + Frontend + Product + QA (all equal)

Everyone works in parallel. No single authority. Consensus-driven. Fast.

The gateway picks automatically from task signals:

  • Simple fix / rename / format → solo (one agent, no overhead)
  • Two related sub-tasks → pair (one agent per task + critic)
  • Multiple sub-tasks → trio (tech + PM + critic per task, parallel)
  • Large scope / cross-directory / architecture → hierarchy (CTO → Lead → Senior → Junior)
  • Single feature / bugfix / localized → squad (flat, consensus-driven)
  • Security-sensitive files → security team
  • Infra / CI/CD files → infra squad

For non-solo work, the gateway shows you the plan and lets you confirm, adjust, or drop to solo.

The quality gates the team runs on

Team structure alone isn't enough — AI agents still hallucinate. Every action routes through quality gates that real engineering teams rely on.

Gate When What it catches
Grounding PreToolUse (Edit/Write) Editing files the agent hasn't read
Scope lock PreToolUse Edits outside the locked directory
Strict mode PreToolUse rm -rf, git push --force
Type check PostToolUse (Edit/Write) TypeScript errors after every change
Review On demand / PostToolUse Fake imports, hallucinated methods, platform mismatch, deprecated APIs
Requirement check PostToolUse (Edit/Write) Unmet requirements from clarify/validate sessions
Verifier After author's pass Separate-pass completion check with fresh evidence table
Validate Before building Evidence-based go/no-go — is this feature worth building?

All gates run automatically. You just type your prompt.

Proof: harness ON vs OFF

═══════════════════════════════════
  HARNESS EFFECTIVENESS BENCHMARK
═══════════════════════════════════

  Scenarios:      13
  Accuracy:       100.0%

  Harness ON:
    Catch rate:   100% (10/10)
    False pos:    0

  Harness OFF (vanilla):
    Catch rate:   0% (0/10)

  Categories:
    hallucination    3/4 caught
    platform         4/4 caught
    deprecated       1/1 caught
    security         1/1 caught
═══════════════════════════════════

Run it yourself: npm run benchmark

Install

Option 1: Claude Code Plugin (recommended)

/plugin marketplace add https://github.com/rlaope/bestwork-agent
/plugin install bestwork-agent

Option 2: npm

npm install -g bestwork-agent
bestwork install

50 Domain Specialists

bestwork agents    # full catalog

25 Tech: backend, frontend, fullstack, infra, database, API, mobile, testing, security, performance, devops, data, ML, CLI, realtime, auth, migration, config, agent-engineer, plugin, accessibility, i18n, graphql, monorepo, writer

10 PM: product, API, platform, data, infra, migration, security, growth, compliance, DX

15 Critic: performance, scalability, security, consistency, reliability, testing, hallucination, DX, type safety, cost, accessibility, devsecops, i18n, agent, verifier

Agent prompts live in prompts/ — edit without rebuilding.

22 Skills

Natural language or slash command — the gateway routes automatically.

Skill What it does
validate Evidence-based feature validation before building
clarify Targeted requirement questions before execution
review Hallucination and platform mismatch scan
trio Parallel execution with quality gates
plan Scope analysis and team recommendation
delegate Autonomous execution without confirmation
deliver Persistent completion — retry until done
waterfall Sequential staged processing with gates
blitz Maximum parallelism burst
doctor Deploy config vs code integrity check
pipeline-run Queue and auto-process multiple GitHub issues
superthinking 1000-iteration thought simulation

And 10 more: agents, changelog, docs, health, install, meetings, onboard, sessions, status, update.

vs. other tools

bestwork-agent CrewAI MetaGPT Vanilla Claude Code
Target Claude Code users General Python General Python Everyone
Integration Native hooks (zero config) Separate runtime Separate runtime Built-in
Team structure Auto (hierarchy/squad/trio/pair/solo) Manual Manual None
Hallucination catch 100% (10/10 benchmark) No built-in No built-in 0%
Overhead ~0 (shell hooks) 3x tokens 2-5x tokens 0
Feature validation Built-in (validate skill) None None None
Requirement tracking Auto (clarify → PostToolUse) Manual Manual None

Harness Controls

./scope src/auth/       Lock edits to directory
./unlock                Remove lock
./strict                Block rm -rf, force read-before-edit
./relax                 Disable strict
./tdd add user auth     Test-driven development flow
./review                Hallucination scan
./validate              Is this feature worth building?
./clarify               Requirement deep-check before execution

Observability

bestwork                  # TUI dashboard
bestwork sessions         # Session list
bestwork heatmap          # 365-day activity grid
bestwork loops            # Loop detection
bestwork replay <id>      # Session playback
bestwork effectiveness    # Prompt efficiency trend

Notifications

./discord <webhook_url>
./slack <webhook_url>

Rich notifications per prompt: summary, git diff, review results, session health. Color-coded green/yellow/red.

Security

Everything is local. No data leaves your machine. See SECURITY.md.

Contributing

See CONTRIBUTING.md.

License

MIT

About

The agent skills, work like the best organization squad or corporation.

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors