bestwork-agent

Work like a corporate team, not just a club. Your AI agent gets the org chart, the quality gates, and the team reviews that real engineering orgs use.

English · 한국어 · 日本語

The problem

Your AI agent works alone. It hallucinates, loops, and misses requirements, and you find out too late. 45% of AI-generated code contains vulnerabilities (Veracode). Vibe-coded apps fail because nobody validated the idea before building.

bestwork-agent organizes your AI agent the way top unicorn companies organize their engineering teams — and bolts on the quality gates those teams rely on, so the work that ships is actually trustworthy.

How the team forms itself

bestwork analyzes every prompt and picks the right team shape before any code is written.

You: "Refactor the auth module to support OAuth2"

bestwork analyzes → large scope, architecture decision, security-sensitive
bestwork selects → Hierarchy: Security Team

┌─────────────────────────────────────────────────────┐
│  CISO                                               │
│  "Attack surface acceptable. Approve with           │
│   condition: rotate existing JWT secrets on deploy." │
│          ▲ final decision                           │
│  Tech Lead                                          │
│  "OAuth2 PKCE flow is correct. Consolidate the      │
│   two token refresh paths into one."                │
│          ▲ architecture review                      │
│  Sr. Security Engineer                              │
│  "Implementation secure. Added CSRF protection.     │
│   Input validation on redirect_uri."                │
│          ▲ implementation + hardening               │
│  Jr. QA Engineer                                    │
│  "Found: /callback doesn't handle expired state     │
│   param. Added test for token replay attack."       │
│          ▲ fresh eyes + edge cases                  │
└─────────────────────────────────────────────────────┘

You: "Add a dark mode toggle to the settings page"

bestwork analyzes → single feature, localized scope, fast feedback needed
bestwork selects → Squad: Feature Squad

┌──────────────────────────────────────────────────────┐
│                  Feature Squad (parallel)             │
│                                                       │
│  Sr. Backend         Sr. Frontend        Product Lead │
│  "API endpoint       "Toggle component   "Matches    │
│   for user prefs     with CSS vars,      user story. │
│   ready. Tests       accessible."        Ship it."   │
│   passing."                                          │
│                          QA Lead                      │
│                    "Tested light/dark                 │
│                     + system pref.                    │
│                     All green."                       │
│                                                       │
│  Verdict: all APPROVE → merged                        │
└──────────────────────────────────────────────────────┘

You: "Why did my last session struggle?"

bestwork analyzes → observability request, not coding
bestwork selects → data analysis

  Session Outcome — b322dc3e  ✗ struggling

  Duration:     45m
  Calls/Prompt: 38 (high — avg is 12)
  Loop detected: Edit → Bash(test fail) → Edit × 6 on auth.ts

  Root cause: missing import caused test failure loop.
  Recommendation: use ./strict to force read-before-edit.

How the team picks its shape

bestwork mirrors how the best engineering organizations operate.

Hierarchy — for decisions that need authority levels

CTO → Tech Lead → Sr. Engineer → Jr. Engineer

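The chain above lists authority from the top down; work flows from the bottom up. The junior implements first (a fresh perspective catches obvious issues), seniors refine, leads review the architecture, and the C-level makes the final strategic calls. Each level can send work back down.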
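The escalation flow can be pictured with a small sketch. This is an illustration only, not bestwork's implementation: the `Reviewer` callback, the `runHierarchy` name, and the `maxSteps` cap are all hypothetical, and the sketch assumes a send-back moves the work down exactly one level.

```typescript
// Levels in the order work flows upward; names follow the chain in the text.
type Level = "Jr. Engineer" | "Sr. Engineer" | "Tech Lead" | "CTO";
type Verdict = "approve" | "send-back";

// Hypothetical reviewer callback: decides what a level does with the current work.
type Reviewer = (level: Level) => Verdict;

const CHAIN: readonly Level[] = ["Jr. Engineer", "Sr. Engineer", "Tech Lead", "CTO"];

// Walk the chain upward. An approval passes the work one level up; a send-back
// returns it one level down (assumption). maxSteps guards against endless ping-pong.
function runHierarchy(
  review: Reviewer,
  maxSteps = 20,
): { approved: boolean; trail: Level[] } {
  const trail: Level[] = [];
  let i = 0;
  for (let step = 0; step < maxSteps && i < CHAIN.length; step++) {
    const level = CHAIN[i];
    if (level === undefined) break;
    trail.push(level);
    i = review(level) === "approve" ? i + 1 : Math.max(0, i - 1);
  }
  // The work is approved only if it passed the last level in the chain.
  return { approved: i === CHAIN.length, trail };
}
```

If the Tech Lead sends the work back once, the trail in this sketch visits the Sr. Engineer and the Tech Lead a second time before the work reaches the CTO.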

Squad — for tasks that need speed and collaboration

Backend + Frontend + Product + QA (all equal)

Everyone works in parallel. No single authority. Consensus-driven. Fast.

The gateway picks automatically from task signals:

Simple fix / rename / format → solo (one agent, no overhead)
Two related sub-tasks → pair (one agent per task + critic)
Multiple sub-tasks → trio (tech + PM + critic per task, parallel)
Large scope / cross-directory / architecture → hierarchy (CTO → Lead → Senior → Junior)
Single feature / bugfix / localized → squad (flat, consensus-driven)
Security-sensitive files → security team
Infra / CI/CD files → infra squad

For non-solo work, the gateway shows you the plan and lets you confirm, adjust, or drop to solo.
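The routing rules above can be sketched as a plain decision function. This is a hedged illustration, not the gateway's code: the `TaskSignals` fields and the order in which the rules are checked are assumptions, since the README lists the rules but not their inputs or precedence.

```typescript
type TeamShape =
  | "solo" | "pair" | "trio" | "hierarchy" | "squad"
  | "security team" | "infra squad";

// Hypothetical signals; the real gateway derives its own from the prompt.
interface TaskSignals {
  subTaskCount: number;          // independent sub-tasks found in the prompt
  isTrivial: boolean;            // simple fix, rename, or format
  isLargeScope: boolean;         // large scope, cross-directory, or architecture
  touchesSecurityFiles: boolean; // security-sensitive files involved
  touchesInfraFiles: boolean;    // infra or CI/CD files involved
}

// Assumed precedence: file-based routing first, then size-based rules.
function pickTeamShape(s: TaskSignals): TeamShape {
  if (s.touchesSecurityFiles) return "security team";
  if (s.touchesInfraFiles) return "infra squad";
  if (s.isTrivial) return "solo";
  if (s.isLargeScope) return "hierarchy";
  if (s.subTaskCount === 2) return "pair";
  if (s.subTaskCount > 2) return "trio";
  return "squad"; // single feature, bugfix, or localized change
}
```

Under these assumptions, a localized single-feature prompt such as the dark mode example falls through every earlier rule and lands on `"squad"`.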

The quality gates the team runs on

Team structure alone isn't enough — AI agents still hallucinate. Every action routes through quality gates that real engineering teams rely on.

| Gate | When | What it catches |
|---|---|---|
| Grounding | PreToolUse (Edit/Write) | Editing files the agent hasn't read |
| Scope lock | PreToolUse | Edits outside the locked directory |
| Strict mode | PreToolUse | `rm -rf`, `git push --force` |
| Type check | PostToolUse (Edit/Write) | TypeScript errors after every change |
| Review | On demand / PostToolUse | Fake imports, hallucinated methods, platform mismatch, deprecated APIs |
| Requirement check | PostToolUse (Edit/Write) | Unmet requirements from clarify/validate sessions |
| Verifier | After author's pass | Separate-pass completion check with fresh evidence table |
| Validate | Before building | Evidence-based go/no-go: is this feature worth building? |

All gates run automatically. You just type your prompt.
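To make the grounding gate concrete, here is a minimal sketch of a read-before-edit check. It is not the shipped hook: the `ToolCall` shape and the `GroundingGate` class are hypothetical, and the sketch assumes that both Edit and Write require a prior Read of the same path (it does not special-case brand-new files).

```typescript
type ToolName = "Read" | "Edit" | "Write" | "Bash";

// Hypothetical shape of a tool call as seen by a pre-tool-use check.
interface ToolCall {
  tool: ToolName;
  filePath?: string;
}

interface GateDecision {
  allow: boolean;
  reason?: string;
}

class GroundingGate {
  // Paths the agent has read so far in this session.
  private readonly seen = new Set<string>();

  // Call once per tool call, before the tool runs.
  check(call: ToolCall): GateDecision {
    if (call.filePath === undefined) return { allow: true };
    if (call.tool === "Read") {
      this.seen.add(call.filePath);
      return { allow: true };
    }
    const mutates = call.tool === "Edit" || call.tool === "Write";
    if (mutates && !this.seen.has(call.filePath)) {
      return { allow: false, reason: `read ${call.filePath} before changing it` };
    }
    return { allow: true };
  }
}
```

In this sketch an Edit on an unread file is blocked, and the same Edit is allowed once a Read of that path has been recorded.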

Proof: harness ON vs OFF

═══════════════════════════════════
  HARNESS EFFECTIVENESS BENCHMARK
═══════════════════════════════════

  Scenarios:      13
  Accuracy:       100.0%

  Harness ON:
    Catch rate:   100% (10/10)
    False pos:    0

  Harness OFF (vanilla):
    Catch rate:   0% (0/10)

  Categories:
    hallucination    3/4 caught
    platform         4/4 caught
    deprecated       1/1 caught
    security         1/1 caught
═══════════════════════════════════

Run it yourself: npm run benchmark
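For readers who want to see how figures like these are usually derived, the sketch below computes catch rate, false positives, and accuracy from per-scenario results. It is an assumption about the metric definitions, not the benchmark's source: the `ScenarioResult` shape is hypothetical, and the report does not state how the scenarios split into defective and clean cases.

```typescript
// Hypothetical per-scenario record.
interface ScenarioResult {
  hasDefect: boolean; // the scenario contains a planted problem
  flagged: boolean;   // the harness reported a problem
}

interface Summary {
  catchRate: number;      // caught defects / defective scenarios
  falsePositives: number; // clean scenarios that were flagged
  accuracy: number;       // scenarios judged correctly / all scenarios
}

function summarize(results: ScenarioResult[]): Summary {
  const defective = results.filter((r) => r.hasDefect);
  const caught = defective.filter((r) => r.flagged).length;
  const falsePositives = results.filter((r) => !r.hasDefect && r.flagged).length;
  const correct = results.filter((r) => r.hasDefect === r.flagged).length;
  return {
    catchRate: defective.length === 0 ? 0 : caught / defective.length,
    falsePositives,
    accuracy: results.length === 0 ? 0 : correct / results.length,
  };
}
```

Under these definitions, catch rate only looks at scenarios with a planted defect, while accuracy also rewards leaving clean scenarios unflagged.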

Install

Option 1: Claude Code Plugin (recommended)

/plugin marketplace add https://github.com/rlaope/bestwork-agent
/plugin install bestwork-agent

Option 2: npm

npm install -g bestwork-agent
bestwork install

50 Domain Specialists

bestwork agents    # full catalog

25 Tech: backend, frontend, fullstack, infra, database, API, mobile, testing, security, performance, devops, data, ML, CLI, realtime, auth, migration, config, agent-engineer, plugin, accessibility, i18n, graphql, monorepo, writer

10 PM: product, API, platform, data, infra, migration, security, growth, compliance, DX

15 Critic: performance, scalability, security, consistency, reliability, testing, hallucination, DX, type safety, cost, accessibility, devsecops, i18n, agent, verifier

Agent prompts live in prompts/ — edit without rebuilding.

22 Skills

Natural language or slash command — the gateway routes automatically.

| Skill | What it does |
|---|---|
| `validate` | Evidence-based feature validation before building |
| `clarify` | Targeted requirement questions before execution |
| `review` | Hallucination and platform mismatch scan |
| `trio` | Parallel execution with quality gates |
| `plan` | Scope analysis and team recommendation |
| `delegate` | Autonomous execution without confirmation |
| `deliver` | Persistent completion: retry until done |
| `waterfall` | Sequential staged processing with gates |
| `blitz` | Maximum parallelism burst |
| `doctor` | Deploy config vs code integrity check |
| `pipeline-run` | Queue and auto-process multiple GitHub issues |
| `superthinking` | 1000-iteration thought simulation |

And 10 more: agents, changelog, docs, health, install, meetings, onboard, sessions, status, update.

vs. other tools

|  | bestwork-agent | CrewAI | MetaGPT | Vanilla Claude Code |
|---|---|---|---|---|
| Target | Claude Code users | General Python | General Python | Everyone |
| Integration | Native hooks (zero config) | Separate runtime | Separate runtime | Built-in |
| Team structure | Auto (hierarchy/squad/trio/pair/solo) | Manual | Manual | None |
| Hallucination catch | 100% (10/10 benchmark) | No built-in | No built-in | 0% |
| Overhead | ~0 (shell hooks) | 3x tokens | 2-5x tokens | 0 |
| Feature validation | Built-in (validate skill) | None | None | None |
| Requirement tracking | Auto (clarify → PostToolUse) | Manual | Manual | None |

Harness Controls

./scope src/auth/       Lock edits to directory
./unlock                Remove lock
./strict                Block rm -rf, force read-before-edit
./relax                 Disable strict
./tdd add user auth     Test-driven development flow
./review                Hallucination scan
./validate              Is this feature worth building?
./clarify               Requirement deep-check before execution
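As an illustration of what a scope lock has to decide, the sketch below checks whether a target path stays inside a locked directory. It is a simplified assumption, not the `./scope` implementation: it handles POSIX-style relative paths only, and the helper names are hypothetical.

```typescript
// Split a POSIX-style relative path into segments, dropping "." and empty
// segments and resolving ".." where possible. A ".." that would climb above
// the starting point is kept so it can never match a locked directory.
function normalizeSegments(path: string): string[] {
  const out: string[] = [];
  for (const seg of path.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      if (out.length > 0 && out[out.length - 1] !== "..") out.pop();
      else out.push("..");
      continue;
    }
    out.push(seg);
  }
  return out;
}

// True when target is a path strictly inside lockedDir.
function isInsideScope(lockedDir: string, target: string): boolean {
  const dir = normalizeSegments(lockedDir);
  const file = normalizeSegments(target);
  return file.length > dir.length && dir.every((seg, i) => file[i] === seg);
}
```

Comparing whole segments rather than string prefixes matters here: with `src/auth/` locked, `src/authz/x.ts` and `src/auth/../db/x.ts` are both rejected.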

Observability

bestwork                  # TUI dashboard
bestwork sessions         # Session list
bestwork heatmap          # 365-day activity grid
bestwork loops            # Loop detection
bestwork replay <id>      # Session playback
bestwork effectiveness    # Prompt efficiency trend
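Loop detection of the kind shown in the session report (Edit → failing Bash → Edit on the same file) can be sketched as a single pass over tool events. The `ToolEvent` shape, the `minCycles` threshold, and the function name are hypothetical; how `bestwork loops` actually detects loops is not described here.

```typescript
// Hypothetical, simplified tool event.
interface ToolEvent {
  tool: "Read" | "Edit" | "Bash";
  file?: string;    // set for Read and Edit
  failed?: boolean; // set for Bash runs that exited with an error
}

interface Loop {
  file: string;
  cycles: number; // consecutive edit-then-fail rounds on the same file
}

function detectEditFailLoop(events: ToolEvent[], minCycles = 3): Loop | null {
  let pendingEdit: string | null = null; // file edited since the last Bash run
  let loopFile: string | null = null;
  let cycles = 0;
  let worst: Loop | null = null;

  for (const ev of events) {
    if (ev.tool === "Edit" && ev.file !== undefined) {
      pendingEdit = ev.file;
    } else if (ev.tool === "Bash") {
      if (ev.failed === true && pendingEdit !== null) {
        // Another edit-then-fail round; extend the streak if it is the same file.
        cycles = pendingEdit === loopFile ? cycles + 1 : 1;
        loopFile = pendingEdit;
        if (cycles >= minCycles && (worst === null || cycles > worst.cycles)) {
          worst = { file: loopFile, cycles };
        }
      } else {
        // A passing run, or a run with no preceding edit, breaks the streak.
        loopFile = null;
        cycles = 0;
      }
      pendingEdit = null;
    }
  }
  return worst;
}
```

Fed six consecutive rounds of an Edit on `auth.ts` followed by a failing Bash run, this sketch reports `{ file: "auth.ts", cycles: 6 }`.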

Notifications

./discord <webhook_url>
./slack <webhook_url>

Rich notifications per prompt: summary, git diff, review results, session health. Color-coded green/yellow/red.
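A rough sketch of how such a notification could be assembled for a Discord webhook is shown below. The `PromptReport` fields, the colour values, and the function name are assumptions for illustration; the only external detail relied on is that Discord webhook messages accept an `embeds` array whose entries carry `title`, `description`, and an integer `color`.

```typescript
type Health = "green" | "yellow" | "red";

// Hypothetical report assembled after a prompt finishes.
interface PromptReport {
  summary: string;
  diffStat: string;       // e.g. the output of `git diff --stat`
  reviewFindings: number; // issues reported by the review gate
  health: Health;
}

// Assumed colour palette, expressed as the integers Discord embeds expect.
const HEALTH_COLORS: Record<Health, number> = {
  green: 0x2ecc71,
  yellow: 0xf1c40f,
  red: 0xe74c3c,
};

// Build the JSON body for a webhook POST; sending it is left to the caller.
function buildDiscordPayload(r: PromptReport) {
  return {
    embeds: [
      {
        title: `Session health: ${r.health}`,
        description: [
          r.summary,
          `Diff: ${r.diffStat}`,
          `Review findings: ${r.reviewFindings}`,
        ].join("\n"),
        color: HEALTH_COLORS[r.health],
      },
    ],
  };
}
```

Keeping payload construction separate from the HTTP call keeps the colour mapping easy to test without touching the network.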

Security

Everything is local. No data leaves your machine. See SECURITY.md.

Contributing

See CONTRIBUTING.md.

License

MIT
