AI agents increasingly need browser access — to fill forms, extract data, navigate workflows, and interact with web applications. Today, four categories of browser tools exist for agents:
- MCP servers — expose browser actions as tools via the Model Context Protocol
- CLI tools — shell commands that agents invoke directly (often paired with SKILL.md files)
- Browser protocols — raw Chrome DevTools / CDP access for debugging and custom control
- Screenshot/vision — pixel-level interaction via screenshots and coordinates
This guide covers the MCP and CLI approaches with verifiable facts, focusing on BAP (Browser Agent Protocol) and Playwright MCP / Playwright CLI. All benchmark data is reproducible via the benchmark suite.
| Tool | Interface | Publisher | npm Package | License |
|---|---|---|---|---|
| BAP MCP | MCP (stdio) | browseragentprotocol | @browseragentprotocol/mcp | Apache-2.0 |
| BAP CLI | Shell commands | browseragentprotocol | @browseragentprotocol/cli | Apache-2.0 |
| Playwright MCP | MCP (stdio) | Microsoft | @playwright/mcp | Apache-2.0 |
| Playwright CLI | Shell commands | Microsoft | @playwright/cli | Apache-2.0 |
| Chrome DevTools / CDP | Browser protocol | Google / Chromium | built into Chrome | Chromium licenses |
Playwright MCP GitHub stars: ~27.5k (as of Feb 2026). Microsoft-backed with a large ecosystem.
BAP and the Playwright tools (Playwright MCP and Playwright CLI) all use Playwright as the automation engine. Chrome DevTools / CDP talks directly to Chrome's native debugging protocol.
Playwright MCP embeds Playwright directly in the MCP server process. When an agent calls a tool, the server executes the browser action in-process. This means lower per-call latency — no inter-process communication overhead.
BAP MCP uses a bridge architecture: the MCP server communicates with a separate Playwright server over WebSocket (JSON-RPC 2.0). This adds ~50–200ms per call but enables:
- Session persistence — the browser survives MCP server restarts
- Multi-client access — CLI and MCP can control the same browser simultaneously
- Shared state — observations, element refs, and cookies persist across interfaces
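The bridge traffic described above follows JSON-RPC 2.0, so each tool call becomes a request envelope that is matched to its response by id. The sketch below shows only the standard envelope shapes from the JSON-RPC 2.0 specification. The method name example/act and its params are hypothetical placeholders, not documented BAP methods, and the WebSocket transport itself is left out.

```typescript
// JSON-RPC 2.0 message shapes, as defined by the JSON-RPC 2.0 specification.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

interface JsonRpcSuccess {
  jsonrpc: "2.0";
  id: number;
  result: unknown;
}

interface JsonRpcFailure {
  jsonrpc: "2.0";
  id: number | null;
  error: { code: number; message: string; data?: unknown };
}

type JsonRpcResponse = JsonRpcSuccess | JsonRpcFailure;

let nextRequestId = 1;

// Build a request envelope with a fresh id.
// The method names used with this helper are hypothetical, not BAP's real ones.
function buildRequest(method: string, params?: unknown): JsonRpcRequest {
  const request: JsonRpcRequest = { jsonrpc: "2.0", id: nextRequestId++, method };
  if (params !== undefined) {
    request.params = params;
  }
  return request;
}

// Match a response to its request by id, then return the result or throw.
function unwrapResponse(request: JsonRpcRequest, response: JsonRpcResponse): unknown {
  if (response.id !== request.id) {
    throw new Error("response id does not match request id");
  }
  if ("error" in response) {
    throw new Error("JSON-RPC error " + response.error.code + ": " + response.error.message);
  }
  return response.result;
}

// Text that would travel over the WebSocket as a single frame.
const exampleRequest = buildRequest("example/act", { steps: ["fill", "fill", "click"] });
const wireText = JSON.stringify(exampleRequest);
```

Each call pays for one serialize, send, and parse round trip between the two processes, which is the kind of cost the per-call overhead figure above refers to.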
Playwright CLI: standalone shell commands. Each invocation is a separate process. The --install-skills flag generates a SKILL.md for agent consumption.
BAP CLI: shell commands that connect to a persistent daemon (shared with MCP). The browser survives across commands, and element refs from bap observe remain valid for subsequent bap act calls.
Chrome DevTools Protocol is the lowest-level option in this landscape. It is
excellent for debugging, profiling, network inspection, and custom browser
instrumentation, but it is not an agent-ready workflow on its own. You have to
orchestrate DOM queries, event subscriptions, and multi-step actions manually.
BAP sits above this layer with semantic selectors, observe, act, extract,
response tiers, and shared browser state.
From the Playwright MCP README:
"If you are using a coding agent, you might benefit from using the CLI+SKILLS instead."
BAP agrees with this guidance — CLI + SKILL.md is the better pattern for coding agents. BAP CLI extends it with composite actions, semantic selectors, and structured extraction.
Side-by-side comparison of BAP MCP and Playwright MCP. Every claim links to a verifiable source.
| Dimension | BAP MCP | Playwright MCP | Source |
|---|---|---|---|
| Tools | 23 | 31 (17 core + 6 vision + 5 test + 3 other) | BAP MCP source, Playwright MCP README |
| Composite actions | act batches N steps in 1 call | No built-in batching | Playwright MCP README (verified: no batch_execute or similar) |
| Observation | observe → structured elements with refs, selectors, action hints | browser_snapshot → raw accessibility tree | Benchmark observe scenario |
| Extraction | extract with JSON Schema | browser_evaluate with custom JS | Benchmark extract scenario |
| Fused operations | navigate+observe, act+pre/postObserve in 1 call | Not available | BAP protocol spec |
| Response tiers | full / interactive / minimal | Not available | BAP protocol spec |
| WebMCP discovery | discover_tools + observe integration | Not available | BAP MCP source |
| Per-call latency | +50–200ms (WebSocket overhead) | Lower (single-process) | Benchmark fairness notes |
| Form filling | act composite (N fills + click = 1 call) | browser_fill_form (batches fills, separate click) | Benchmark form scenario |
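To make the Extraction row more concrete, here is a minimal sketch of what a JSON Schema for a book listing could look like, together with a small shape check for the returned data. The field names title and price are illustrative, and the exact way BAP's extract tool accepts a schema is an assumption that this sketch does not cover; only the schema vocabulary (type, items, properties, required) comes from the JSON Schema standard.

```typescript
// A JSON Schema describing the data the agent wants back.
// Field names are illustrative; how the schema is passed to extract is not shown.
const bookListSchema = {
  type: "array",
  items: {
    type: "object",
    properties: {
      title: { type: "string" },
      price: { type: "string" },
    },
    required: ["title", "price"],
  },
};

interface Book {
  title: string;
  price: string;
}

// Minimal hand-written check that mirrors the schema above.
// A real project would more likely use a JSON Schema validator library.
function isBookList(value: unknown): value is Book[] {
  if (!Array.isArray(value)) {
    return false;
  }
  return value.every((item) => {
    if (typeof item !== "object" || item === null) {
      return false;
    }
    const record = item as Record<string, unknown>;
    return typeof record.title === "string" && typeof record.price === "string";
  });
}
```

The point of the contrast in the table is that the schema declares the desired shape, whereas the custom-JS route requires writing and maintaining the DOM queries that produce that shape.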
All data from the reproducible benchmark suite. Clone the repo and run ./run.sh to reproduce.
- Both servers spawned via StdioClientTransport, identical to how any MCP client connects
- Real websites (saucedemo.com, books.toscrape.com, etc.), not synthetic test pages
- No LLM involved: measures raw MCP tool efficiency, not prompt quality
- Each scenario: 1 warmup run (excluded) + N measured runs, median selected
- Token estimation: ceil(responsePayloadBytes / 4)
- All tool calls timed with performance.now()
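The aggregation rules in the list above can be sketched as follows. This is not the benchmark suite's actual code: the function names are hypothetical, the run durations are assumed to have been collected elsewhere, and averaging the two middle values for an even number of runs is an assumption, since the methodology only says the median is selected.

```typescript
// Median of the measured runs. The first entry is treated as the warmup run and dropped.
// Assumption: for an even number of measured runs, the two middle values are averaged.
function medianOfMeasuredRuns(runDurationsMs: number[]): number {
  const measured = runDurationsMs.slice(1);
  if (measured.length === 0) {
    throw new Error("need at least one measured run after the warmup");
  }
  const sorted = measured.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// UTF-8 byte length of a string, computed without platform-specific APIs.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x80) {
      bytes += 1;
    } else if (code < 0x800) {
      bytes += 2;
    } else if (code >= 0xd800 && code <= 0xdbff && i + 1 < text.length) {
      const next = text.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // a surrogate pair encodes one 4-byte code point
        i++;
      } else {
        bytes += 3; // lone surrogate, counted like the 3-byte replacement character
      }
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Token estimate from the methodology: ceil(responsePayloadBytes / 4).
function estimateTokens(responsePayloadBytes: number): number {
  return Math.ceil(responsePayloadBytes / 4);
}
```

For example, medianOfMeasuredRuns([900, 120, 100, 140]) drops the 900 ms warmup and returns 120, and a 5-byte payload is estimated at 2 tokens.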
The benchmarks use three variants to separate BAP's core advantage (composite actions) from its optimization layer (fused operations):
| Variant | Rules | What it measures |
|---|---|---|
| BAP Standard | Must observe before acting, use refs from observe output. Re-observe after page navigation. | Apples-to-apples with Playwright |
| BAP Fused | Can use semantic selectors without prior observe. Can use fused navigate(observe:true) and act(postObserve:true). | BAP's full optimization layer |
| Playwright | Standard snapshot-then-act workflow. Uses most efficient tools available (browser_fill_form, browser_evaluate). | Baseline |
The fair comparison is BAP Standard vs Playwright. BAP Fused is explicitly an optimization layer.
| Scenario | Site | BAP Standard (calls) | BAP Fused (calls) | Playwright (calls) | Std vs PW | Fused vs PW |
|---|---|---|---|---|---|---|
| baseline | quotes.toscrape.com | 2 | 2 | 2 | Tie | Tie |
| observe | news.ycombinator.com | 2 | 1 | 2 | Tie | -50% |
| extract | books.toscrape.com | 2 | 2 | 2 | Tie | Tie |
| form | the-internet.herokuapp.com | 4 | 3 | 5 | -20% | -40% |
| ecommerce | saucedemo.com | 8 | 5 | 11 | -27% | -55% |
| workflow | books.toscrape.com | 5 | 4 | 5 | Tie | -20% |
| Total | | 23 | 17 | 27 | ~-15% | ~-37% |
Source: src/scenarios/ in the benchmarks repo.
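The percentage columns follow directly from the call counts. As a check on the arithmetic, the sketch below recomputes them from the table's numbers; the type and function names are hypothetical and are not part of the benchmark suite.

```typescript
interface ScenarioCalls {
  scenario: string;
  bapStandard: number;
  bapFused: number;
  playwright: number;
}

// Tool-call counts copied from the results table.
const toolCalls: ScenarioCalls[] = [
  { scenario: "baseline", bapStandard: 2, bapFused: 2, playwright: 2 },
  { scenario: "observe", bapStandard: 2, bapFused: 1, playwright: 2 },
  { scenario: "extract", bapStandard: 2, bapFused: 2, playwright: 2 },
  { scenario: "form", bapStandard: 4, bapFused: 3, playwright: 5 },
  { scenario: "ecommerce", bapStandard: 8, bapFused: 5, playwright: 11 },
  { scenario: "workflow", bapStandard: 5, bapFused: 4, playwright: 5 },
];

// Relative change of a candidate against the baseline, rounded to a whole percent.
function percentChange(candidate: number, baseline: number): number {
  return Math.round(((candidate - baseline) / baseline) * 100);
}

// Sum one column of the table.
function sumCalls(select: (row: ScenarioCalls) => number): number {
  return toolCalls.reduce((sum, row) => sum + select(row), 0);
}

const totalCalls = {
  bapStandard: sumCalls((row) => row.bapStandard), // 23
  bapFused: sumCalls((row) => row.bapFused), // 17
  playwright: sumCalls((row) => row.playwright), // 27
};

const standardVsPlaywright = percentChange(totalCalls.bapStandard, totalCalls.playwright); // -15
const fusedVsPlaywright = percentChange(totalCalls.bapFused, totalCalls.playwright); // -37
```

The same helper reproduces the per-scenario figures, for example percentChange(8, 11) gives -27 and percentChange(5, 11) gives -55 for the ecommerce row.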
Where BAP wins:
- Composite act: Batching multiple steps (fill+fill+click) into one call is the primary advantage. Most impactful in multi-step flows like ecommerce (8 vs 11 calls).
- Fused operations: navigate(observe:true) and act(postObserve:true) eliminate redundant server roundtrips. Largest impact in ecommerce (-55%).
- Structured extract: JSON Schema-based extraction vs writing custom JS for browser_evaluate.
Where Playwright wins:
- Per-call latency: Playwright MCP is a single process. BAP's two-process WebSocket architecture adds ~50–200ms per call. Playwright wins wall-clock time on most scenarios.
- Element disambiguation: Playwright's positional snapshot refs uniquely identify elements. BAP's observe can return ambiguous selectors for identical elements (e.g., 6 "Add to cart" buttons on saucedemo.com).
- Setup simplicity: npx @playwright/mcp runs as a single process with no daemon management.
- Ecosystem: ~27.5k GitHub stars, Microsoft-backed, extensive testing ecosystem integration.
These benchmarks are designed to be honest, not promotional. Important caveats:
- BAP Standard is the fair comparison. BAP Standard follows the same observe-then-act pattern as Playwright (observe the page, get element refs, act on them). BAP Fused shows what's possible with optimization but isn't an apples-to-apples comparison.
- Latency favors Playwright. BAP's two-process architecture adds ~50–200ms WebSocket overhead per call. Playwright MCP is consistently faster on wall-clock time per call.
- Token estimation is approximate. ceil(bytes / 4) is a rough heuristic. Screenshots inflate counts due to base64 encoding.
- No LLM involved. All tool arguments are pre-written. In real agent flows, both tools would need additional calls for the LLM to decide what to do.
- BAP extract uses heuristics. Playwright's browser_evaluate runs precise DOM queries and may return more accurate results.
- Playwright uses its most efficient tools. Each scenario uses browser_fill_form for batched fills and browser_evaluate for direct JS extraction. We do not artificially inflate Playwright's call counts.
- BAP has known limitations. Identical elements (e.g., 6 "Add to cart" buttons) can produce ambiguous selectors. The cart icon on saucedemo.com has no accessible name, requiring direct URL navigation. See the benchmark README for the full list.
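The screenshot caveat above can be worked through numerically. Base64 encodes every 3 raw bytes as 4 characters, so under the ceil(bytes / 4) heuristic a base64 payload is estimated at roughly one token per 3 raw bytes instead of one per 4. The sketch below illustrates this; the 300,000-byte screenshot size is a made-up example value, not a measurement from the benchmarks, and the function names are hypothetical.

```typescript
// Length in characters of the padded base64 encoding of rawBytes bytes.
function base64Length(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3);
}

// The heuristic from the methodology: ceil(payloadBytes / 4).
function estimateTokensFromBytes(payloadBytes: number): number {
  return Math.ceil(payloadBytes / 4);
}

// Hypothetical screenshot size, chosen only to make the arithmetic easy to follow.
const hypotheticalScreenshotBytes = 300000;

const encodedChars = base64Length(hypotheticalScreenshotBytes); // 400000
const tokensForEncoded = estimateTokensFromBytes(encodedChars); // 100000
const tokensForRaw = estimateTokensFromBytes(hypotheticalScreenshotBytes); // 75000
```

In this example the encoded payload is counted as 100,000 tokens versus 75,000 for the raw bytes, a one-third inflation that comes purely from the encoding.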
| Dimension | BAP CLI | Playwright CLI | Source |
|---|---|---|---|
| Commands | 23 | ~70+ (granular: individual storage, network, DevTools cmds) | BAP CLI docs, Playwright CLI README |
| Composite actions | bap act fill:...=val click:... (N steps, 1 cmd) | Individual commands | CLI docs |
| Semantic selectors | role:button:"Submit", label:"Email" | Accessibility tree refs (e<N>) | CLI docs |
| Observation | bap observe --tier=interactive (tiered output) | playwright-cli snapshot (full tree) | CLI docs |
| Extraction | bap extract --fields="title,price" | playwright-cli eval (manual JS) | CLI docs |
| SKILL.md | Yes (CLI + MCP variants) | Yes (--install-skills) | Package repos |
| Token efficiency | Composite actions + response tiers | "Token-efficient. Does not force page data into LLM." (official README — no specific numbers) | Playwright CLI README |
| Platform support | 13 platforms via bap install-skill | Claude Code, GitHub Copilot | Package READMEs |
Note on third-party claims: Some blogs cite specific token reduction numbers for Playwright CLI (e.g., "4x fewer tokens"). These numbers are not in Microsoft's official README and we do not cite them here. Microsoft's official claim is: "Token-efficient. Does not force page data into LLM."
For a detailed command-by-command mapping between Playwright CLI and BAP CLI, see the migration guide.
| Dimension | BAP CLI / MCP | Chrome DevTools / CDP |
|---|---|---|
| Level | Agent-ready workflow layer | Raw browser debugging protocol |
| Selectors | Semantic selectors, refs, structured observe output | Manual DOM / runtime scripting |
| Multi-step actions | act batches steps and fused observe flows | You compose sequences yourself |
| Extraction | extract with schema or field hints | Custom JS / protocol calls |
| Token efficiency | Response tiers + fewer roundtrips | Depends on your own orchestration |
| Best for | Coding agents and repeated browser tasks | Debugging, profiling, low-level custom tooling |
Use Chrome DevTools when you need raw protocol domains or existing browser inspection. Use BAP when you want a default browser interface for agents that need to get work done with fewer calls and less prompt overhead.
→ BAP CLI with bap install-skill
Why: Composite bap act batches multi-step flows into one shell command. Semantic selectors (role:button:"Submit") survive page redesigns. Structured bap extract --fields="title,price" eliminates writing custom JS. The persistent daemon keeps browser state warm across commands, which is the right shape for coding agents. SKILL.md is available for 13 platforms.
Alternative: Playwright CLI for simple single-action interactions where composite batching isn't needed.
→ BAP MCP (npx @browseragentprotocol/mcp)
Why: act batches steps, observe returns structured elements with refs, fused operations (navigate+observe, act+postObserve) cut roundtrips. extract with JSON Schema for structured data.
Alternative: Playwright MCP if per-call latency matters more than total call count, or if you're already embedded in the Playwright testing ecosystem.
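The trade-off between per-call latency and total call count can be explored with a simple model. The sketch below is a back-of-the-envelope calculation, not benchmark output. It assumes every call costs the same base time in both tools and that BAP pays a fixed bridge overhead on top of each call; real calls vary, and the names are hypothetical.

```typescript
// Total wall-clock time under the simplified model: calls * (base + overhead).
function totalWallClockMs(calls: number, baseMsPerCall: number, overheadMsPerCall: number): number {
  return calls * (baseMsPerCall + overheadMsPerCall);
}

// Base per-call time above which fewer BAP calls outweigh the per-call overhead.
// Derived from bapCalls * (t + overhead) < playwrightCalls * t, solved for t.
// Returns Infinity when BAP does not use fewer calls, since it can then never catch up.
function breakEvenBaseLatencyMs(bapCalls: number, playwrightCalls: number, overheadMs: number): number {
  if (playwrightCalls <= bapCalls) {
    return Infinity;
  }
  return (bapCalls * overheadMs) / (playwrightCalls - bapCalls);
}

// Ecommerce scenario call counts (8 vs 11) with the low and high ends of the overhead range.
const breakEvenLowOverhead = breakEvenBaseLatencyMs(8, 11, 50); // about 133 ms
const breakEvenHighOverhead = breakEvenBaseLatencyMs(8, 11, 200); // about 533 ms
```

Under these assumptions, the ecommerce flow would only finish sooner with BAP Standard when the average call itself takes longer than roughly 133 ms (at 50 ms overhead) to 533 ms (at 200 ms overhead). For scenarios where the call counts tie, the model returns Infinity, which matches the observation that Playwright wins wall-clock time on most scenarios.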
→ BAP — shared server architecture. The CLI daemon and MCP bridge connect to the same Playwright server. Observations, element refs, and cookies persist across both interfaces.
Playwright MCP and Playwright CLI are separate processes with no shared state.
→ Playwright MCP is the zero-friction add-on for your existing Playwright setup. If you already use Playwright for testing, adding the MCP server requires no new dependencies.
→ Chrome DevTools / CDP
Use DevTools when you need low-level browser debugging or custom protocol work. Use BAP when the job is agent automation rather than browser instrumentation.
BAP and Playwright use the same engine (Playwright). BAP adds composite actions, semantic selectors, structured extraction, fused operations, and a CLI-first workflow for coding agents. In benchmarks, BAP Standard uses ~15% fewer tool calls than Playwright in an apples-to-apples comparison, primarily from batching multi-step actions. BAP Fused extends this to ~37% through navigate+observe and act+postObserve fusion. Playwright wins on per-call latency and element disambiguation. Chrome DevTools is lower-level than both: great for debugging, but not the default interface most agents should use for day-to-day browser work.
npm i -g @browseragentprotocol/cli
bap install-skill # Auto-detects your agent platform, installs SKILL.md
npx @browseragentprotocol/mcp
/install-plugin https://github.com/browseragentprotocol/bap
Last updated: Feb 2026. All star counts, tool counts, and benchmark data verified at time of writing. Run the benchmark suite to reproduce.