
🎭 feat(agents): playwright-chatgpt agent (browser-driven ChatGPT via ACP bridge)#62

Open
chaizhenhua wants to merge 19 commits into master from feat/playwright-chatgpt-agent

@chaizhenhua chaizhenhua commented May 17, 2026

Summary

Adds a fifth agent type, playwright-chatgpt, that drives a real Chromium session against chatgpt.com via a new Node.js ACP bridge. It slots into the existing GenericCliAdapter path and needs no Rust adapter changes beyond this PR's own provisioning hardening; registration is a single seed manifest entry.

Authentication model: No upstream API key is used. The bridge authenticates by attaching Chromium to a pre-seeded persistent profile that has already been logged into chatgpt.com via the bootstrap flow. That profile directory is a credential-equivalent artifact and must be protected accordingly.

  • crates/oversight-agents/manifests/playwright-chatgpt.toml — adapter manifest (id playwright-chatgpt, detect via file_exists on built CLI, pnpm_workspace install strategy)
  • bridges/playwright-chatgpt/ — TypeScript Node bridge (~1.6k lines): ACP JSON-RPC server over stdio, persistent-context Chromium launcher with bootstrap guard, layered stream-complete detector (SSE [DONE] / DOM stop-button / idle / hard-timeout), versioned selectorRegistry, session resume via chatgpt://conversation/<uuid>, honest text-only capabilities, hard-timeout surfaced as error, SessionNewParams.cwd honoured for artifact paths, cancel signal propagates as stopReason: "cancelled".
  • crates/oversight-worker/: provisioning and discovery hardening for in-repo builtins. The default allowlist gains pnpm / node / hermes; the pnpm_workspace install strategy now requires OVERSIGHT_WORKER_REPO_ROOT and fails fast if it is unset; file_exists detect and runtime spawn args resolve relative paths against the same env var, so detect / install / runtime never disagree on where to look.
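
For reviewers who want a mental model of the layered stream-complete detector named in the bridge bullet, here is a minimal TypeScript sketch. It is not the bridge's code: the type names, the `evaluate` function, both thresholds, and the priority order between layers are assumptions made for illustration. Only the four layers, the rule that a hard timeout surfaces as an error, and the cancelled outcome come from this description.

```typescript
// Hypothetical sketch of a layered stream-complete check.
// Every name and threshold below is an assumption, not the bridge's real API.
type StreamSignals = {
  cancelled: boolean;         // the client asked to cancel the prompt
  sseDone: boolean;           // the SSE stream emitted [DONE]
  stopButtonSeen: boolean;    // the stop button appeared at least once
  stopButtonVisible: boolean; // the stop button is in the DOM right now
  receivedText: boolean;      // at least one chunk of reply text arrived
  msSinceLastText: number;    // time since the reply text last changed
  msElapsed: number;          // time since the prompt was submitted
};

type Verdict =
  | { kind: "pending" }
  | { kind: "complete"; via: "sse-done" | "stop-button-gone" | "idle" }
  | { kind: "cancelled" }
  | { kind: "error"; reason: "hard-timeout" };

const IDLE_MS = 5000;           // assumed idle threshold
const HARD_TIMEOUT_MS = 120000; // assumed hard timeout

// Layers are checked from the strongest signal to the weakest.
function evaluate(s: StreamSignals): Verdict {
  if (s.cancelled) return { kind: "cancelled" };
  if (s.sseDone) return { kind: "complete", via: "sse-done" };
  if (s.stopButtonSeen && !s.stopButtonVisible) {
    return { kind: "complete", via: "stop-button-gone" };
  }
  if (s.receivedText && s.msSinceLastText >= IDLE_MS) {
    return { kind: "complete", via: "idle" };
  }
  // A hard timeout is reported as an error, never as a normal completion.
  if (s.msElapsed >= HARD_TIMEOUT_MS) {
    return { kind: "error", reason: "hard-timeout" };
  }
  return { kind: "pending" };
}
```

A polling loop would call `evaluate` on each tick and stop at the first non-pending verdict. Keeping the decision in a pure function makes each layer unit-testable without a browser.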

Stack

Review range

12 commits, all playwright-chatgpt scope:

  1. 📐 docs(superpowers) × 2 — design spec + implementation plan
  2. 🎭 feat(agents) × 2 — playwright-chatgpt adapter manifest + server-side seed
  3. 🎭 feat(bridges) × 4 — TS bridge layers (types/codec/sessions; ACP server; browser/CLI/bootstrap/fixture; scaffolding)
  4. 🧪 test(worker) × 2 — manifest round-trips + cross-language smoke against fake-chatgpt fixture
  5. 🛠️ chore(make) × 1 — bridges-* targets; make setup installs the bridge
  6. ✨ feat(worker) × 1 — embed playwright-chatgpt in embedded_builtin_manifests() fallback (also absorbs the four follow-on provisioning fixes: pnpm/node allowlist, pnpm_workspace cwd, repo-relative path resolution, hermes allowlist + test coverage)

The previous 25-commit stack has been squashed: the bridge review-fix commits are folded into feat(bridges): browser layer + CLI + bootstrap, and the worker review-fix commits into feat(worker): mirror playwright-chatgpt in embedded_builtin_manifests fallback. Six unrelated commits were extracted to #64. Net effect: same diff, half the SHAs.

Builtin auto-provision requirements

playwright-chatgpt's install steps live inside this repo (pnpm --dir bridges/playwright-chatgpt …). The worker provisioner used to default to a per-install tempdir as cwd, which silently broke the in-repo paths. Two operator-visible knobs make this explicit:

| Env | Default | When to set |
|---|---|---|
| OVERSIGHT_WORKER_REPO_ROOT | unset → pnpm_workspace installs fail with a clear error; file_exists detect and runtime spawn fall back to process cwd | Set to the oversight repo root on workers that should autoinstall this builtin. Other strategies still use a tempdir. |
| WORKER_PROVISION_ALLOWLIST | now includes pnpm / node / hermes so the builtin is accepted out of the box | Override only to add/remove binaries. |
| (manual fallback) | | Run make bridges-install on the worker host and skip autoinstall entirely. |
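
The resolution rules described in this section can be sketched as two small functions. The worker itself is Rust, so this TypeScript is only an illustration of the logic, not a port of the real code: the function names, the POSIX-only path handling, and the choice to treat an empty value as unset are assumptions. The env var name, the fail-fast behaviour for pnpm_workspace, and the cwd fallback for detect and runtime are taken from the description.

```typescript
// Hypothetical illustration of the repo-root rules; the real worker is Rust.
type Env = Record<string, string | undefined>;

// Assumption: an empty value is treated the same as unset.
function repoRoot(env: Env): string | undefined {
  const value = env["OVERSIGHT_WORKER_REPO_ROOT"];
  return value !== undefined && value.length > 0 ? value : undefined;
}

// Detect and runtime: relative paths resolve against the repo root when it
// is set, otherwise against the process cwd. POSIX paths only in this sketch.
function resolveRepoRelative(path: string, env: Env, cwd: string): string {
  if (path.startsWith("/")) return path; // absolute paths pass through
  const base = repoRoot(env) ?? cwd;
  return `${base.replace(/\/+$/, "")}/${path}`;
}

// Install: pnpm_workspace needs the repo root and refuses to guess;
// every other strategy keeps using the per-install tempdir.
function installCwd(
  strategy: "pnpm_workspace" | "other",
  env: Env,
  tempdir: string,
): string {
  if (strategy !== "pnpm_workspace") return tempdir;
  const root = repoRoot(env);
  if (root === undefined) {
    throw new Error(
      "pnpm_workspace install requires OVERSIGHT_WORKER_REPO_ROOT to be set",
    );
  }
  return root;
}
```

Routing all three call sites through one `repoRoot` lookup is what keeps detect, install, and runtime from disagreeing about where the bridge lives.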

Test plan

Covered by CI

  • cargo test -p oversight-worker --test playwright_chatgpt_manifest — manifest round-trips through GenericCliConfig::validate()
  • cargo test -p oversight-worker provisioning::tests::default_allowlist_accepts_every_builtin_manifests_install — walks claude/codex/hermes/playwright-chatgpt against the default allowlist
  • cargo test -p oversight-worker discovery::tests::detect_file_exists_resolves_relative_path_against_repo_root_env
  • cargo test -p oversight-worker adapters::generic::tests::resolve_runtime_arg_*
  • pnpm --dir bridges/playwright-chatgpt test:unit — codec / sessions / selectors / server (includes the test that pins the honest prompt capabilities + cwd-forwarding test)
  • pnpm --dir bridges/playwright-chatgpt test:integration — Playwright test over local fake-chatgpt fixture
  • cargo test -p oversight-worker --test playwright_chatgpt_smoke — real Node bridge spawned over stdio against the fake fixture, Rust asserts ACP frames
  • make bridges-install / bridges-build / bridges-test / bridges-test-integration targets; make setup now installs the bridge

Not covered by CI (manual until follow-up e2e runner lands)

  • Real chatgpt.com smoke: reviewer runs node bridges/playwright-chatgpt/dist/bootstrap.js --profile-dir /tmp/oversight-test-profile, logs in, then exercises the live site via a worker
  • Cross-host headless behaviour (different Chromium revisions, sandbox profiles)
  • Rate-limit / session-expiry handling against the live site

Risks / limitations

  • CI does not exercise the real chatgpt.com path. All automated coverage runs against the local fake-chatgpt fixture; live-site behaviour is verified only by manual smoke until a tests/e2e/ runner lands.
  • Persistent profile is credential-equivalent. The pre-seeded Chromium profile holds an authenticated web session for chatgpt.com. Treat the profile directory like a secret.
  • Selector drift. chatgpt.com's DOM is not a stable interface; selectorRegistry is versioned so patches ship without recompiling.
  • Browser + host dependency. Operators must install Playwright's bundled Chromium (or a compatible system Chromium) and configure host sandboxing themselves.
  • Single-profile, single-session per worker. Multi-profile pooling, attachment uploads, and explicit model-selector clicks are deferred.
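
To make the selector-drift mitigation concrete, here is one way a versioned selector registry could be shaped. This is a guess at the idea, not the bridge's selectorRegistry: the `SelectorEntry` type, both helper functions, and every selector string are hypothetical.

```typescript
// Hypothetical shape for a versioned selector registry.
// None of these names or selector strings come from the bridge itself.
type SelectorEntry = {
  version: number;
  // logical element name -> candidate CSS selectors, tried in order
  selectors: Record<string, string[]>;
};

// Pick the entry with the highest version number.
function latestEntry(entries: SelectorEntry[]): SelectorEntry {
  if (entries.length === 0) throw new Error("selector registry is empty");
  return entries.reduce((best, e) => (e.version > best.version ? e : best));
}

// Return the first candidate the page reports as present, so an older
// selector can act as a fallback while a newer one rolls out.
function resolveSelector(
  entry: SelectorEntry,
  name: string,
  isPresent: (selector: string) => boolean,
): string {
  const candidates = entry.selectors[name] ?? [];
  const hit = candidates.find(isPresent);
  if (hit === undefined) {
    throw new Error(
      `no selector for "${name}" matched (registry v${entry.version})`,
    );
  }
  return hit;
}

// Example data: version 2 adds a new selector and keeps the old one as a fallback.
const exampleRegistry: SelectorEntry[] = [
  { version: 1, selectors: { stopButton: ["button.stop"] } },
  {
    version: 2,
    selectors: { stopButton: ["[data-testid='stop']", "button.stop"] },
  },
];
```

In a real bridge `isPresent` would query the live page; passing it in as a callback keeps the lookup logic testable against fixtures. A miss throws with the registry version in the message, which makes drift easier to triage from logs.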

Deferred / follow-ups

  • Real chatgpt.com e2e runner under tests/e2e/ (lifts the "not covered by CI" caveat)
  • Model selector clicks (currently passes ?model=<id> via URL)
  • Attachment upload path (attach.ts)
  • Multi-profile pool support
  • Selector drift detector with daily DOM snapshot diff

@chaizhenhua chaizhenhua force-pushed the chore/manifest-cli branch from 51f5729 to efa0145 Compare May 17, 2026 17:00
@chaizhenhua chaizhenhua force-pushed the feat/playwright-chatgpt-agent branch from f011572 to 988dc9a Compare May 17, 2026 17:03
Base automatically changed from chore/manifest-cli to master May 17, 2026 23:01
@chaizhenhua chaizhenhua force-pushed the feat/playwright-chatgpt-agent branch from 4a00905 to 96d70ab Compare May 19, 2026 00:42