From 206638c059a3366c67d356318a2cea0f9eb2430a Mon Sep 17 00:00:00 2001
From: Nandini Muralidharan <nandinim+microsoft@microsoft.com>
Date: Thu, 30 Apr 2026 12:21:19 +0530
Subject: [PATCH 1/4] feat: add browser-harness parallel web scraping sample

Demonstrates using browser-harness with Playwright Workspaces to run 10+
parallel remote browser sessions for web scraping with LiveView debuggability.

Includes:
- Jupyter notebook with 6-section walkthrough
- LiveViewWatcher helper for real-time session monitoring
- Explicit coding agent prompt for PWW CDP connection
- Product scraping example targeting books.toscrape.com

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
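Note for reviewers (placed below the `---`, so it is not part of the commit message): the two-step connection flow that the notebook walks through can be sketched as follows. This is a minimal sketch rather than the notebook's code. `build_service_url`, `resolve_session_url`, and the injectable `fetch` parameter are hypothetical names chosen for illustration; the URL path, the `playwrightVersion=cdp` and `shouldRedirect=false` parameters, and the `sessionUrl` field are taken from the notebook cell in this patch.

```python
import json
import urllib.request
from typing import Callable
from urllib.parse import urlencode


def build_service_url(host: str, workspace_id: str, access_key: str) -> str:
    """Build the provisioning URL with the query parameters used in the notebook."""
    query = urlencode({
        "playwrightVersion": "cdp",
        "shouldRedirect": "false",  # resolve sessionUrl manually instead of redirecting
        "accessKey": access_key,
    })
    return f"https://{host}/playwrightworkspaces/{workspace_id}/browsers?{query}"


def _http_get(url: str, timeout: float) -> bytes:
    """Default fetcher: a plain HTTP GET that closes the response when done."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def resolve_session_url(
    service_url: str,
    timeout: float = 120.0,  # generous, because cold start can take 30-90 seconds
    fetch: Callable[[str, float], bytes] = _http_get,  # hypothetical hook for tests
) -> str:
    """Step 1 of the flow: GET the service URL and extract the wss:// sessionUrl."""
    data = json.loads(fetch(service_url, timeout))
    session_url = data.get("sessionUrl") if isinstance(data, dict) else None
    if not isinstance(session_url, str) or not session_url.startswith("wss://"):
        raise ValueError("response did not contain a wss:// sessionUrl")
    return session_url
```

Step 2 is then a matter of exporting the returned value as `BU_CDP_WS` before starting the daemon. Injecting `fetch` keeps the cold start out of unit tests, and `urlencode` escapes the access key, which the notebook's f-string does not do.
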
 .../browser-harness-webscraping/.env.template |   8 +
 samples/browser-harness-webscraping/README.md | 110 +++++
 .../helpers/__init__.py                       |   0
 .../helpers/live_view_watcher.py              | 107 ++++
 .../parallel_webscraping.ipynb                | 455 ++++++++++++++++++
 .../requirements.txt                          |   6 +
 6 files changed, 686 insertions(+)
 create mode 100644 samples/browser-harness-webscraping/.env.template
 create mode 100644 samples/browser-harness-webscraping/README.md
 create mode 100644 samples/browser-harness-webscraping/helpers/__init__.py
 create mode 100644 samples/browser-harness-webscraping/helpers/live_view_watcher.py
 create mode 100644 samples/browser-harness-webscraping/parallel_webscraping.ipynb
 create mode 100644 samples/browser-harness-webscraping/requirements.txt

diff --git a/samples/browser-harness-webscraping/.env.template b/samples/browser-harness-webscraping/.env.template
new file mode 100644
index 0000000..3474eb0
--- /dev/null
+++ b/samples/browser-harness-webscraping/.env.template
@@ -0,0 +1,8 @@
+# Azure Playwright Workspaces
+SUBSCRIPTION_ID=<your-azure-subscription-id>
+RESOURCE_GROUP=<your-resource-group>
+LOCATION=eastus
+PLAYWRIGHT_WORKSPACE_NAME=<your-pww-workspace-name>
+
+# This gets set automatically after PWW workspace creation (Step 2 in notebook)
+# BU_CDP_WS=wss://browser.playwright.microsoft.com/ws?...
diff --git a/samples/browser-harness-webscraping/README.md b/samples/browser-harness-webscraping/README.md
new file mode 100644
index 0000000..9a2d5b9
--- /dev/null
+++ b/samples/browser-harness-webscraping/README.md
@@ -0,0 +1,110 @@
+# Parallel Web Scraping with Browser-Harness + Playwright Workspaces
+
+This sample demonstrates how to use [browser-harness](https://github.com/browser-use/browser-harness) with [Playwright Workspaces (PWW)](https://aka.ms/pww/docs) to run 10+ parallel remote browser sessions for web scraping, with LiveView for real-time debuggability.
+
+## Overview
+
+When you need to scrape data from many pages simultaneously — product prices, inventory levels, competitor catalogs — you need parallel browser sessions. This sample shows how to:
+
+1. **Create a Playwright Workspace** — managed cloud browsers on Azure
+2. **Connect browser-harness** to PWW's remote CDP endpoint
+3. **Spawn 10+ parallel browser sessions** — each with its own isolated browser
+4. **Scrape product data** from multiple pages concurrently
+5. **Debug in real-time** using PWW's LiveView
+
+## Architecture
+
+```
+┌─────────────────┐     ┌───────────────────────────┐
+│  Coding Agent   │     │  Playwright Workspaces    │
+│  (Claude Code / │────▶│  (Azure-managed browsers) │
+│   Codex)        │ CDP │                           │
+│                 │ WSS │  ┌───────┐ ┌───────┐     │
+│  browser-harness│────▶│  │ Tab 1 │ │ Tab 2 │ ... │
+└─────────────────┘     │  └───────┘ └───────┘     │
+        │               └───────────────────────────┘
+        │                           │
+        ▼                           ▼
+┌─────────────────┐     ┌───────────────────────────┐
+│  Aggregated     │     │  LiveView (real-time)     │
+│  Scraped Data   │     │  Watch any session live   │
+└─────────────────┘     └───────────────────────────┘
+```
+
+## Prerequisites
+
+- **Azure subscription** with permissions to create Playwright Workspaces
+- **Python 3.10+**
+- **Git** installed
+- **Azure CLI** authenticated (`az login`)
+- Familiarity with Jupyter notebooks
+
+## Quick Start
+
+### 1. Install Dependencies
+
+```bash
+pip install -r requirements.txt
+```
+
+### 2. Install Browser-Harness
+
+```bash
+git clone https://github.com/browser-use/browser-harness
+cd browser-harness
+uv tool install -e .
+```
+
+### 3. Set Up Environment Variables
+
+Copy `.env.template` to `.env` and fill in your values:
+
+```bash
+cp .env.template .env
+```
+
+Required variables:
+```
+SUBSCRIPTION_ID=<your-azure-subscription-id>
+RESOURCE_GROUP=<your-resource-group>
+LOCATION=eastus
+PLAYWRIGHT_WORKSPACE_NAME=<your-workspace-name>
+```
+
+### 4. Run the Notebook
+
+Open `parallel_webscraping.ipynb` and follow the step-by-step instructions.
+
+## What You'll Learn
+
+- How to create and manage Playwright Workspaces programmatically
+- How to connect browser-harness to remote CDP endpoints (PWW)
+- The two-step connection flow (HTTP GET → resolve `sessionUrl` → set `BU_CDP_WS`)
+- How to run 10+ parallel browser sessions for scraping
+- How to use LiveView for real-time debugging of remote browser sessions
+
+## Files in This Sample
+
+| File | Description |
+|------|-------------|
+| `README.md` | This file |
+| `requirements.txt` | Python dependencies |
+| `.env.template` | Environment variable template |
+| `parallel_webscraping.ipynb` | Step-by-step notebook |
+| `helpers/live_view_watcher.py` | LiveView session watcher utility |
+
+## Important Notes
+
+- **Do NOT restart the daemon** after connecting to PWW — the remote browser is destroyed when the WebSocket closes
+- **Cold start latency**: The initial browser provisioning takes 30-90 seconds
+- **Session lifetime**: The browser stays alive as long as the daemon holds the WebSocket connection
+- **Connect immediately**: After resolving the `sessionUrl`, connect the daemon right away — the session URL is ephemeral and expires quickly
+- **Token limits**: PWW workspaces have a maximum number of access tokens. Delete unused tokens before creating new ones
+- **CLI usage**: On Windows, browser-harness requires the `-c` flag: `browser-harness -c "print(page_info())"`
+- The scraping target (`books.toscrape.com`) is a public demo site designed for scraping practice
+
+## More Resources
+
+- [Playwright Workspaces Documentation](https://aka.ms/pww/docs)
+- [Browser-Harness GitHub](https://github.com/browser-use/browser-harness)
+- [PWW Pricing](https://aka.ms/pww/pricing)
diff --git a/samples/browser-harness-webscraping/helpers/__init__.py b/samples/browser-harness-webscraping/helpers/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/samples/browser-harness-webscraping/helpers/live_view_watcher.py b/samples/browser-harness-webscraping/helpers/live_view_watcher.py
new file mode 100644
index 0000000..f4aec7f
--- /dev/null
+++ b/samples/browser-harness-webscraping/helpers/live_view_watcher.py
@@ -0,0 +1,107 @@
+"""
+LiveViewWatcher — polls Playwright Workspaces for new browser sessions
+and auto-opens the LiveView URL for real-time debugging.
+
+Usage:
+    from helpers.live_view_watcher import LiveViewWatcher
+
+    watcher = LiveViewWatcher(pw_client, workspace_id, credential, auth_token)
+    watcher.start()
+    # ... run your browser automation ...
+    watcher.stop()
+"""
+
+import threading
+import webbrowser
+from urllib.parse import quote
+
+
+class LiveViewWatcher:
+    """Polls Playwright Service for new browser sessions and
+    auto-opens the live viewer when one is detected."""
+
+    LIVE_VIEW_BASE_URL = "https://stcnttestdataknarayasea.z23.web.core.windows.net/live_viewer_pww.html"
+
+    def __init__(self, pw_client, workspace_id, credential, auth_token,
+                 auth_service_base=None, poll_interval=2):
+        """
+        Args:
+            pw_client: PlaywrightClient instance
+            workspace_id: PWW workspace ID
+            credential: Azure credential (for future token refresh)
+            auth_token: JWT access token for the live viewer
+            auth_service_base: Base URL of the auth service (derived from dataplane_uri)
+            poll_interval: Seconds between polling attempts
+        """
+        self.pw_client = pw_client
+        self.workspace_id = workspace_id
+        self.credential = credential
+        self.auth_token = auth_token
+        self.auth_service_base = auth_service_base or ""
+        self.poll_interval = poll_interval
+        self.stop_event = threading.Event()
+        self.session_id = None
+        self.thread = None
+        self.existing_sessions = set()
+
+    def _build_live_url(self, session_id):
+        """Construct the PWW live viewer URL with all required params."""
+        return (
+            f"{self.LIVE_VIEW_BASE_URL}"
+            f"?session={quote(session_id)}"
+            f"&workspace={quote(self.workspace_id)}"
+            f"&authBase={quote(self.auth_service_base)}"
+            f"&token={quote(self.auth_token)}"
+        )
+
+    def start(self):
+        """Snapshot existing sessions and start polling in background."""
+        try:
+            self.existing_sessions = set(
+                s.id for s in self.pw_client.browser_sessions.list(self.workspace_id)
+            )
+        except Exception:
+            self.existing_sessions = set()
+        self.stop_event.clear()
+        self.session_id = None
+        self.thread = threading.Thread(target=self._poll, daemon=True)
+        self.thread.start()
+
+    def stop(self):
+        """Signal the poller to stop and wait up to 10 seconds for its final check."""
+        self.stop_event.set()
+        if self.thread:
+            self.thread.join(timeout=10)
+
+    def _poll(self):
+        while True:
+            try:
+                current = set(
+                    s.id for s in self.pw_client.browser_sessions.list(self.workspace_id)
+                )
+                new_sessions = current - self.existing_sessions
+                if new_sessions:
+                    self.session_id = new_sessions.pop()
+                    live_url = self._build_live_url(self.session_id)
+                    print(f"\n  [LiveView] Session detected: {self.session_id}")
+                    print(f"  [LiveView] Opening browser...")
+                    webbrowser.open(live_url)
+                    return
+            except Exception:
+                pass
+            if self.stop_event.wait(self.poll_interval):
+                # Final check before exiting
+                try:
+                    current = set(
+                        s.id for s in self.pw_client.browser_sessions.list(self.workspace_id)
+                    )
+                    new_sessions = current - self.existing_sessions
+                    if new_sessions:
+                        self.session_id = new_sessions.pop()
+                        live_url = self._build_live_url(self.session_id)
+                        print(f"\n  [LiveView] Session detected: {self.session_id}")
+                        print(f"  [LiveView] Opening browser...")
+                        webbrowser.open(live_url)
+                except Exception:
+                    pass
+                return
diff --git a/samples/browser-harness-webscraping/parallel_webscraping.ipynb b/samples/browser-harness-webscraping/parallel_webscraping.ipynb
new file mode 100644
index 0000000..a13a63a
--- /dev/null
+++ b/samples/browser-harness-webscraping/parallel_webscraping.ipynb
@@ -0,0 +1,455 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Parallel Web Scraping with Browser-Harness + Playwright Workspaces\n",
+    "\n",
+    "This notebook demonstrates how to:\n",
+    "1. Create a Playwright Workspace (PWW) on Azure\n",
+    "2. Connect browser-harness to the PWW remote CDP endpoint\n",
+    "3. Spawn 10+ parallel browser sessions for web scraping\n",
+    "4. Use LiveView for real-time debuggability\n",
+    "\n",
+    "**Target**: Scrape product data from [books.toscrape.com](http://books.toscrape.com) across multiple category pages simultaneously."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Section 1: Prerequisites & Setup\n",
+    "\n",
+    "Ensure you have:\n",
+    "- Azure CLI authenticated (`az login`)\n",
+    "- browser-harness installed (`git clone https://github.com/browser-use/browser-harness && cd browser-harness && uv tool install -e .`)\n",
+    "- Dependencies installed (`pip install -r requirements.txt`)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import json\n",
+    "import uuid\n",
+    "import subprocess\n",
+    "from datetime import datetime, timedelta, timezone\n",
+    "from urllib.parse import urlparse\n",
+    "from concurrent.futures import ThreadPoolExecutor, as_completed\n",
+    "\n",
+    "import pandas as pd\n",
+    "from dotenv import load_dotenv\n",
+    "from azure.identity import DefaultAzureCredential\n",
+    "from azure.mgmt.playwright import PlaywrightMgmtClient\n",
+    "from azure.mgmt.playwright.models import PlaywrightWorkspace, PlaywrightWorkspaceProperties\n",
+    "from azure.developer.playwright import PlaywrightClient\n",
+    "\n",
+    "load_dotenv()\n",
+    "\n",
+    "# Configuration\n",
+    "SUBSCRIPTION_ID = os.environ[\"SUBSCRIPTION_ID\"]\n",
+    "RESOURCE_GROUP = os.environ[\"RESOURCE_GROUP\"]\n",
+    "LOCATION = os.environ.get(\"LOCATION\", \"eastus\")\n",
+    "PLAYWRIGHT_WORKSPACE_NAME = os.environ[\"PLAYWRIGHT_WORKSPACE_NAME\"]\n",
+    "\n",
+    "credential = DefaultAzureCredential()\n",
+    "print(\"✅ Configuration loaded\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Section 2: Create Playwright Workspace (PWW)\n",
+    "\n",
+    "This creates a managed Playwright Workspace on Azure that provides cloud-hosted browsers.\n",
+    "Skip this cell if your workspace already exists."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Create or get the Playwright Workspace\n",
+    "pw_mgmt = PlaywrightMgmtClient(credential, SUBSCRIPTION_ID)\n",
+    "\n",
+    "print(f\"Creating Playwright Workspace: {PLAYWRIGHT_WORKSPACE_NAME}...\")\n",
+    "workspace = pw_mgmt.playwright_workspaces.begin_create_or_update(\n",
+    "    resource_group_name=RESOURCE_GROUP,\n",
+    "    playwright_workspace_name=PLAYWRIGHT_WORKSPACE_NAME,\n",
+    "    resource=PlaywrightWorkspace(\n",
+    "        location=LOCATION,\n",
+    "        properties=PlaywrightWorkspaceProperties(local_auth=\"Enabled\"),\n",
+    "    ),\n",
+    ").result()\n",
+    "\n",
+    "workspace_id = workspace.properties.workspace_id\n",
+    "dataplane_uri = workspace.properties.dataplane_uri\n",
+    "base_url = f\"{urlparse(dataplane_uri).scheme}://{urlparse(dataplane_uri).netloc}\"\n",
+    "\n",
+    "print(f\"✅ Workspace ready: {workspace_id}\")\n",
+    "print(f\"   Dataplane: {base_url}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Create an access token for the workspace\n",
+    "pw_client = PlaywrightClient(endpoint=base_url, credential=credential)\n",
+    "\n",
+    "access_token_id = str(uuid.uuid4())\n",
+    "token = pw_client.access_tokens.create_or_replace(\n",
+    "    workspace_id=workspace_id,\n",
+    "    access_token_id=access_token_id,\n",
+    "    resource={\n",
+    "        \"name\": f\"scraping-demo-{access_token_id[:8]}\",\n",
+    "        \"expiryAt\": (datetime.now(timezone.utc) + timedelta(days=30)).isoformat()\n",
+    "    }\n",
+    ")\n",
+    "\n",
+    "playwright_api_key = token.jwt_token\n",
+    "print(\"✅ Access token created (valid 30 days)\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Section 3: Connect Browser-Harness to PWW Remote Endpoint\n",
+    "\n",
+    "### The Connection Prompt\n",
+    "\n",
+    "Paste the following prompt into your coding agent (Claude Code / Codex) to connect browser-harness to the PWW remote browser:\n",
+    "\n",
+    "---\n",
+    "\n",
+    "```text\n",
+    "Set up browser-harness to connect to my Playwright Workspaces remote browser.\n",
+    "\n",
+    "Read install.md and SKILL.md first. Then connect to this Azure Playwright Service endpoint:\n",
+    "\n",
+    "  SERVICE_URL=<paste your PWW service URL here>\n",
+    "\n",
+    "Follow the two-step connection flow:\n",
+    "1. HTTP GET the SERVICE_URL (allow 60-90s for the browser to spin up). Parse the JSON response to extract the `sessionUrl` (a wss:// WebSocket URL).\n",
+    "2. Set BU_CDP_WS to the resolved sessionUrl in .env, then restart the daemon ONCE.\n",
+    "\n",
+    "IMPORTANT:\n",
+    "- Do NOT kill or restart the daemon after the session is connected — the remote browser is destroyed when the WebSocket closes.\n",
+    "- Do NOT set shouldRedirect=true; use shouldRedirect=false and manually resolve the sessionUrl.\n",
+    "- The cold start takes 30-90s. Use a generous timeout on the initial HTTP GET.\n",
+    "- After connecting, verify with: browser-harness <<'PY'\\nprint(page_info())\\nPY\n",
+    "\n",
+    "Once connected, confirm with a screenshot that the remote browser is alive.\n",
+    "```\n",
+    "\n",
+    "---\n",
+    "\n",
+    "### Programmatic Connection (alternative)\n",
+    "\n",
+    "If you prefer to connect programmatically instead of via the coding agent prompt:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import urllib.request\n",
+    "\n",
+    "# Build the PWW service URL\n",
+    "service_url = (\n",
+    "    f\"https://{urlparse(dataplane_uri).netloc}\"\n",
+    "    f\"/playwrightworkspaces/{workspace_id}/browsers\"\n",
+    "    f\"?playwrightVersion=cdp&shouldRedirect=false\"\n",
+    "    f\"&accessKey={playwright_api_key}\"\n",
+    ")\n",
+    "\n",
+    "print(\"Resolving remote browser session (30-90s cold start)...\")\n",
+    "\n",
+    "# Step 1: HTTP GET to provision the browser and get the CDP WebSocket URL\n",
+    "with urllib.request.urlopen(service_url, timeout=120) as resp:\n",
+    "    data = json.loads(resp.read())\n",
+    "cdp_ws_url = data[\"sessionUrl\"]\n",
+    "\n",
+    "print(\"✅ Remote browser provisioned\")\n",
+    "print(f\"   CDP WebSocket: {cdp_ws_url[:80]}...\")\n",
+    "\n",
+    "# Step 2: Expose the URL to browser-harness via BU_CDP_WS.\n",
+    "# Setting it in os.environ covers subprocesses launched from this kernel;\n",
+    "# to persist it, also write BU_CDP_WS to browser-harness's .env file.\n",
+    "os.environ[\"BU_CDP_WS\"] = cdp_ws_url\n",
+    "\n",
+    "print(\"\\n⚠️  IMPORTANT: Do NOT restart the daemon after this point.\")\n",
+    "print(\"   The remote browser is destroyed when the WebSocket closes.\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Verify connection\n",
+    "# NOTE: browser-harness uses -c flag for script execution\n",
+    "result = subprocess.run(\n",
+    "    [\"browser-harness\", \"-c\", \"print(page_info())\"],\n",
+    "    capture_output=True, text=True, timeout=30,\n",
+    "    env={**os.environ, \"BU_CDP_WS\": cdp_ws_url}\n",
+    ")\n",
+    "print(result.stdout)\n",
+    "if result.returncode == 0:\n",
+    "    print(\"✅ Browser-harness connected to PWW remote browser\")\n",
+    "else:\n",
+    "    print(f\"❌ Connection failed: {result.stderr}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Section 4: Parallel Web Scraping (10+ Sessions)\n",
+    "\n",
+    "We'll scrape product data from [books.toscrape.com](http://books.toscrape.com) — a public demo site designed for scraping practice.\n",
+    "\n",
+    "Each parallel session scrapes a different category page."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Define the pages to scrape (one per parallel browser session)\n",
+    "CATEGORY_URLS = [\n",
+    "    \"http://books.toscrape.com/catalogue/category/books/travel_2/index.html\",\n",
+    "    \"http://books.toscrape.com/catalogue/category/books/mystery_3/index.html\",\n",
+    "    \"http://books.toscrape.com/catalogue/category/books/historical-fiction_4/index.html\",\n",
+    "    \"http://books.toscrape.com/catalogue/category/books/sequential-art_5/index.html\",\n",
+    "    \"http://books.toscrape.com/catalogue/category/books/classics_6/index.html\",\n",
+    "    \"http://books.toscrape.com/catalogue/category/books/philosophy_7/index.html\",\n",
+    "    \"http://books.toscrape.com/catalogue/category/books/romance_8/index.html\",\n",
+    "    \"http://books.toscrape.com/catalogue/category/books/womens-fiction_9/index.html\",\n",
+    "    \"http://books.toscrape.com/catalogue/category/books/fiction_10/index.html\",\n",
+    "    \"http://books.toscrape.com/catalogue/category/books/childrens_11/index.html\",\n",
+    "    \"http://books.toscrape.com/catalogue/category/books/religion_12/index.html\",\n",
+    "    \"http://books.toscrape.com/catalogue/category/books/nonfiction_13/index.html\",\n",
+    "]\n",
+    "\n",
+    "print(f\"Will scrape {len(CATEGORY_URLS)} category pages in parallel\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Scraping script template for each parallel browser session\n",
+    "# browser-harness uses -c flag: browser-harness -c \"<script>\"\n",
+    "SCRAPE_SCRIPT = (\n",
+    "    'import json\\n'\n",
+    "    'new_tab(\"{url}\")\\n'\n",
+    "    'wait_for_load()\\n'\n",
+    "    'books = js(\"Array.from(document.querySelectorAll(\\'article.product_pod\\')).map(el => ({{'\n",
+    "    'title: el.querySelector(\\'h3 a\\').getAttribute(\\'title\\'),'\n",
+    "    'price: el.querySelector(\\'.price_color\\').textContent,'\n",
+    "    'availability: el.querySelector(\\'.availability\\').textContent.trim(),'\n",
+    "    'rating: el.querySelector(\\'p.star-rating\\').className.replace(\\'star-rating \\', \\'\\')'\n",
+    "    '}}))\"\\n)\\n'\n",
+    "    'category = js(\"document.querySelector(\\'.page-header h1\\').textContent\")\\n'\n",
+    "    'print(json.dumps({{\"category\": category, \"books\": books}}))'\n",
+    ")\n",
+    "\n",
+    "\n",
+    "def scrape_category(url, session_name):\n",
+    "    \"\"\"Scrape a single category page using a dedicated browser session.\n",
+    "    \n",
+    "    Each call runs browser-harness under a distinct BU_NAME and opens the\n",
+    "    page in a new tab. All calls reuse the BU_CDP_WS resolved earlier, so run\n",
+    "    them promptly: the session URL is ephemeral and expires quickly.\n",
+    "    \"\"\"\n",
+    "    script = SCRAPE_SCRIPT.format(url=url)\n",
+    "    result = subprocess.run(\n",
+    "        [\"browser-harness\", \"-c\", script],\n",
+    "        capture_output=True, text=True, timeout=60,\n",
+    "        env={**os.environ, \"BU_NAME\": session_name, \"BU_CDP_WS\": cdp_ws_url}\n",
+    "    )\n",
+    "    if result.returncode == 0:\n",
+    "        # Scan from the end so the script's final JSON print wins over earlier output\n",
+    "        for line in reversed(result.stdout.strip().splitlines()):\n",
+    "            try:\n",
+    "                return json.loads(line)\n",
+    "            except json.JSONDecodeError:\n",
+    "                continue\n",
+    "    return {\"error\": result.stderr, \"url\": url}\n",
+    "\n",
+    "\n",
+    "print(\"Ready to scrape. Running parallel sessions...\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Execute parallel scraping across all categories\n",
+    "all_results = []\n",
+    "\n",
+    "with ThreadPoolExecutor(max_workers=12) as executor:\n",
+    "    futures = {\n",
+    "        executor.submit(scrape_category, url, f\"scraper-{i}\"): url\n",
+    "        for i, url in enumerate(CATEGORY_URLS)\n",
+    "    }\n",
+    "    \n",
+    "    for future in as_completed(futures):\n",
+    "        url = futures[future]\n",
+    "        try:\n",
+    "            result = future.result()\n",
+    "            if \"error\" not in result:\n",
+    "                all_results.append(result)\n",
+    "                print(f\"✅ {result['category']}: {len(result['books'])} books\")\n",
+    "            else:\n",
+    "                print(f\"❌ {url}: {result['error'][:100]}\")\n",
+    "        except Exception as e:\n",
+    "            print(f\"❌ {url}: {e}\")\n",
+    "\n",
+    "print(f\"\\n📊 Scraped {sum(len(r['books']) for r in all_results)} books from {len(all_results)} categories\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Aggregate into a DataFrame\n",
+    "rows = []\n",
+    "for result in all_results:\n",
+    "    for book in result[\"books\"]:\n",
+    "        rows.append({\n",
+    "            \"category\": result[\"category\"],\n",
+    "            \"title\": book[\"title\"],\n",
+    "            \"price\": book[\"price\"],\n",
+    "            \"availability\": book[\"availability\"],\n",
+    "            \"rating\": book[\"rating\"],\n",
+    "        })\n",
+    "\n",
+    "df = pd.DataFrame(rows)\n",
+    "print(f\"Total books scraped: {len(df)}\")\n",
+    "df.head(15)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Section 5: LiveView for Debuggability\n",
+    "\n",
+    "LiveView lets you watch any browser session in real-time. The `LiveViewWatcher` polls for new sessions and auto-opens the viewer in your browser."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from helpers.live_view_watcher import LiveViewWatcher\n",
+    "\n",
+    "# Initialize the watcher\n",
+    "live_watcher = LiveViewWatcher(\n",
+    "    pw_client=pw_client,\n",
+    "    workspace_id=workspace_id,\n",
+    "    credential=credential,\n",
+    "    auth_token=playwright_api_key,\n",
+    "    auth_service_base=base_url,\n",
+    ")\n",
+    "\n",
+    "# Start watching — when the next browser session is created, LiveView opens automatically\n",
+    "live_watcher.start()\n",
+    "print(\"👀 LiveView watcher active — will open viewer when a new session starts\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Run a single scraping task to trigger LiveView\n",
+    "demo_result = scrape_category(CATEGORY_URLS[0], \"live-demo\")\n",
+    "print(f\"Scraped: {demo_result.get('category', 'unknown')}\")\n",
+    "\n",
+    "live_watcher.stop()\n",
+    "if live_watcher.session_id:\n",
+    "    print(f\"\\n✅ LiveView opened for session: {live_watcher.session_id}\")\n",
+    "else:\n",
+    "    print(\"\\nℹ️  No new session detected (session may have reused an existing one)\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Section 6: Cleanup\n",
+    "\n",
+    "Stop browser sessions and optionally delete the workspace."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# List active sessions\n",
+    "sessions = list(pw_client.browser_sessions.list(workspace_id))\n",
+    "print(f\"Active sessions: {len(sessions)}\")\n",
+    "for s in sessions:\n",
+    "    print(f\"  - {s.id} | {s.browser_type} | {s.status}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Optional: Delete the workspace (uncomment to run)\n",
+    "# pw_mgmt.playwright_workspaces.begin_delete(\n",
+    "#     resource_group_name=RESOURCE_GROUP,\n",
+    "#     playwright_workspace_name=PLAYWRIGHT_WORKSPACE_NAME,\n",
+    "# ).result()\n",
+    "# print(\"✅ Workspace deleted\")\n",
+    "\n",
+    "print(\"Done! Your scraped data is in the 'df' DataFrame above.\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.10.0"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/samples/browser-harness-webscraping/requirements.txt b/samples/browser-harness-webscraping/requirements.txt
new file mode 100644
index 0000000..f0257ae
--- /dev/null
+++ b/samples/browser-harness-webscraping/requirements.txt
@@ -0,0 +1,6 @@
+azure-identity
+azure-mgmt-playwright
+azure-developer-playwright
+jupyter
+pandas
+python-dotenv

From 3c85cf826b3ec16374ffaf0205d126d9ff5d720a Mon Sep 17 00:00:00 2001
From: Mitesh Shah <58204159+mitsha-microsoft@users.noreply.github.com>
Date: Fri, 8 May 2026 12:28:03 +0000
Subject: [PATCH 2/4] Rework sample: replace notebook with playwright_service_client.py

---
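Note for reviewers (placed below the `---`, so it is not part of the commit message): the reworked `.env.template` reduces configuration to two variables. A minimal sketch of how a script might load and validate them is shown below. `ServiceConfig` and `load_service_config` are hypothetical names, and this is not the content of `playwright_service_client.py`; it only illustrates the assumption that both variables must be set and non-empty.

```python
import os
from typing import Mapping, NamedTuple, Optional

# Variable names taken from the reworked .env.template in this patch.
REQUIRED_VARS = ("PLAYWRIGHT_SERVICE_URL", "PLAYWRIGHT_SERVICE_ACCESS_TOKEN")


class ServiceConfig(NamedTuple):
    """Hypothetical container for the two required settings."""
    service_url: str
    access_token: str


def load_service_config(env: Optional[Mapping[str, str]] = None) -> ServiceConfig:
    """Read both variables, failing early with the names of any that are unset."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return ServiceConfig(
        service_url=env["PLAYWRIGHT_SERVICE_URL"].strip(),
        access_token=env["PLAYWRIGHT_SERVICE_ACCESS_TOKEN"].strip(),
    )
```

Failing before any network call makes a blank `.env` show up as a clear error instead of a timeout while the remote browser is being provisioned.
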
 .../browser-harness-webscraping/.env.template |  12 +-
 samples/browser-harness-webscraping/README.md | 137 +++---
 .../helpers/__init__.py                       |   0
 .../helpers/live_view_watcher.py              | 107 ----
 .../parallel_webscraping.ipynb                | 455 ------------------
 .../playwright_service_client.py              |  95 ++++
 .../requirements.txt                          |   5 -
 7 files changed, 174 insertions(+), 637 deletions(-)
 delete mode 100644 samples/browser-harness-webscraping/helpers/__init__.py
 delete mode 100644 samples/browser-harness-webscraping/helpers/live_view_watcher.py
 delete mode 100644 samples/browser-harness-webscraping/parallel_webscraping.ipynb
 create mode 100644 samples/browser-harness-webscraping/playwright_service_client.py

diff --git a/samples/browser-harness-webscraping/.env.template b/samples/browser-harness-webscraping/.env.template
index 3474eb0..7b9c02c 100644
--- a/samples/browser-harness-webscraping/.env.template
+++ b/samples/browser-harness-webscraping/.env.template
@@ -1,8 +1,6 @@
-# Azure Playwright Workspaces
-SUBSCRIPTION_ID=<your-azure-subscription-id>
-RESOURCE_GROUP=<your-resource-group>
-LOCATION=eastus
-PLAYWRIGHT_WORKSPACE_NAME=<your-pww-workspace-name>
+# Microsoft Playwright Service - Environment Variables
+# Copy this file to .env and fill in your values
 
-# This gets set automatically after PWW workspace creation (Step 2 in notebook)
-# BU_CDP_WS=wss://browser.playwright.microsoft.com/ws?...
+# Playwright Service (Required for all samples)
+PLAYWRIGHT_SERVICE_URL=
+PLAYWRIGHT_SERVICE_ACCESS_TOKEN=
\ No newline at end of file
diff --git a/samples/browser-harness-webscraping/README.md b/samples/browser-harness-webscraping/README.md
index 9a2d5b9..8818c7d 100644
--- a/samples/browser-harness-webscraping/README.md
+++ b/samples/browser-harness-webscraping/README.md
@@ -6,38 +6,16 @@ This sample demonstrates how to use [browser-harness](https://github.com/browser
 
 When you need to scrape data from many pages simultaneously — product prices, inventory levels, competitor catalogs — you need parallel browser sessions. This sample shows how to:
 
-1. **Create a Playwright Workspace** — managed cloud browsers on Azure
-2. **Connect browser-harness** to PWW's remote CDP endpoint
-3. **Spawn 10+ parallel browser sessions** — each with its own isolated browser
-4. **Scrape product data** from multiple pages concurrently
-5. **Debug in real-time** using PWW's LiveView
-
-## Architecture
-
-```
-┌─────────────────┐     ┌───────────────────────────┐
-│  Coding Agent   │     │  Playwright Workspaces    │
-│  (Claude Code / │────▶│  (Azure-managed browsers) │
-│   Codex)        │ CDP │                           │
-│                 │ WSS │  ┌───────┐ ┌───────┐     │
-│  browser-harness│────▶│  │ Tab 1 │ │ Tab 2 │ ... │
-└─────────────────┘     │  └───────┘ └───────┘     │
-        │               └───────────────────────────┘
-        │                           │
-        ▼                           ▼
-┌─────────────────┐     ┌───────────────────────────┐
-│  Aggregated     │     │  LiveView (real-time)     │
-│  Scraped Data   │     │  Watch any session live   │
-└─────────────────┘     └───────────────────────────┘
-```
+1. **Connect browser-harness** to PWW's remote CDP endpoint
+1. **Spawn 10+ parallel browser sessions** — each with its own isolated browser
+1. **Scrape product data** from multiple pages concurrently
 
 ## Prerequisites
 
 - **Azure subscription** with permissions to create Playwright Workspaces
+- **Playwright Workspace** and a **Playwright Service access token**. See [how to create a workspace](https://learn.microsoft.com/en-us/azure/app-testing/playwright-workspaces/quickstart-run-end-to-end-tests?tabs=playwrightcli&pivots=playwright-test-runner) and [how to create an access token](https://learn.microsoft.com/en-us/azure/app-testing/playwright-workspaces/how-to-manage-access-tokens).
 - **Python 3.10+**
 - **Git** installed
-- **Azure CLI** authenticated (`az login`)
-- Familiarity with Jupyter notebooks
 
 ## Quick Start
 
@@ -47,15 +25,7 @@ When you need to scrape data from many pages simultaneously — product prices,
 pip install -r requirements.txt
 ```
 
-### 2. Install Browser-Harness
-
-```bash
-git clone https://github.com/browser-use/browser-harness
-cd browser-harness
-uv tool install -e .
-```
-
-### 3. Set Up Environment Variables
+### 2. Set Up Environment Variables
 
 Copy `.env.template` to `.env` and fill in your values:
 
@@ -65,43 +35,84 @@ cp .env.template .env
 
 Required variables:
 ```
-SUBSCRIPTION_ID=<your-azure-subscription-id>
-RESOURCE_GROUP=<your-resource-group>
-LOCATION=eastus
-PLAYWRIGHT_WORKSPACE_NAME=<your-workspace-name>
+PLAYWRIGHT_SERVICE_URL=<playwright-service-url>
+PLAYWRIGHT_SERVICE_ACCESS_TOKEN=<playwright-service-access-token>
 ```
 
-### 4. Run the Notebook
 
-Open `parallel_webscraping.ipynb` and follow the step-by-step instructions.
+### Use the setup prompt to connect browser-harness to Playwright Service browsers
+
+In a coding agent of your choice, such as Codex or Claude Code, use the following prompt:
+
+```text
+Set up https://github.com/browser-use/browser-harness for me.
+
+Read `install.md` and follow the steps to install browser-harness and connect it to my Playwright Workspaces remote browsers.
 
-## What You'll Learn
+Get the SERVICE_URL needed for provisioning remote browsers by running the `get_cdp_browsers_endpoint()` function from `playwright_service_client.py`.
 
-- How to create and manage Playwright Workspaces programmatically
-- How to connect browser-harness to remote CDP endpoints (PWW)
-- The two-step connection flow (HTTP GET → resolve `sessionUrl` → set `BU_CDP_WS`)
-- How to run 10+ parallel browser sessions for scraping
-- How to use LiveView for real-time debugging of remote browser sessions
+Then update your skill to follow the two-step connection flow for Playwright remote browsers:
 
-## Files in This Sample
+1. HTTP GET the SERVICE_URL (allow 60-90s for the browser to spin up). Parse the JSON response to extract the `sessionUrl` (a wss:// WebSocket URL).
+2. Set BU_CDP_WS to the resolved sessionUrl in .env, then restart the daemon ONCE.
 
-| File | Description |
-|------|-------------|
-| `README.md` | This file |
-| `requirements.txt` | Python dependencies |
-| `.env.template` | Environment variable template |
-| `parallel_webscraping.ipynb` | Step-by-step notebook |
-| `helpers/live_view_watcher.py` | LiveView session watcher utility |
+IMPORTANT:
 
-## Important Notes
+- Do NOT kill or restart the daemon after the session is connected — the remote browser is destroyed when the WebSocket connection closes.
+- Do NOT set shouldRedirect=true; use shouldRedirect=false and manually resolve the sessionUrl.
+- The cold start takes 30-90s. Use a generous timeout on the initial HTTP GET.
+- After connecting, verify with: browser-harness <<'PY'\nprint(page_info())\nPY
 
-- **Do NOT restart the daemon** after connecting to PWW — the remote browser is destroyed when the WebSocket closes
-- **Cold start latency**: The initial browser provisioning takes 30-90 seconds
-- **Session lifetime**: The browser stays alive as long as the daemon holds the WebSocket connection
-- **Connect immediately**: After resolving the `sessionUrl`, connect the daemon right away — the session URL is ephemeral and expires quickly
-- **Token limits**: PWW workspaces have a maximum number of access tokens. Delete unused tokens before creating new ones
-- **CLI usage**: On Windows, browser-harness requires the `-c` flag: `browser-harness -c "print(page_info())"`
-- The scraping target (`books.toscrape.com`) is a public demo site designed for scraping practice
+Once connected, confirm with a screenshot that the remote browser is alive.
+```
+
+#### Start scraping with the power of browser-harness and Playwright Remote Browsers
+
+Once this is done, you can ask your agent to use browser-harness with Playwright remote browsers to perform web scraping. Use a prompt similar to this:
+
+```text
+Use browser-harness to scrape this website with 10 parallel remote browsers from Playwright Service.
+ 
+ Target:
+ - Website: <URL>
+ - Information to extract: <INFORMATION HERE>
+ - Output format: <TABLE / JSON / CSV>
+ - Save artifacts to: <FOLDER>
+ 
+ Remote browser requirements:
+ 1. Create 10 independent Playwright Service remote browser sessions.
+ 2. Each sub-agent must use a unique BU_NAME, for example:
+    scrape-worker-01 ... scrape-worker-10
+ 3. Each sub-agent must resolve its own sessionUrl using this two-step flow:
+    - HTTP GET SERVICE_URL with shouldRedirect=false and a 120s timeout.
+    - Parse JSON response.sessionUrl.
+    - Set BU_CDP_WS to that sessionUrl only in the same process environment as browser-harness.
+ 4. Do not write SERVICE_URL, access keys, or resolved WebSocket URLs to disk.
+ 5. Do not use .env for BU_CDP_WS.
+ 6. Close each remote session after scraping.
+ 
+ Get the SERVICE_URL by running the `get_cdp_browsers_endpoint()` function from `playwright_service_client.py`.
+ 
+ Work splitting:
+ - Decompose the scrape into 10 independent chunks.
+ - Dispatch all 10 workers in parallel.
+ - Each worker should scrape only its assigned chunk.
+ - After all workers complete, merge and deduplicate results.
+ - Validate the final output against the requested count/schema.
+ 
+ For each worker, require this final response:
+ - Assigned chunk
+ - Remote routing proof: BU_NAME used and confirmation that BU_CDP_WS pointed to Playwright Service
+ - Items scraped
+ - Screenshot path
+ - Whether the remote session was closed
+ - Any blockers
+ 
+ Final response:
+ - Return the merged scraped data.
+ - Mention any chunks that failed or were partial.
+ - Confirm all remote sessions were closed.
+```
 
 ## More Resources
 
diff --git a/samples/browser-harness-webscraping/helpers/__init__.py b/samples/browser-harness-webscraping/helpers/__init__.py
deleted file mode 100644
index e69de29..0000000
diff --git a/samples/browser-harness-webscraping/helpers/live_view_watcher.py b/samples/browser-harness-webscraping/helpers/live_view_watcher.py
deleted file mode 100644
index f4aec7f..0000000
--- a/samples/browser-harness-webscraping/helpers/live_view_watcher.py
+++ /dev/null
@@ -1,107 +0,0 @@
-"""
-LiveViewWatcher — polls Playwright Workspaces for new browser sessions
-and auto-opens the LiveView URL for real-time debugging.
-
-Usage:
-    from helpers.live_view_watcher import LiveViewWatcher
-
-    watcher = LiveViewWatcher(pw_client, workspace_id, credential, auth_token)
-    watcher.start()
-    # ... run your browser automation ...
-    watcher.stop()
-"""
-
-import threading
-import webbrowser
-from urllib.parse import quote
-
-
-class LiveViewWatcher:
-    """Polls Playwright Service for new browser sessions and
-    auto-opens the live viewer when one is detected."""
-
-    LIVE_VIEW_BASE_URL = "https://stcnttestdataknarayasea.z23.web.core.windows.net/live_viewer_pww.html"
-
-    def __init__(self, pw_client, workspace_id, credential, auth_token,
-                 auth_service_base=None, poll_interval=2):
-        """
-        Args:
-            pw_client: PlaywrightClient instance
-            workspace_id: PWW workspace ID
-            credential: Azure credential (for future token refresh)
-            auth_token: JWT access token for the live viewer
-            auth_service_base: Base URL of the auth service (derived from dataplane_uri)
-            poll_interval: Seconds between polling attempts
-        """
-        self.pw_client = pw_client
-        self.workspace_id = workspace_id
-        self.credential = credential
-        self.auth_token = auth_token
-        self.auth_service_base = auth_service_base or ""
-        self.poll_interval = poll_interval
-        self.stop_event = threading.Event()
-        self.session_id = None
-        self.thread = None
-        self.existing_sessions = set()
-
-    def _build_live_url(self, session_id):
-        """Construct the PWW live viewer URL with all required params."""
-        return (
-            f"{self.LIVE_VIEW_BASE_URL}"
-            f"?session={quote(session_id)}"
-            f"&workspace={quote(self.workspace_id)}"
-            f"&authBase={quote(self.auth_service_base)}"
-            f"&token={quote(self.auth_token)}"
-        )
-
-    def start(self):
-        """Snapshot existing sessions and start polling in background."""
-        try:
-            self.existing_sessions = set(
-                s.id for s in self.pw_client.browser_sessions.list(self.workspace_id)
-            )
-        except Exception:
-            self.existing_sessions = set()
-        self.stop_event.clear()
-        self.session_id = None
-        self.thread = threading.Thread(target=self._poll, daemon=True)
-        self.thread.start()
-
-    def stop(self):
-        """Signal stop, wait briefly for the session to appear."""
-        self.stop_event.set()
-        if self.thread:
-            self.thread.join(timeout=10)
-
-    def _poll(self):
-        while True:
-            try:
-                current = set(
-                    s.id for s in self.pw_client.browser_sessions.list(self.workspace_id)
-                )
-                new_sessions = current - self.existing_sessions
-                if new_sessions:
-                    self.session_id = new_sessions.pop()
-                    live_url = self._build_live_url(self.session_id)
-                    print(f"\n  [LiveView] Session detected: {self.session_id}")
-                    print(f"  [LiveView] Opening browser...")
-                    webbrowser.open(live_url)
-                    return
-            except Exception:
-                pass
-            if self.stop_event.wait(self.poll_interval):
-                # Final check before exiting
-                try:
-                    current = set(
-                        s.id for s in self.pw_client.browser_sessions.list(self.workspace_id)
-                    )
-                    new_sessions = current - self.existing_sessions
-                    if new_sessions:
-                        self.session_id = new_sessions.pop()
-                        live_url = self._build_live_url(self.session_id)
-                        print(f"\n  [LiveView] Session detected: {self.session_id}")
-                        print(f"  [LiveView] Opening browser...")
-                        webbrowser.open(live_url)
-                except Exception:
-                    pass
-                return
diff --git a/samples/browser-harness-webscraping/parallel_webscraping.ipynb b/samples/browser-harness-webscraping/parallel_webscraping.ipynb
deleted file mode 100644
index a13a63a..0000000
--- a/samples/browser-harness-webscraping/parallel_webscraping.ipynb
+++ /dev/null
@@ -1,455 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Parallel Web Scraping with Browser-Harness + Playwright Workspaces\n",
-    "\n",
-    "This notebook demonstrates how to:\n",
-    "1. Create a Playwright Workspace (PWW) on Azure\n",
-    "2. Connect browser-harness to the PWW remote CDP endpoint\n",
-    "3. Spawn 10+ parallel browser sessions for web scraping\n",
-    "4. Use LiveView for real-time debuggability\n",
-    "\n",
-    "**Target**: Scrape product data from [books.toscrape.com](http://books.toscrape.com) across multiple category pages simultaneously."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Section 1: Prerequisites & Setup\n",
-    "\n",
-    "Ensure you have:\n",
-    "- Azure CLI authenticated (`az login`)\n",
-    "- browser-harness installed (`git clone https://github.com/browser-use/browser-harness && cd browser-harness && uv tool install -e .`)\n",
-    "- Dependencies installed (`pip install -r requirements.txt`)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import os\n",
-    "import json\n",
-    "import uuid\n",
-    "import subprocess\n",
-    "from datetime import datetime, timedelta, timezone\n",
-    "from urllib.parse import urlparse\n",
-    "from concurrent.futures import ThreadPoolExecutor, as_completed\n",
-    "\n",
-    "import pandas as pd\n",
-    "from dotenv import load_dotenv\n",
-    "from azure.identity import DefaultAzureCredential\n",
-    "from azure.mgmt.playwright import PlaywrightMgmtClient\n",
-    "from azure.mgmt.playwright.models import PlaywrightWorkspace, PlaywrightWorkspaceProperties\n",
-    "from azure.developer.playwright import PlaywrightClient\n",
-    "\n",
-    "load_dotenv()\n",
-    "\n",
-    "# Configuration\n",
-    "SUBSCRIPTION_ID = os.environ[\"SUBSCRIPTION_ID\"]\n",
-    "RESOURCE_GROUP = os.environ[\"RESOURCE_GROUP\"]\n",
-    "LOCATION = os.environ.get(\"LOCATION\", \"eastus\")\n",
-    "PLAYWRIGHT_WORKSPACE_NAME = os.environ[\"PLAYWRIGHT_WORKSPACE_NAME\"]\n",
-    "\n",
-    "credential = DefaultAzureCredential()\n",
-    "print(\"✅ Configuration loaded\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Section 2: Create Playwright Workspace (PWW)\n",
-    "\n",
-    "This creates a managed Playwright Workspace on Azure that provides cloud-hosted browsers.\n",
-    "Skip this cell if your workspace already exists."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Create or get the Playwright Workspace\n",
-    "pw_mgmt = PlaywrightMgmtClient(credential, SUBSCRIPTION_ID)\n",
-    "\n",
-    "print(f\"Creating Playwright Workspace: {PLAYWRIGHT_WORKSPACE_NAME}...\")\n",
-    "workspace = pw_mgmt.playwright_workspaces.begin_create_or_update(\n",
-    "    resource_group_name=RESOURCE_GROUP,\n",
-    "    playwright_workspace_name=PLAYWRIGHT_WORKSPACE_NAME,\n",
-    "    resource=PlaywrightWorkspace(\n",
-    "        location=LOCATION,\n",
-    "        properties=PlaywrightWorkspaceProperties(local_auth=\"Enabled\"),\n",
-    "    ),\n",
-    ").result()\n",
-    "\n",
-    "workspace_id = workspace.properties.workspace_id\n",
-    "dataplane_uri = workspace.properties.dataplane_uri\n",
-    "base_url = f\"{urlparse(dataplane_uri).scheme}://{urlparse(dataplane_uri).netloc}\"\n",
-    "\n",
-    "print(f\"✅ Workspace ready: {workspace_id}\")\n",
-    "print(f\"   Dataplane: {base_url}\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Create an access token for the workspace\n",
-    "pw_client = PlaywrightClient(endpoint=base_url, credential=credential)\n",
-    "\n",
-    "access_token_id = str(uuid.uuid4())\n",
-    "token = pw_client.access_tokens.create_or_replace(\n",
-    "    workspace_id=workspace_id,\n",
-    "    access_token_id=access_token_id,\n",
-    "    resource={\n",
-    "        \"name\": f\"scraping-demo-{access_token_id[:8]}\",\n",
-    "        \"expiryAt\": (datetime.now(timezone.utc) + timedelta(days=30)).isoformat()\n",
-    "    }\n",
-    ")\n",
-    "\n",
-    "playwright_api_key = token.jwt_token\n",
-    "print(\"✅ Access token created (valid 30 days)\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Section 3: Connect Browser-Harness to PWW Remote Endpoint\n",
-    "\n",
-    "### The Connection Prompt\n",
-    "\n",
-    "Paste the following prompt into your coding agent (Claude Code / Codex) to connect browser-harness to the PWW remote browser:\n",
-    "\n",
-    "---\n",
-    "\n",
-    "```text\n",
-    "Set up browser-harness to connect to my Playwright Workspaces remote browser.\n",
-    "\n",
-    "Read install.md and SKILL.md first. Then connect to this Azure Playwright Service endpoint:\n",
-    "\n",
-    "  SERVICE_URL=<paste your PWW service URL here>\n",
-    "\n",
-    "Follow the two-step connection flow:\n",
-    "1. HTTP GET the SERVICE_URL (allow 60-90s for the browser to spin up). Parse the JSON response to extract the `sessionUrl` (a wss:// WebSocket URL).\n",
-    "2. Set BU_CDP_WS to the resolved sessionUrl in .env, then restart the daemon ONCE.\n",
-    "\n",
-    "IMPORTANT:\n",
-    "- Do NOT kill or restart the daemon after the session is connected — the remote browser is destroyed when the WebSocket closes.\n",
-    "- Do NOT set shouldRedirect=true; use shouldRedirect=false and manually resolve the sessionUrl.\n",
-    "- The cold start takes 30-90s. Use a generous timeout on the initial HTTP GET.\n",
-    "- After connecting, verify with: browser-harness <<'PY'\\nprint(page_info())\\nPY\n",
-    "\n",
-    "Once connected, confirm with a screenshot that the remote browser is alive.\n",
-    "```\n",
-    "\n",
-    "---\n",
-    "\n",
-    "### Programmatic Connection (alternative)\n",
-    "\n",
-    "If you prefer to connect programmatically instead of via the coding agent prompt:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import urllib.request\n",
-    "\n",
-    "# Build the PWW service URL\n",
-    "service_url = (\n",
-    "    f\"https://{urlparse(dataplane_uri).netloc}\"\n",
-    "    f\"/playwrightworkspaces/{workspace_id}/browsers\"\n",
-    "    f\"?playwrightVersion=cdp&shouldRedirect=false\"\n",
-    "    f\"&accessKey={playwright_api_key}\"\n",
-    ")\n",
-    "\n",
-    "print(\"Resolving remote browser session (30-90s cold start)...\")\n",
-    "\n",
-    "# Step 1: HTTP GET to provision the browser and get the CDP WebSocket URL\n",
-    "resp = urllib.request.urlopen(service_url, timeout=120)\n",
-    "data = json.loads(resp.read())\n",
-    "cdp_ws_url = data[\"sessionUrl\"]\n",
-    "\n",
-    "print(f\"✅ Remote browser provisioned\")\n",
-    "print(f\"   CDP WebSocket: {cdp_ws_url[:80]}...\")\n",
-    "\n",
-    "# Step 2: Write to .env so browser-harness picks it up\n",
-    "env_path = os.path.join(os.path.dirname(os.path.abspath('.')), 'browser-harness', '.env')\n",
-    "# Or set it directly for this session:\n",
-    "os.environ[\"BU_CDP_WS\"] = cdp_ws_url\n",
-    "\n",
-    "print(\"\\n⚠️  IMPORTANT: Do NOT restart the daemon after this point.\")\n",
-    "print(\"   The remote browser is destroyed when the WebSocket closes.\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Verify connection\n",
-    "# NOTE: browser-harness uses -c flag for script execution\n",
-    "result = subprocess.run(\n",
-    "    [\"browser-harness\", \"-c\", \"print(page_info())\"],\n",
-    "    capture_output=True, text=True, timeout=30,\n",
-    "    env={**os.environ, \"BU_CDP_WS\": cdp_ws_url}\n",
-    ")\n",
-    "print(result.stdout)\n",
-    "if result.returncode == 0:\n",
-    "    print(\"✅ Browser-harness connected to PWW remote browser\")\n",
-    "else:\n",
-    "    print(f\"❌ Connection failed: {result.stderr}\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Section 4: Parallel Web Scraping (10+ Sessions)\n",
-    "\n",
-    "We'll scrape product data from [books.toscrape.com](http://books.toscrape.com) — a public demo site designed for scraping practice.\n",
-    "\n",
-    "Each parallel session scrapes a different category page."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Define the pages to scrape (one per parallel browser session)\n",
-    "CATEGORY_URLS = [\n",
-    "    \"http://books.toscrape.com/catalogue/category/books/travel_2/index.html\",\n",
-    "    \"http://books.toscrape.com/catalogue/category/books/mystery_3/index.html\",\n",
-    "    \"http://books.toscrape.com/catalogue/category/books/historical-fiction_4/index.html\",\n",
-    "    \"http://books.toscrape.com/catalogue/category/books/sequential-art_5/index.html\",\n",
-    "    \"http://books.toscrape.com/catalogue/category/books/classics_6/index.html\",\n",
-    "    \"http://books.toscrape.com/catalogue/category/books/philosophy_7/index.html\",\n",
-    "    \"http://books.toscrape.com/catalogue/category/books/romance_8/index.html\",\n",
-    "    \"http://books.toscrape.com/catalogue/category/books/womens-fiction_9/index.html\",\n",
-    "    \"http://books.toscrape.com/catalogue/category/books/fiction_10/index.html\",\n",
-    "    \"http://books.toscrape.com/catalogue/category/books/childrens_11/index.html\",\n",
-    "    \"http://books.toscrape.com/catalogue/category/books/religion_12/index.html\",\n",
-    "    \"http://books.toscrape.com/catalogue/category/books/nonfiction_13/index.html\",\n",
-    "]\n",
-    "\n",
-    "print(f\"Will scrape {len(CATEGORY_URLS)} category pages in parallel\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Scraping script template for each parallel browser session\n",
-    "# browser-harness uses -c flag: browser-harness -c \"<script>\"\n",
-    "SCRAPE_SCRIPT = (\n",
-    "    'import json\\n'\n",
-    "    'new_tab(\"{url}\")\\n'\n",
-    "    'wait_for_load()\\n'\n",
-    "    'books = js(\"Array.from(document.querySelectorAll(\\'article.product_pod\\')).map(el => ({{'\n",
-    "    'title: el.querySelector(\\'h3 a\\').getAttribute(\\'title\\'),'\n",
-    "    'price: el.querySelector(\\'.price_color\\').textContent,'\n",
-    "    'availability: el.querySelector(\\'.availability\\').textContent.trim(),'\n",
-    "    'rating: el.querySelector(\\'p.star-rating\\').className.replace(\\'star-rating \\', \\'\\')'\n",
-    "    '}}))\"\\n)\\n'\n",
-    "    'category = js(\"document.querySelector(\\'.page-header h1\\').textContent\")\\n'\n",
-    "    'print(json.dumps({{\"category\": category, \"books\": books}}))'\n",
-    ")\n",
-    "\n",
-    "\n",
-    "def scrape_category(url, session_name):\n",
-    "    \"\"\"Scrape a single category page using a dedicated browser session.\n",
-    "    \n",
-    "    Each session gets its own remote browser via a distinct BU_NAME.\n",
-    "    The daemon must connect immediately after provisioning — the session\n",
-    "    URL is ephemeral and expires quickly.\n",
-    "    \"\"\"\n",
-    "    script = SCRAPE_SCRIPT.format(url=url)\n",
-    "    result = subprocess.run(\n",
-    "        [\"browser-harness\", \"-c\", script],\n",
-    "        capture_output=True, text=True, timeout=60,\n",
-    "        env={**os.environ, \"BU_NAME\": session_name, \"BU_CDP_WS\": cdp_ws_url}\n",
-    "    )\n",
-    "    if result.returncode == 0:\n",
-    "        # Parse the JSON output from the last print statement\n",
-    "        for line in result.stdout.strip().split('\\n'):\n",
-    "            try:\n",
-    "                return json.loads(line)\n",
-    "            except json.JSONDecodeError:\n",
-    "                continue\n",
-    "    return {\"error\": result.stderr, \"url\": url}\n",
-    "\n",
-    "\n",
-    "print(\"Ready to scrape. Running parallel sessions...\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Execute parallel scraping across all categories\n",
-    "all_results = []\n",
-    "\n",
-    "with ThreadPoolExecutor(max_workers=12) as executor:\n",
-    "    futures = {\n",
-    "        executor.submit(scrape_category, url, f\"scraper-{i}\"): url\n",
-    "        for i, url in enumerate(CATEGORY_URLS)\n",
-    "    }\n",
-    "    \n",
-    "    for future in as_completed(futures):\n",
-    "        url = futures[future]\n",
-    "        try:\n",
-    "            result = future.result()\n",
-    "            if \"error\" not in result:\n",
-    "                all_results.append(result)\n",
-    "                print(f\"✅ {result['category']}: {len(result['books'])} books\")\n",
-    "            else:\n",
-    "                print(f\"❌ {url}: {result['error'][:100]}\")\n",
-    "        except Exception as e:\n",
-    "            print(f\"❌ {url}: {e}\")\n",
-    "\n",
-    "print(f\"\\n📊 Scraped {sum(len(r['books']) for r in all_results)} books from {len(all_results)} categories\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Aggregate into a DataFrame\n",
-    "rows = []\n",
-    "for result in all_results:\n",
-    "    for book in result[\"books\"]:\n",
-    "        rows.append({\n",
-    "            \"category\": result[\"category\"],\n",
-    "            \"title\": book[\"title\"],\n",
-    "            \"price\": book[\"price\"],\n",
-    "            \"availability\": book[\"availability\"],\n",
-    "            \"rating\": book[\"rating\"],\n",
-    "        })\n",
-    "\n",
-    "df = pd.DataFrame(rows)\n",
-    "print(f\"Total books scraped: {len(df)}\")\n",
-    "df.head(15)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Section 5: LiveView for Debuggability\n",
-    "\n",
-    "LiveView lets you watch any browser session in real-time. The `LiveViewWatcher` polls for new sessions and auto-opens the viewer in your browser."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from helpers.live_view_watcher import LiveViewWatcher\n",
-    "\n",
-    "# Initialize the watcher\n",
-    "live_watcher = LiveViewWatcher(\n",
-    "    pw_client=pw_client,\n",
-    "    workspace_id=workspace_id,\n",
-    "    credential=credential,\n",
-    "    auth_token=playwright_api_key,\n",
-    "    auth_service_base=base_url,\n",
-    ")\n",
-    "\n",
-    "# Start watching — when the next browser session is created, LiveView opens automatically\n",
-    "live_watcher.start()\n",
-    "print(\"👀 LiveView watcher active — will open viewer when a new session starts\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Run a single scraping task to trigger LiveView\n",
-    "demo_result = scrape_category(CATEGORY_URLS[0], \"live-demo\")\n",
-    "print(f\"Scraped: {demo_result.get('category', 'unknown')}\")\n",
-    "\n",
-    "live_watcher.stop()\n",
-    "if live_watcher.session_id:\n",
-    "    print(f\"\\n✅ LiveView opened for session: {live_watcher.session_id}\")\n",
-    "else:\n",
-    "    print(\"\\nℹ️  No new session detected (session may have reused an existing one)\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Section 6: Cleanup\n",
-    "\n",
-    "Stop browser sessions and optionally delete the workspace."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# List active sessions\n",
-    "sessions = list(pw_client.browser_sessions.list(workspace_id))\n",
-    "print(f\"Active sessions: {len(sessions)}\")\n",
-    "for s in sessions:\n",
-    "    print(f\"  - {s.id} | {s.browser_type} | {s.status}\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Optional: Delete the workspace (uncomment to run)\n",
-    "# pw_mgmt.playwright_workspaces.begin_delete(\n",
-    "#     resource_group_name=RESOURCE_GROUP,\n",
-    "#     playwright_workspace_name=PLAYWRIGHT_WORKSPACE_NAME,\n",
-    "# ).result()\n",
-    "# print(\"✅ Workspace deleted\")\n",
-    "\n",
-    "print(\"Done! Your scraped data is in the 'df' DataFrame above.\")"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "name": "python",
-   "version": "3.10.0"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 4
-}
diff --git a/samples/browser-harness-webscraping/playwright_service_client.py b/samples/browser-harness-webscraping/playwright_service_client.py
new file mode 100644
index 0000000..d51c118
--- /dev/null
+++ b/samples/browser-harness-webscraping/playwright_service_client.py
@@ -0,0 +1,95 @@
+"""
+Microsoft Playwright Service - Python Client
+
+Build the service URL used to request remote CDP browsers.
+
+----------------------------------------
+📌 Prerequisites
+----------------------------------------
+pip install python-dotenv
+
+----------------------------------------
+📌 Environment Variables
+----------------------------------------
+PLAYWRIGHT_SERVICE_URL=wss://<region>.api.playwright.microsoft.com/playwrightworkspaces/<workspaceId>/browsers
+PLAYWRIGHT_SERVICE_ACCESS_TOKEN=your_access_token
+
+----------------------------------------
+📌 How to Use
+----------------------------------------
+    from playwright_service_client import get_cdp_browsers_endpoint
+    
+    endpoint = get_cdp_browsers_endpoint()
+"""
+
+import re
+import os
+from dotenv import load_dotenv
+
+load_dotenv()
+
+
+class PlaywrightServiceError(Exception):
+    """Exception for Playwright Service errors."""
+    pass
+
+
+# URL pattern: wss://<region>.api.playwright.microsoft.com/playwrightworkspaces/<workspaceId>/browsers
+_URL_PATTERN = re.compile(
+    r'wss://(\w+)\.api\.playwright\.microsoft\.com/playwrightworkspaces/([^/]+)/browsers'
+)
+
+
+def _parse_url(url: str) -> tuple[str, str]:
+    """Extract region and workspace ID from service URL."""
+    match = _URL_PATTERN.match(url)
+    if not match:
+        raise PlaywrightServiceError(
+            f"Invalid PLAYWRIGHT_SERVICE_URL format: {url}\n"
+            f"Expected: wss://<region>.api.playwright.microsoft.com/playwrightworkspaces/<workspaceId>/browsers"
+        )
+    return match.group(1), match.group(2)
+
+
+def get_cdp_browsers_endpoint(
+    service_url: str | None = None,
+    access_token: str | None = None
+) -> str:
+    """
+    Get the SERVICE_URL that an agent can use to request browsers it can connect to via CDP.
+
+    Args:
+        service_url: Service URL (defaults to PLAYWRIGHT_SERVICE_URL env var)
+        access_token: Access token (defaults to PLAYWRIGHT_SERVICE_ACCESS_TOKEN env var)
+
+    Returns:
+        HTTPS API URL; an HTTP GET on it returns JSON whose `sessionUrl` is the CDP WebSocket URL
+
+    Example:
+        service_url = get_cdp_browsers_endpoint()
+        # HTTP GET service_url, then read `sessionUrl` from the JSON response
+    """
+    # Get credentials from env vars if not provided
+    service_url = service_url or os.getenv("PLAYWRIGHT_SERVICE_URL")
+    access_token = access_token or os.getenv("PLAYWRIGHT_SERVICE_ACCESS_TOKEN")
+    
+    if not service_url:
+        raise PlaywrightServiceError(
+            "PLAYWRIGHT_SERVICE_URL environment variable is not set.\n"
+            "Expected: wss://<region>.api.playwright.microsoft.com/playwrightworkspaces/<workspaceId>/browsers"
+        )
+    if not access_token:
+        raise PlaywrightServiceError(
+            "PLAYWRIGHT_SERVICE_ACCESS_TOKEN environment variable is not set."
+        )
+    
+    # Parse URL to get region and workspace ID
+    region, workspace_id = _parse_url(service_url)
+    
+    # Build API URL
+    api_url = (
+        f"https://{region}.api.playwright.microsoft.com"
+        f"/playwrightworkspaces/{workspace_id}/browsers"
+        f"?os={os_name}&browser=chromium&playwrightVersion=cdp&shouldRedirect=false")
+
+    return api_url
\ No newline at end of file
diff --git a/samples/browser-harness-webscraping/requirements.txt b/samples/browser-harness-webscraping/requirements.txt
index f0257ae..566cccb 100644
--- a/samples/browser-harness-webscraping/requirements.txt
+++ b/samples/browser-harness-webscraping/requirements.txt
@@ -1,6 +1 @@
-azure-identity
-azure-mgmt-playwright
-azure-developer-playwright
-jupyter
-pandas
 python-dotenv

From b65df07c6b0fd4ba1edb0446d1e47d42d7c056d6 Mon Sep 17 00:00:00 2001
From: Mitesh Shah <58204159+mitsha-microsoft@users.noreply.github.com>
Date: Fri, 8 May 2026 12:29:14 +0000
Subject: [PATCH 3/4] Fix get_cdp_browsers_endpoint docstring

---
 .../playwright_service_client.py                            | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)
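
Note on the corrected "Returns" text: the function returns an HTTPS URL for
requesting a browser, not a WebSocket endpoint. The README prompt in this
series describes resolving the WebSocket URL by parsing `sessionUrl` from the
JSON response. Below is a minimal sketch of that parsing step only. The helper
name `extract_session_url` and the sample response body are hypothetical, and
the HTTP request itself (including how the access token is sent) is left out
because this patch does not specify it.

```python
import json


def extract_session_url(response_body: str) -> str:
    """Return the sessionUrl field from a browsers-endpoint JSON response.

    Hypothetical helper: assumes the body is a JSON object whose
    "sessionUrl" value is a ws:// or wss:// URL.
    """
    payload = json.loads(response_body)
    session_url = payload.get("sessionUrl") if isinstance(payload, dict) else None
    if not isinstance(session_url, str) or not session_url.startswith(("ws://", "wss://")):
        raise ValueError("response does not contain a WebSocket sessionUrl")
    return session_url


# Made-up response body, for illustration only.
sample_body = '{"sessionUrl": "wss://example.invalid/session/123"}'
print(extract_session_url(sample_body))
```

Per the README prompt, the resolved value would then be set as BU_CDP_WS in
the same process environment as browser-harness and never written to disk.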

diff --git a/samples/browser-harness-webscraping/playwright_service_client.py b/samples/browser-harness-webscraping/playwright_service_client.py
index d51c118..a4df568 100644
--- a/samples/browser-harness-webscraping/playwright_service_client.py
+++ b/samples/browser-harness-webscraping/playwright_service_client.py
@@ -57,17 +57,15 @@ def get_cdp_browsers_endpoint(
 ) -> str:
     """
     Get the SERVICE_URL that an agent can use to get browsers that it can connect to via CDP
-    """
     Args:
         service_url: Service URL (defaults to PLAYWRIGHT_SERVICE_URL env var)
         access_token: Access token (defaults to PLAYWRIGHT_SERVICE_ACCESS_TOKEN env var)
         
     Returns:
-        WebSocket URL for CDP connection
+        URL for getting CDP browsers
         
     Example:
-        cdp_url = await get_cdp_endpoint()
-        browser = await playwright.chromium.connect_over_cdp(cdp_url)
+        SERVICE_URL = get_cdp_browsers_endpoint()
     """
     # Get credentials from env vars if not provided
     service_url = service_url or os.getenv("PLAYWRIGHT_SERVICE_URL")

From 4377d6cc500f27eda8a01eb1aa6416cf2c37b34c Mon Sep 17 00:00:00 2001
From: Nandini Muralidharan <nandinim+microsoft@microsoft.com>
Date: Mon, 11 May 2026 15:52:38 +0530
Subject: [PATCH 4/4] Update README prompt and fix os param in
 playwright_service_client

- Simplify the scraping prompt in README.md with a practical example
- Fix playwright_service_client.py: hardcode os=linux (remote browsers
  run Linux regardless of client OS) and remove undefined os_name variable

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 samples/browser-harness-webscraping/README.md | 45 +++----------------
 .../playwright_service_client.py              |  4 +-
 2 files changed, 7 insertions(+), 42 deletions(-)
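
Note on the os fix: the sketch below shows the URL construction after this
change, with `os=linux` fixed rather than taken from a client-side variable.
It is an illustration under assumptions, not the file's exact code: the
function name `build_browsers_api_url` and the workspace ID `abc123` are
hypothetical, and `urlencode` is swapped in for the f-string used in the
patch. The query parameters themselves are the ones in the hunk below.

```python
import re
from urllib.parse import urlencode

# Same URL shape the sample's _URL_PATTERN expects.
_URL_PATTERN = re.compile(
    r"wss://(\w+)\.api\.playwright\.microsoft\.com/playwrightworkspaces/([^/]+)/browsers"
)


def build_browsers_api_url(service_url: str) -> str:
    """Turn the wss:// service URL into the HTTPS browsers endpoint."""
    match = _URL_PATTERN.fullmatch(service_url)
    if not match:
        raise ValueError(f"Invalid PLAYWRIGHT_SERVICE_URL format: {service_url}")
    region, workspace_id = match.groups()
    # os is hardcoded: remote browsers run Linux regardless of client OS.
    query = urlencode(
        {
            "os": "linux",
            "browser": "chromium",
            "playwrightVersion": "cdp",
            "shouldRedirect": "false",
        }
    )
    return (
        f"https://{region}.api.playwright.microsoft.com"
        f"/playwrightworkspaces/{workspace_id}/browsers?{query}"
    )


# Hypothetical workspace ID, for illustration only.
print(build_browsers_api_url(
    "wss://eastus.api.playwright.microsoft.com/playwrightworkspaces/abc123/browsers"
))
```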

diff --git a/samples/browser-harness-webscraping/README.md b/samples/browser-harness-webscraping/README.md
index 8818c7d..983dd07 100644
--- a/samples/browser-harness-webscraping/README.md
+++ b/samples/browser-harness-webscraping/README.md
@@ -71,47 +71,12 @@ Once connected, confirm with a screenshot that the remote browser is alive.
 Once this done, you can ask your agent to use browser-harness with playwright remote browsers to perform web scraping. Use a prompt similar to something like this:
 
 ```text
-Use browser-harness to scrape this website with 10 parallel remote browsers from Playwright Service.
- 
- Target:
- - Website: <URL>
- - Information to extract: <INFORMATION HERE>
- - Output format: <TABLE / JSON / CSV>
- - Save artifacts to: <FOLDER>
- 
- Remote browser requirements:
- 1. Create 10 independent Playwright Service remote browser sessions.
- 2. Each sub-agent must use a unique BU_NAME, for example:
-    scrape-worker-01 ... scrape-worker-10
- 3. Each sub-agent must resolve its own sessionUrl using this two-step flow:
-    - HTTP GET SERVICE_URL with shouldRedirect=false and a 120s timeout.
-    - Parse JSON response.sessionUrl.
-    - Set BU_CDP_WS to that sessionUrl only in the same process environment as browser-harness.
- 4. Do not write SERVICE_URL, access keys, or resolved WebSocket URLs to disk.
- 5. Do not use .env for BU_CDP_WS.
- 7. Close each remote session after scraping.
- 
- Get the SERVICE_URL by running `get_cdp_browsers_endpoint()` method from `playwright_service_client.py`
- 
- Work splitting:
- - Decompose the scrape into 10 independent chunks.
- - Dispatch all 10 workers in parallel.
- - Each worker should scrape only its assigned chunk.
- - After all workers complete, merge and deduplicate results.
- - Validate the final output against the requested count/schema.
- 
- For each worker, require this final response:
- - Assigned chunk
- - Remote routing proof: BU_NAME used and confirmation that BU_CDP_WS pointed to Playwright Service
- - Items scraped
- - Screenshot path
- - Whether the remote session was closed
- - Any blockers
+
+Go to the ecommerce websites Website 1 and Website 2 (geography: India) and search for gifts under 500 for 10-year-old kids that are useful and reusable, not single-use.
+Delivery to Bengaluru should be within 3 days. 5 pieces of the item should be available.
+Create independent Playwright Service remote browser sessions per
+website and use one sub-agent per website to browse in parallel using browser-harness. Close each remote session after scraping.
  
- Final response:
- - Return the merged scraped data.
- - Mention any chunks that failed or were partial.
- - Confirm all remote sessions were closed.
 ```
 
 ## More Resources
diff --git a/samples/browser-harness-webscraping/playwright_service_client.py b/samples/browser-harness-webscraping/playwright_service_client.py
index a4df568..676df35 100644
--- a/samples/browser-harness-webscraping/playwright_service_client.py
+++ b/samples/browser-harness-webscraping/playwright_service_client.py
@@ -88,6 +88,6 @@ def get_cdp_browsers_endpoint(
     api_url = (
         f"https://{region}.api.playwright.microsoft.com"
         f"/playwrightworkspaces/{workspace_id}/browsers"
-        f"?os={os_name}&browser=chromium&playwrightVersion=cdp&shouldRedirect=false")
+        f"?os=linux&browser=chromium&playwrightVersion=cdp&shouldRedirect=false")
 
-    return api_url
\ No newline at end of file
+    return api_url