features added on agent and some fixes on web live tool by GopiB9119 · Pull Request #8 · GopiB9119/agent-live-web

GopiB9119 · 2026-04-11T02:44:55Z

Copilot

Pull request overview

This PR appears to be a v6.0 release pass that expands the agent’s capabilities (new Python tool managers + background task system), hardens security/secret hygiene, and formalizes governance/docs/testing around the live-web (Playwright Edge MCP) workflow.

Changes:

Added multiple new Python “manager” tool modules (git, test generation, snapshots/rollback, refactoring, vision, docs, background tasks) and wired them into agent/agent/tools.py + tool registry/schemas.
Strengthened redaction/sanitization across web/oauth/mcp/tool-call paths and added broader JS+Python test coverage plus CI workflows (including nightly integration).
Added scripts and docs for validation/preflight, reorganized quickstart/docs, and removed legacy unsafe JS entry points.

Reviewed changes

Copilot reviewed 83 out of 86 changed files in this pull request and generated 10 comments.

Show a summary per file

File	Description
USAGE_QUICKSTART.md	Removed root-level quickstart (moved under `docs/`).
docs/USAGE_QUICKSTART.md	New consolidated quickstart for Node/VS Code/Python runtimes.
README.md	Expanded product/workflow description, updated security/testing/run instructions, and pointed to docs.
CHANGELOG.md	Added changelog with v6.0.0 notes and prior releases.
SECURITY.md	Updated supported versions table.
package.json	Bumped to 6.0.0, added daemon/test/validation/preflight scripts, and updated published `files` metadata.
index.js	Added npm entry point exporting core JS APIs (EdgeSession/parser/tracing).
nl-command-parser.js	Added NL support for back/forward/refresh/press/hover/select/focus/clear/double/right click.
cli-agent.js	Switched to `runAction` and added sanitized, summary-first output; documented new NL commands.
tests/smoke-chain.test.js	Added end-to-end-ish JS smoke tests using mocked Playwright page/locators.
scripts/mcp-preflight.js	Added MCP/EdgeSession preflight script (owner/safety/redaction + minimal browser checks).
scripts/validate-web-task.js	Added CLI validator for web-task JSON files (manual checks + warnings/errors).
scripts/validate-governance.js	Added validator for `.github` governance structure and frontmatter expectations.
scripts/resolve-python.js	Added node wrapper to locate `.venv` python or fall back to system python for npm scripts.
Dockerfile	Added container build for Node+Python agent runtime and Playwright Edge install attempt.
.dockerignore	Added docker ignore rules to keep image small and avoid leaking artifacts/secrets.
.npmignore	Added npm publish ignore rules to ship only necessary JS artifacts.
.env.example	Added documented environment template for Node/Python/MCP/tracing configuration.
.gitignore	Simplified PlaywrightMCP ignores, added eval outputs and snapshot dir ignore patterns.
agent-logic.js	Removed legacy direct Playwright helper entry point.
playwright-edge-agent.js	Removed legacy direct Playwright helper entry point.
docs/PRODUCT_ARCHITECTURE.md	Added unified architecture doc (two runtimes, one skill/workflow contract).
docs/TRUST_RELIABILITY_EXECUTION_PLAN.md	Added phased reliability/trust execution plan for browser automation.
docs/RELEASE_CHECKLIST.md	Updated release checklist to include full test suite, governance validation, and preflight.
docs/MEMORY_USAGE.md	Added memory tools usage guide with example payloads and result shapes.
docs/security_best_practices_report.md	Added security audit-style report describing hardening work and residual risks.
docs/generate_api_docs.py	Added script to generate API reference from tool schemas.
agent/agent/config.py	Added extracted config module for environment + model client creation + runtime constants.
agent/agent/SYSTEM_PROMPT.md	Expanded tool list and added grounding/continuity rules.
agent/agent/tools.py	Wired in new managers/tools, improved calculator security, and added sanitized `call_tool` behavior.
agent/agent/tooling/registry.py	Registered new tool names for auto-registration.
agent/agent/tooling/schemas.py	Updated schema docs (token hygiene messaging) + added `include_artifact` flags.
agent/agent/web_tools.py	Updated web_fetch to use centralized redaction in JSON responses and tests.
agent/agent/oauth_tools.py	Updated OAuth tools to use redacted JSON responses and block raw token output.
agent/agent/mcp_tools.py	Added artifact capture utilities + improved sanitization in tool responses; env-based trace/session toggles.
agent/agent/vision_tools.py	Added vision utilities (encode/compare/describe page) with redacted JSON responses.
agent/agent/snapshot_tools.py	Added snapshot/rollback manager for file changes.
agent/agent/refactor_tools.py	Added refactor utilities (rename/dead code/duplicates/metrics).
agent/agent/doc_tools.py	Added documentation generation/coverage/changelog helper tools.
agent/agent/background_tools.py	Added foreground tools for submitting/monitoring background tasks.
agent/agent/background/__init__.py	Added background daemon architecture overview module doc.
agent/agent/background/task_queue.py	Added file-based task queue with locking, dependencies, retention, and progress streaming.
agent/agent/background/worker.py	Added background worker executing the same agent toolchain loop for queued tasks.
agent/agent/ARCHITECTURE_PLAN.md	Updated architecture plan status and next steps.
agent/__init__.py	Marked `agent` as a Python package.
agent/agent/__init__.py	Marked `agent/agent` as a Python package.
agent/agent/tests/__init__.py	Added test package marker.
agent/agent/tests/integration/__init__.py	Added integration test package marker.
agent/agent/tests/test_tools.py	Added tests for sanitized `call_tool` behavior + calculator AST safety.
agent/agent/tests/test_oauth_tools.py	Added tests ensuring secrets/tokens are not exposed in OAuth flows.
agent/agent/tests/test_fs_tools.py	Added tests ensuring FS outputs redact secrets.
agent/agent/tests/test_config.py	Added tests for config exports/ranges and client/model factory behavior.
agent/agent/tests/test_command_tools.py	Added tests for restricted/permissive command execution policy and redaction.
agent/agent/tests/test_workflow_tools.py	Added tests for artifacts/summaries and nested-tool-result sanitization + recursion blocking.
agent/agent/tests/test_web_tools.py	Updated web_fetch test to validate redaction in body/text/url.
agent/agent/tests/test_mcp_tools.py	Added test for captured debug artifacts sanitization and output listing.
agent/agent/tests/integration/test_mcp_live_integration.py	Enhanced live MCP integration tests with env overrides and persisted failure artifacts.
agent/agent/tests/integration/README.md	Documented failure artifact behavior and opt-in flags.
.github/README.md	Rewrote governance guide into “lanes” and clarified where different rules belong.
.github/copilot-instructions.md	Linked lane-specific rules and reduced duplication.
.github/instructions/playwright-edge.instructions.md	Expanded runtime governance structure and validation expectations.
.github/instructions/live-web-governance.instructions.md	Added governance-authoring structure requirements for live-web lane files.
.github/prompts/playwright-live-web-task-brief.prompt.md	Added structured intake prompt for live-web tasks.
.github/skills/web-works/SKILL.md	Expanded skill scope/intake/verification/failure policy and improved wording.
.github/skills/web-works/PROMPTS.md	Expanded prompt pack guidance and added structured brief section.
.github/skills/web-works/web-task.schema.json	Extended allowed step actions (fill/hover/search/snapshot/nav shortcuts/etc.).
.github/skills/web-works/examples/extract-github-trending.json	Added example structured extract task.
.github/skills/web-works/examples/automate-wikipedia-search.json	Added example structured automation task.
.github/workflows/python-agent-tests.yml	Updated workflow to also run JS check + JS unit tests.
.github/workflows/nightly-integration.yml	Added nightly job running full suite + preflight + opt-in live integration + artifact upload.
.github/workflows/live-web-governance-check.yml	Added governance check workflow that runs syntax + governance validation on relevant paths.
.github/CODEOWNERS	Simplified ownership requirements to a single owner for sensitive paths.
.devcontainer/devcontainer.json	Added devcontainer config for Node+Python + relevant extensions and forwarded ports.

GopiB9119 · 2026-04-18T14:59:48Z

@copilot apply changes based on the comments in this thread

Agent-Logs-Url: https://github.com/GopiB9119/agent-live-web/sessions/e75c7ab0-2a3a-4952-98f3-a7585c5c90c9 Co-authored-by: GopiB9119 <93776715+GopiB9119@users.noreply.github.com>

Copilot · 2026-04-18T15:06:40Z

@copilot apply changes based on the comments in this thread

Implemented the review-thread fixes across the referenced files in commits 86c3b56 and 118c1c9 (preflight offline behavior, task validator fixes, safer Python process execution, doc/type/docs consistency updates, and Docker Edge install strictness controls).

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

example.com
- Triggering command: /usr/bin/python3 python3 -m unittest discover -s agent/agent/tests -p test_*.py -v (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

devin-ai-integration

Devin Review found 2 potential issues.

View 8 additional findings in Devin Review.

GopiB9119

.

Copilot

Pull request overview

Copilot reviewed 83 out of 86 changed files in this pull request and generated 4 comments.

Copilot · 2026-04-18T17:43:58Z

+const STEP_ACTIONS = ['navigate', 'click', 'type', 'fill', 'press', 'select', 'hover', 'search', 'wait', 'check', 'extract', 'scroll', 'download', 'upload', 'screenshot', 'snapshot', 'back', 'forward', 'refresh', 'delete', 'focus', 'clear', 'doubleClick', 'rightClick', 'evaluate'];
+const TARGET_OPTIONAL_ACTIONS = new Set(['wait', 'screenshot', 'snapshot', 'back', 'forward', 'refresh']);
+
+function normalizeAllowedDomain(domain) {
+  if (typeof domain !== 'string') return '';
+  const trimmed = domain.trim().toLowerCase();
+  if (!trimmed) return '';
+  const withoutWildcard = trimmed.startsWith('*.') ? trimmed.slice(2) : trimmed;
+  return withoutWildcard.split(':')[0];
+}
+
+function validate(taskPath) {
+  const errors = [];
+  const warnings = [];
+
+  if (!fs.existsSync(taskPath)) {
+    errors.push(`File not found: ${taskPath}`);
+    return { errors, warnings };
+  }
+
+  let data;
+  try {
+    data = JSON.parse(fs.readFileSync(taskPath, 'utf8'));
+  } catch (e) {
+    errors.push(`Invalid JSON: ${e.message}`);
+    return { errors, warnings };
+  }
+
+  // Required fields
+  for (const field of REQUIRED_FIELDS) {
+    if (!(field in data)) {
+      errors.push(`Missing required field: ${field}`);
+    }
+  }
+
+  // Version
+  if (data.version && !VALID_VERSIONS.includes(data.version)) {
+    errors.push(`Invalid version "${data.version}". Must be one of: ${VALID_VERSIONS.join(', ')}`);
+  }
+
+  // Mode
+  if (data.mode && !VALID_MODES.includes(data.mode)) {
+    errors.push(`Invalid mode "${data.mode}". Must be one of: ${VALID_MODES.join(', ')}`);
+  }
+
+  // Execution profile
+  if (data.execution_profile && !VALID_PROFILES.includes(data.execution_profile)) {
+    errors.push(`Invalid execution_profile "${data.execution_profile}". Must be one of: ${VALID_PROFILES.join(', ')}`);
+  }
+
+  // task_id
+  if (data.task_id && typeof data.task_id === 'string' && !data.task_id.trim()) {
+    errors.push('task_id must not be empty');
+  }
+
+  // start_url
+  if (data.start_url) {
+    try {
+      new URL(data.start_url);
+    } catch (_) {
+      errors.push(`start_url is not a valid URL: "${data.start_url}"`);
+    }
+  }
+
+  // objective
+  if (data.objective && typeof data.objective === 'string' && data.objective.length < 5) {
+    warnings.push('objective is very short — consider being more specific');
+  }
+
+  // success_criteria
+  if (data.success_criteria) {
+    if (!Array.isArray(data.success_criteria)) {
+      errors.push('success_criteria must be an array');
+    } else if (data.success_criteria.length === 0) {
+      errors.push('success_criteria must have at least one entry');
+    }
+  }
+
+  // steps
+  if (data.steps) {
+    if (!Array.isArray(data.steps)) {
+      errors.push('steps must be an array');
+    } else {
+      const stepIds = new Set();
+      for (let i = 0; i < data.steps.length; i++) {
+        const step = data.steps[i];
+        const prefix = `steps[${i}]`;
+
+        if (typeof step !== 'object' || step === null) {
+          errors.push(`${prefix}: must be an object`);
+          continue;
+        }
+
+        if (!step.id) {
+          errors.push(`${prefix}: missing required field "id"`);
+        } else if (stepIds.has(step.id)) {
+          errors.push(`${prefix}: duplicate step id "${step.id}"`);
+        } else {
+          stepIds.add(step.id);
+        }
+
+        if (!step.action) {
+          errors.push(`${prefix}: missing required field "action"`);
+        } else if (!STEP_ACTIONS.includes(step.action)) {
+          warnings.push(`${prefix}: action "${step.action}" is not a standard action (${STEP_ACTIONS.join(', ')})`);
+        }


STEP_ACTIONS includes evaluate, but the web-task schema (.github/skills/web-works/web-task.schema.json) does not allow that action. Also, when a step action is not in STEP_ACTIONS, this validator only emits a warning; that can result in "valid with warnings" even though the task would fail schema validation. Align STEP_ACTIONS with the schema and treat unknown step.action values as an error (or update the schema if custom actions are intended).
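One way to apply this suggestion is sketched below. It is a minimal illustration, not the repository's validator: `checkStepAction` and `SCHEMA_STEP_ACTIONS` are hypothetical names, and the allowed list is assumed to be the diff's `STEP_ACTIONS` with `evaluate` removed. The enum in web-task.schema.json should remain the source of truth.

```javascript
// Assumption: the schema's allowed actions equal the diff's STEP_ACTIONS minus
// 'evaluate'. In practice, read the enum from web-task.schema.json instead.
const SCHEMA_STEP_ACTIONS = new Set([
  'navigate', 'click', 'type', 'fill', 'press', 'select', 'hover', 'search',
  'wait', 'check', 'extract', 'scroll', 'download', 'upload', 'screenshot',
  'snapshot', 'back', 'forward', 'refresh', 'delete', 'focus', 'clear',
  'doubleClick', 'rightClick',
]);

// Hypothetical helper: returns the error strings for one step's action field.
function checkStepAction(step, prefix, allowed = SCHEMA_STEP_ACTIONS) {
  if (!step.action) {
    return [`${prefix}: missing required field "action"`];
  }
  if (!allowed.has(step.action)) {
    // Reported as an error rather than a warning, so the validator cannot say
    // "valid with warnings" for a task that schema validation would reject.
    return [`${prefix}: action "${step.action}" is not allowed by the schema`];
  }
  return [];
}

console.log(checkStepAction({ id: 's1', action: 'click' }, 'steps[0]'));
console.log(checkStepAction({ id: 's2', action: 'evaluate' }, 'steps[1]'));
```

Returning a list of errors keeps the helper easy to fold into the existing `errors.push(...)` flow via `errors.push(...checkStepAction(step, prefix))`.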

Copilot · 2026-04-18T17:43:58Z

+      // Navigation test (use about:blank → simple page)
+      await session.page.goto('data:text/html,<h1>MCP Preflight</h1>');
+      const pageTitle = await session.page.title();
+      const pageUrl = session.page.url();
+      const navOk = pageUrl.startsWith('data:') || pageUrl !== 'about:blank';
+      report('Navigation', navOk, `url=${pageUrl.slice(0, 60)}`);
+
+      // Grounded summary check — use runAction which produces the summary
+      const gotoResult = await session.runAction('goto', { url: 'data:text/html,<p>grounded-summary-check</p>' });
+      const hasSummary = typeof gotoResult.summary === 'string' && gotoResult.summary.length > 0;
+      report('Grounded summary', hasSummary, gotoResult.summary ? gotoResult.summary.slice(0, 80) : 'missing');


The preflight uses session.runAction('goto', { url: 'data:text/html,...' }), but EdgeSession.handleGoto only recognizes http/https and will prefix other schemes with https://, making this navigation fail. Since the check only asserts summary exists (not gotoResult.status === 'ok'), the preflight can report success while the core goto verification is failing. Use an http(s) URL that EdgeSession supports (or update EdgeSession to allow data:/file:/about:), and assert gotoResult.status/gotoResult.verification.ok in the preflight.
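Both halves of this suggestion can be sketched as follows. This is an illustration under stated assumptions: `normalizeGotoUrl`, `PASSTHROUGH_SCHEMES`, and `gotoSucceeded` are hypothetical names, EdgeSession's actual URL handling is not shown in this thread, and the result shape uses only the field names mentioned in the comment (`status`, `verification.ok`, `summary`).

```javascript
// Hypothetical scheme allowlist; EdgeSession's real handling may differ.
const PASSTHROUGH_SCHEMES = ['http:', 'https:', 'data:', 'about:', 'file:'];

// Hypothetical helper: keep URLs that already carry an allowed scheme,
// otherwise assume a bare host and prefix https://.
function normalizeGotoUrl(raw) {
  const url = String(raw).trim();
  const lower = url.toLowerCase();
  if (PASSTHROUGH_SCHEMES.some((scheme) => lower.startsWith(scheme))) {
    return url;
  }
  return `https://${url}`;
}

// Preflight-style check: require an ok status and verification in addition to
// a non-empty summary, so a failed navigation cannot pass the check.
function gotoSucceeded(result) {
  return Boolean(
    result &&
    result.status === 'ok' &&
    result.verification && result.verification.ok === true &&
    typeof result.summary === 'string' && result.summary.length > 0
  );
}

console.log(normalizeGotoUrl('data:text/html,<p>grounded-summary-check</p>'));
console.log(normalizeGotoUrl('example.com'));
console.log(gotoSucceeded({ status: 'error', summary: 'navigation failed' }));
```

In the preflight, `report('Grounded summary', gotoSucceeded(gotoResult), ...)` would then fail whenever the navigation itself failed, even if a summary string was produced.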

Copilot · 2026-04-18T17:43:59Z

+        before_size = before.stat().st_size
+        after_size = after.stat().st_size
+        size_diff = after_size - before_size
+        size_changed = abs(size_diff) > 1024
+
+        # Byte-level comparison
+        before_bytes = before.read_bytes()
+        after_bytes = after.read_bytes()
+        identical = before_bytes == after_bytes
+
+        return self._json_response({
+            "status": "ok",
+            "identical": identical,
+            "before": {"path": before_path, "size": before_size},
+            "after": {"path": after_path, "size": after_size},
+            "size_diff_bytes": size_diff,
+            "visually_changed": not identical,
+            "significant_size_change": size_changed,
+            "before_base64": self._encode_image_to_base64(before) if not identical else None,
+            "after_base64": self._encode_image_to_base64(after) if not identical else None,
+        }, max_chars=8 * 1024 * 1024)


vision_compare reads both files fully into memory and, when different, base64-encodes and returns both images in the JSON response. Without any size cap/option to omit base64 payloads, this can create very large responses (and memory spikes) for typical screenshots, making the tool unreliable in practice. Add a max-bytes guard (similar to vision_encode) and/or an include_base64 flag defaulting to false with a small preview/hash-based diff by default.
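The guard-plus-opt-in shape suggested here is sketched below in JavaScript to match the other examples in this thread; the tool under review is Python, so this shows the approach rather than a patch. `compareImages`, `includeBase64`, `maxBytes`, and the 5 MiB default are all hypothetical.

```javascript
const fs = require('node:fs');
const crypto = require('node:crypto');

const DEFAULT_MAX_BYTES = 5 * 1024 * 1024; // hypothetical cap, not from the PR

function sha256(buffer) {
  return crypto.createHash('sha256').update(buffer).digest('hex');
}

// Hypothetical comparison helper: size guard first, hashes by default,
// base64 payloads only when explicitly requested.
function compareImages(beforePath, afterPath, options = {}) {
  const { includeBase64 = false, maxBytes = DEFAULT_MAX_BYTES } = options;
  const beforeSize = fs.statSync(beforePath).size;
  const afterSize = fs.statSync(afterPath).size;

  // Reject oversized inputs before reading anything into memory.
  if (beforeSize > maxBytes || afterSize > maxBytes) {
    return { status: 'error', error: `image exceeds ${maxBytes} bytes` };
  }

  const beforeBytes = fs.readFileSync(beforePath);
  const afterBytes = fs.readFileSync(afterPath);
  const identical = beforeBytes.equals(afterBytes);

  const result = {
    status: 'ok',
    identical,
    before: { path: beforePath, size: beforeSize, sha256: sha256(beforeBytes) },
    after: { path: afterPath, size: afterSize, sha256: sha256(afterBytes) },
    size_diff_bytes: afterSize - beforeSize,
  };
  if (includeBase64 && !identical) {
    result.before_base64 = beforeBytes.toString('base64');
    result.after_base64 = afterBytes.toString('base64');
  }
  return result;
}
```

With the flag defaulting to false, the response stays a few hundred bytes regardless of screenshot size, and callers that need the pixels can ask for them explicitly.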

Copilot · 2026-04-18T17:43:59Z

+  const locator = {
+    count: mock.fn(async () => options.locatorCount || 1),
+    first: mock.fn(() => locator),


In the mock locator, count uses options.locatorCount || 1, which makes it impossible to simulate a real 0 match count (0 becomes 1). Using nullish coalescing (??) would allow tests to cover the zero-match path accurately.
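The difference between the two operators is easy to see in isolation. The sketch below uses hypothetical stand-ins (`resolveLocatorCount`, `makeLocator`) with plain functions in place of `mock.fn`, so it runs without a test runner.

```javascript
// Hypothetical helper: ?? falls back only on null/undefined,
// so an explicit locatorCount of 0 is preserved.
function resolveLocatorCount(options = {}) {
  return options.locatorCount ?? 1;
}

// Hypothetical stand-in for the mock locator in the diff.
function makeLocator(options = {}) {
  const locator = {
    count: async () => resolveLocatorCount(options),
    first: () => locator,
  };
  return locator;
}

console.log(0 || 1); // 1: zero is falsy, so || replaces it
console.log(0 ?? 1); // 0: zero is not nullish, so ?? keeps it
console.log(resolveLocatorCount({ locatorCount: 0 })); // 0
console.log(resolveLocatorCount({}));                  // 1
```

A test can then pass `{ locatorCount: 0 }` to exercise the zero-match path while omitted options still default to one match.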

features added on agent and some fixes on web live tool

12b094d

Copilot AI review requested due to automatic review settings April 11, 2026 02:44

Copilot started reviewing on behalf of GopiB9119 April 11, 2026 02:45

GopiB9119 enabled auto-merge (squash) April 11, 2026 02:45

Copilot AI reviewed Apr 11, 2026


Copilot started work on behalf of GopiB9119 April 18, 2026 14:59

fix: address PR review thread feedback across scripts docs and docker

86c3b56

Agent-Logs-Url: https://github.com/GopiB9119/agent-live-web/sessions/e75c7ab0-2a3a-4952-98f3-a7585c5c90c9 Co-authored-by: GopiB9119 <93776715+GopiB9119@users.noreply.github.com>

auto-merge was automatically disabled April 18, 2026 15:02
Head branch was pushed to by a user without write access

GopiB9119 enabled auto-merge (squash) April 18, 2026 15:03

fix: refine validator domain matching and python signal handling

118c1c9

Agent-Logs-Url: https://github.com/GopiB9119/agent-live-web/sessions/e75c7ab0-2a3a-4952-98f3-a7585c5c90c9 Co-authored-by: GopiB9119 <93776715+GopiB9119@users.noreply.github.com>

auto-merge was automatically disabled April 18, 2026 15:06
Head branch was pushed to by a user without write access

Copilot finished work on behalf of GopiB9119 April 18, 2026 15:07

devin-ai-integration Bot reviewed Apr 18, 2026


Comment thread edge-session.js

Comment thread edge-session.js

GopiB9119 enabled auto-merge (squash) April 18, 2026 15:17

GopiB9119 disabled auto-merge April 18, 2026 17:38

GopiB9119 enabled auto-merge April 18, 2026 17:38

GopiB9119 requested a review from Copilot April 18, 2026 17:38

Copilot started reviewing on behalf of GopiB9119 April 18, 2026 17:39

GopiB9119 commented Apr 18, 2026


Copilot AI reviewed Apr 18, 2026



3 participants
