features added on agent and some fixes on web live tool #8

Open
GopiB9119 wants to merge 3 commits into main from feat/v6.0.0

GopiB9119 (Owner) commented Apr 11, 2026

Copilot AI review requested due to automatic review settings April 11, 2026 02:44
GopiB9119 enabled auto-merge (squash) April 11, 2026 02:45

Copilot AI left a comment

Pull request overview

This PR appears to be a v6.0 release pass that expands the agent’s capabilities (new Python tool managers + background task system), hardens security/secret hygiene, and formalizes governance/docs/testing around the live-web (Playwright Edge MCP) workflow.

Changes:

  • Added multiple new Python “manager” tool modules (git, test generation, snapshots/rollback, refactoring, vision, docs, background tasks) and wired them into agent/agent/tools.py + tool registry/schemas.
  • Strengthened redaction/sanitization across web/oauth/mcp/tool-call paths and added broader JS+Python test coverage plus CI workflows (including nightly integration).
  • Added scripts and docs for validation/preflight, reorganized quickstart/docs, and removed legacy unsafe JS entry points.
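
To make the redaction bullet concrete, here is a minimal sketch of recursive key-based redaction over a nested tool result. The `SENSITIVE_KEY` pattern, the `redact` name, and the `[REDACTED]` placeholder are assumptions for illustration only; the PR's actual redaction rules are not shown in this conversation.

```javascript
// Hypothetical key pattern; the PR's real redaction rules may differ.
const SENSITIVE_KEY = /(token|secret|password|api[_-]?key|authorization)/i;

// Returns a deep copy of `value` with sensitive-looking keys masked.
function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (value && typeof value === 'object') {
    const out = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SENSITIVE_KEY.test(key) ? '[REDACTED]' : redact(inner);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}

const sample = {
  access_token: 'abc123',
  nested: { items: [{ password: 'hunter2', name: 'demo' }] },
};
console.log(JSON.stringify(redact(sample)));
```

Key-based masking like this only catches secrets stored under recognizable names; values embedded in free text or URLs would need a separate pattern pass.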

Reviewed changes

Copilot reviewed 83 out of 86 changed files in this pull request and generated 10 comments.

File Description
USAGE_QUICKSTART.md Removed root-level quickstart (moved under docs/).
docs/USAGE_QUICKSTART.md New consolidated quickstart for Node/VS Code/Python runtimes.
README.md Expanded product/workflow description, updated security/testing/run instructions, and pointed to docs.
CHANGELOG.md Added changelog with v6.0.0 notes and prior releases.
SECURITY.md Updated supported versions table.
package.json Bumped to 6.0.0, added daemon/test/validation/preflight scripts, and updated published files metadata.
index.js Added npm entry point exporting core JS APIs (EdgeSession/parser/tracing).
nl-command-parser.js Added NL support for back/forward/refresh/press/hover/select/focus/clear/double/right click.
cli-agent.js Switched to runAction and added sanitized, summary-first output; documented new NL commands.
tests/smoke-chain.test.js Added end-to-end-ish JS smoke tests using mocked Playwright page/locators.
scripts/mcp-preflight.js Added MCP/EdgeSession preflight script (owner/safety/redaction + minimal browser checks).
scripts/validate-web-task.js Added CLI validator for web-task JSON files (manual checks + warnings/errors).
scripts/validate-governance.js Added validator for .github governance structure and frontmatter expectations.
scripts/resolve-python.js Added node wrapper to locate .venv python or fall back to system python for npm scripts.
Dockerfile Added container build for Node+Python agent runtime and Playwright Edge install attempt.
.dockerignore Added docker ignore rules to keep image small and avoid leaking artifacts/secrets.
.npmignore Added npm publish ignore rules to ship only necessary JS artifacts.
.env.example Added documented environment template for Node/Python/MCP/tracing configuration.
.gitignore Simplified PlaywrightMCP ignores, added eval outputs and snapshot dir ignore patterns.
agent-logic.js Removed legacy direct Playwright helper entry point.
playwright-edge-agent.js Removed legacy direct Playwright helper entry point.
docs/PRODUCT_ARCHITECTURE.md Added unified architecture doc (two runtimes, one skill/workflow contract).
docs/TRUST_RELIABILITY_EXECUTION_PLAN.md Added phased reliability/trust execution plan for browser automation.
docs/RELEASE_CHECKLIST.md Updated release checklist to include full test suite, governance validation, and preflight.
docs/MEMORY_USAGE.md Added memory tools usage guide with example payloads and result shapes.
docs/security_best_practices_report.md Added security audit-style report describing hardening work and residual risks.
docs/generate_api_docs.py Added script to generate API reference from tool schemas.
agent/agent/config.py Added extracted config module for environment + model client creation + runtime constants.
agent/agent/SYSTEM_PROMPT.md Expanded tool list and added grounding/continuity rules.
agent/agent/tools.py Wired in new managers/tools, improved calculator security, and added sanitized call_tool behavior.
agent/agent/tooling/registry.py Registered new tool names for auto-registration.
agent/agent/tooling/schemas.py Updated schema docs (token hygiene messaging) + added include_artifact flags.
agent/agent/web_tools.py Updated web_fetch to use centralized redaction in JSON responses and tests.
agent/agent/oauth_tools.py Updated OAuth tools to use redacted JSON responses and block raw token output.
agent/agent/mcp_tools.py Added artifact capture utilities + improved sanitization in tool responses; env-based trace/session toggles.
agent/agent/vision_tools.py Added vision utilities (encode/compare/describe page) with redacted JSON responses.
agent/agent/snapshot_tools.py Added snapshot/rollback manager for file changes.
agent/agent/refactor_tools.py Added refactor utilities (rename/dead code/duplicates/metrics).
agent/agent/doc_tools.py Added documentation generation/coverage/changelog helper tools.
agent/agent/background_tools.py Added foreground tools for submitting/monitoring background tasks.
agent/agent/background/__init__.py Added background daemon architecture overview module doc.
agent/agent/background/task_queue.py Added file-based task queue with locking, dependencies, retention, and progress streaming.
agent/agent/background/worker.py Added background worker executing the same agent toolchain loop for queued tasks.
agent/agent/ARCHITECTURE_PLAN.md Updated architecture plan status and next steps.
agent/__init__.py Marked agent as a Python package.
agent/agent/__init__.py Marked agent/agent as a Python package.
agent/agent/tests/__init__.py Added test package marker.
agent/agent/tests/integration/__init__.py Added integration test package marker.
agent/agent/tests/test_tools.py Added tests for sanitized call_tool behavior + calculator AST safety.
agent/agent/tests/test_oauth_tools.py Added tests ensuring secrets/tokens are not exposed in OAuth flows.
agent/agent/tests/test_fs_tools.py Added tests ensuring FS outputs redact secrets.
agent/agent/tests/test_config.py Added tests for config exports/ranges and client/model factory behavior.
agent/agent/tests/test_command_tools.py Added tests for restricted/permissive command execution policy and redaction.
agent/agent/tests/test_workflow_tools.py Added tests for artifacts/summaries and nested-tool-result sanitization + recursion blocking.
agent/agent/tests/test_web_tools.py Updated web_fetch test to validate redaction in body/text/url.
agent/agent/tests/test_mcp_tools.py Added test for captured debug artifacts sanitization and output listing.
agent/agent/tests/integration/test_mcp_live_integration.py Enhanced live MCP integration tests with env overrides and persisted failure artifacts.
agent/agent/tests/integration/README.md Documented failure artifact behavior and opt-in flags.
.github/README.md Rewrote governance guide into “lanes” and clarified where different rules belong.
.github/copilot-instructions.md Linked lane-specific rules and reduced duplication.
.github/instructions/playwright-edge.instructions.md Expanded runtime governance structure and validation expectations.
.github/instructions/live-web-governance.instructions.md Added governance-authoring structure requirements for live-web lane files.
.github/prompts/playwright-live-web-task-brief.prompt.md Added structured intake prompt for live-web tasks.
.github/skills/web-works/SKILL.md Expanded skill scope/intake/verification/failure policy and improved wording.
.github/skills/web-works/PROMPTS.md Expanded prompt pack guidance and added structured brief section.
.github/skills/web-works/web-task.schema.json Extended allowed step actions (fill/hover/search/snapshot/nav shortcuts/etc.).
.github/skills/web-works/examples/extract-github-trending.json Added example structured extract task.
.github/skills/web-works/examples/automate-wikipedia-search.json Added example structured automation task.
.github/workflows/python-agent-tests.yml Updated workflow to also run JS check + JS unit tests.
.github/workflows/nightly-integration.yml Added nightly job running full suite + preflight + opt-in live integration + artifact upload.
.github/workflows/live-web-governance-check.yml Added governance check workflow that runs syntax + governance validation on relevant paths.
.github/CODEOWNERS Simplified ownership requirements to a single owner for sensitive paths.
.devcontainer/devcontainer.json Added devcontainer config for Node+Python + relevant extensions and forwarded ports.

Comment thread scripts/mcp-preflight.js
Comment thread scripts/validate-web-task.js Outdated
Comment thread scripts/validate-web-task.js Outdated
Comment thread scripts/validate-web-task.js Outdated
Comment thread package.json
Comment thread scripts/resolve-python.js
Comment thread agent/agent/doc_tools.py
Comment thread docs/USAGE_QUICKSTART.md
Comment thread README.md
Comment thread Dockerfile Outdated
@GopiB9119

Owner Author

@copilot apply changes based on the comments in this thread

auto-merge was automatically disabled April 18, 2026 15:02

Head branch was pushed to by a user without write access

GopiB9119 enabled auto-merge (squash) April 18, 2026 15:03
auto-merge was automatically disabled April 18, 2026 15:06

Head branch was pushed to by a user without write access

Copilot AI commented Apr 18, 2026

@copilot apply changes based on the comments in this thread

Implemented the review-thread fixes across the referenced files in commits 86c3b56 and 118c1c9 (preflight offline behavior, task validator fixes, safer Python process execution, doc/type/docs consistency updates, and Docker Edge install strictness controls).

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • example.com
    • Triggering command: /usr/bin/python3 python3 -m unittest discover -s agent/agent/tests -p test_*.py -v (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

devin-ai-integration (Bot) left a comment

Devin Review found 2 potential issues.

View 8 additional findings in Devin Review.

Comment thread edge-session.js
Comment thread edge-session.js
GopiB9119 enabled auto-merge (squash) April 18, 2026 15:17
GopiB9119 disabled auto-merge April 18, 2026 17:38
GopiB9119 enabled auto-merge April 18, 2026 17:38
GopiB9119 requested a review from Copilot April 18, 2026 17:38

GopiB9119 left a comment

Owner Author

.

Copilot AI left a comment

Pull request overview

Copilot reviewed 83 out of 86 changed files in this pull request and generated 4 comments.

Comment on lines +13 to +118
const STEP_ACTIONS = ['navigate', 'click', 'type', 'fill', 'press', 'select', 'hover', 'search', 'wait', 'check', 'extract', 'scroll', 'download', 'upload', 'screenshot', 'snapshot', 'back', 'forward', 'refresh', 'delete', 'focus', 'clear', 'doubleClick', 'rightClick', 'evaluate'];
const TARGET_OPTIONAL_ACTIONS = new Set(['wait', 'screenshot', 'snapshot', 'back', 'forward', 'refresh']);

function normalizeAllowedDomain(domain) {
  if (typeof domain !== 'string') return '';
  const trimmed = domain.trim().toLowerCase();
  if (!trimmed) return '';
  const withoutWildcard = trimmed.startsWith('*.') ? trimmed.slice(2) : trimmed;
  return withoutWildcard.split(':')[0];
}

function validate(taskPath) {
  const errors = [];
  const warnings = [];

  if (!fs.existsSync(taskPath)) {
    errors.push(`File not found: ${taskPath}`);
    return { errors, warnings };
  }

  let data;
  try {
    data = JSON.parse(fs.readFileSync(taskPath, 'utf8'));
  } catch (e) {
    errors.push(`Invalid JSON: ${e.message}`);
    return { errors, warnings };
  }

  // Required fields
  for (const field of REQUIRED_FIELDS) {
    if (!(field in data)) {
      errors.push(`Missing required field: ${field}`);
    }
  }

  // Version
  if (data.version && !VALID_VERSIONS.includes(data.version)) {
    errors.push(`Invalid version "${data.version}". Must be one of: ${VALID_VERSIONS.join(', ')}`);
  }

  // Mode
  if (data.mode && !VALID_MODES.includes(data.mode)) {
    errors.push(`Invalid mode "${data.mode}". Must be one of: ${VALID_MODES.join(', ')}`);
  }

  // Execution profile
  if (data.execution_profile && !VALID_PROFILES.includes(data.execution_profile)) {
    errors.push(`Invalid execution_profile "${data.execution_profile}". Must be one of: ${VALID_PROFILES.join(', ')}`);
  }

  // task_id
  if (data.task_id && typeof data.task_id === 'string' && !data.task_id.trim()) {
    errors.push('task_id must not be empty');
  }

  // start_url
  if (data.start_url) {
    try {
      new URL(data.start_url);
    } catch (_) {
      errors.push(`start_url is not a valid URL: "${data.start_url}"`);
    }
  }

  // objective
  if (data.objective && typeof data.objective === 'string' && data.objective.length < 5) {
    warnings.push('objective is very short — consider being more specific');
  }

  // success_criteria
  if (data.success_criteria) {
    if (!Array.isArray(data.success_criteria)) {
      errors.push('success_criteria must be an array');
    } else if (data.success_criteria.length === 0) {
      errors.push('success_criteria must have at least one entry');
    }
  }

  // steps
  if (data.steps) {
    if (!Array.isArray(data.steps)) {
      errors.push('steps must be an array');
    } else {
      const stepIds = new Set();
      for (let i = 0; i < data.steps.length; i++) {
        const step = data.steps[i];
        const prefix = `steps[${i}]`;

        if (typeof step !== 'object' || step === null) {
          errors.push(`${prefix}: must be an object`);
          continue;
        }

        if (!step.id) {
          errors.push(`${prefix}: missing required field "id"`);
        } else if (stepIds.has(step.id)) {
          errors.push(`${prefix}: duplicate step id "${step.id}"`);
        } else {
          stepIds.add(step.id);
        }

        if (!step.action) {
          errors.push(`${prefix}: missing required field "action"`);
        } else if (!STEP_ACTIONS.includes(step.action)) {
          warnings.push(`${prefix}: action "${step.action}" is not a standard action (${STEP_ACTIONS.join(', ')})`);
        }

Copilot AI Apr 18, 2026

STEP_ACTIONS includes evaluate, but the web-task schema (.github/skills/web-works/web-task.schema.json) does not allow that action. Also, when a step action is not in STEP_ACTIONS, this validator only emits a warning; that can result in "valid with warnings" even though the task would fail schema validation. Align STEP_ACTIONS with the schema and treat unknown step.action values as an error (or update the schema if custom actions are intended).
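
A minimal sketch of the stricter check this comment asks for, assuming the allowed actions live in one set that mirrors the schema. `ALLOWED_ACTIONS` and `checkStepAction` are hypothetical names, and the list is simply `STEP_ACTIONS` from the snippet with `evaluate` removed; the authoritative list would be whatever `web-task.schema.json` actually permits.

```javascript
// Hypothetical list: STEP_ACTIONS from the reviewed snippet minus 'evaluate'.
// In practice this should mirror the enum in web-task.schema.json.
const ALLOWED_ACTIONS = new Set([
  'navigate', 'click', 'type', 'fill', 'press', 'select', 'hover', 'search',
  'wait', 'check', 'extract', 'scroll', 'download', 'upload', 'screenshot',
  'snapshot', 'back', 'forward', 'refresh', 'delete', 'focus', 'clear',
  'doubleClick', 'rightClick',
]);

// Records an error (not a warning) for a missing or unknown action.
function checkStepAction(step, prefix, errors) {
  if (!step.action) {
    errors.push(`${prefix}: missing required field "action"`);
  } else if (!ALLOWED_ACTIONS.has(step.action)) {
    errors.push(`${prefix}: unknown action "${step.action}"`);
  }
}

const errors = [];
checkStepAction({ id: 's1', action: 'click' }, 'steps[0]', errors);
checkStepAction({ id: 's2', action: 'evaluate' }, 'steps[1]', errors);
console.log(errors);
```

Loading the set from the schema file at startup, rather than hard-coding it, would keep the validator and schema from drifting apart again.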

Copilot uses AI. Check for mistakes.
Comment thread scripts/mcp-preflight.js
Comment on lines +63 to +73
// Navigation test (use about:blank → simple page)
await session.page.goto('data:text/html,<h1>MCP Preflight</h1>');
const pageTitle = await session.page.title();
const pageUrl = session.page.url();
const navOk = pageUrl.startsWith('data:') || pageUrl !== 'about:blank';
report('Navigation', navOk, `url=${pageUrl.slice(0, 60)}`);

// Grounded summary check — use runAction which produces the summary
const gotoResult = await session.runAction('goto', { url: 'data:text/html,<p>grounded-summary-check</p>' });
const hasSummary = typeof gotoResult.summary === 'string' && gotoResult.summary.length > 0;
report('Grounded summary', hasSummary, gotoResult.summary ? gotoResult.summary.slice(0, 80) : 'missing');

Copilot AI Apr 18, 2026

The preflight uses session.runAction('goto', { url: 'data:text/html,...' }), but EdgeSession.handleGoto only recognizes http/https and will prefix other schemes with https://, making this navigation fail. Since the check only asserts summary exists (not gotoResult.status === 'ok'), the preflight can report success while the core goto verification is failing. Use an http(s) URL that EdgeSession supports (or update EdgeSession to allow data:/file:/about:), and assert gotoResult.status/gotoResult.verification.ok in the preflight.
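
Both halves of this suggestion can be sketched as follows, under stated assumptions. `normalizeGotoUrl`, `gotoPassed`, and the pass-through scheme list are hypothetical and do not reproduce `EdgeSession.handleGoto`; the `status`, `verification.ok`, and `summary` fields are the ones named in the comment.

```javascript
// Schemes passed through untouched (assumed list, not taken from EdgeSession).
const PASSTHROUGH_SCHEMES = ['http:', 'https:', 'data:', 'file:', 'about:'];

// Hypothetical normalizer: prefix https:// only when no known scheme is present.
function normalizeGotoUrl(raw) {
  const url = String(raw).trim();
  const lower = url.toLowerCase();
  if (PASSTHROUGH_SCHEMES.some((scheme) => lower.startsWith(scheme))) return url;
  return `https://${url}`;
}

// Preflight-side check: require an ok status and verification, not just a summary.
function gotoPassed(result) {
  return Boolean(
    result &&
    result.status === 'ok' &&
    result.verification && result.verification.ok === true &&
    typeof result.summary === 'string' && result.summary.length > 0
  );
}

console.log(normalizeGotoUrl('example.com'));
console.log(gotoPassed({ status: 'error', summary: 'navigation failed' }));
```

With a check like `gotoPassed`, a failed navigation that still produces a summary string would no longer be reported as a passing preflight step.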

Comment on lines +93 to +113
before_size = before.stat().st_size
after_size = after.stat().st_size
size_diff = after_size - before_size
size_changed = abs(size_diff) > 1024

# Byte-level comparison
before_bytes = before.read_bytes()
after_bytes = after.read_bytes()
identical = before_bytes == after_bytes

return self._json_response({
    "status": "ok",
    "identical": identical,
    "before": {"path": before_path, "size": before_size},
    "after": {"path": after_path, "size": after_size},
    "size_diff_bytes": size_diff,
    "visually_changed": not identical,
    "significant_size_change": size_changed,
    "before_base64": self._encode_image_to_base64(before) if not identical else None,
    "after_base64": self._encode_image_to_base64(after) if not identical else None,
}, max_chars=8 * 1024 * 1024)

Copilot AI Apr 18, 2026

vision_compare reads both files fully into memory and, when different, base64-encodes and returns both images in the JSON response. Without any size cap/option to omit base64 payloads, this can create very large responses (and memory spikes) for typical screenshots, making the tool unreliable in practice. Add a max-bytes guard (similar to vision_encode) and/or an include_base64 flag defaulting to false with a small preview/hash-based diff by default.
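
The guard described here can be illustrated with a short sketch. The reviewed tool is Python; this version is written in JavaScript only to show the shape of the idea, and `compareImages`, `includeBase64`, `maxBytes`, and the 5 MiB default are all assumptions rather than anything from the PR.

```javascript
const crypto = require('node:crypto');

const DEFAULT_MAX_BYTES = 5 * 1024 * 1024; // assumed cap, not taken from the PR

function sha256(buffer) {
  return crypto.createHash('sha256').update(buffer).digest('hex');
}

// Compares two image buffers by hash; base64 payloads are opt-in and size-capped.
function compareImages(before, after, options = {}) {
  const { includeBase64 = false, maxBytes = DEFAULT_MAX_BYTES } = options;
  if (before.length > maxBytes || after.length > maxBytes) {
    return { status: 'error', error: `image exceeds ${maxBytes} bytes` };
  }
  const beforeHash = sha256(before);
  const afterHash = sha256(after);
  const identical = beforeHash === afterHash;
  const result = {
    status: 'ok',
    identical,
    before: { size: before.length, sha256: beforeHash },
    after: { size: after.length, sha256: afterHash },
    size_diff_bytes: after.length - before.length,
  };
  // Only attach the heavy payloads when the caller explicitly asks for them.
  if (includeBase64 && !identical) {
    result.before_base64 = before.toString('base64');
    result.after_base64 = after.toString('base64');
  }
  return result;
}

console.log(compareImages(Buffer.from('abc'), Buffer.from('abd')).identical);
```

By default the response carries only sizes and hashes, so its size stays constant regardless of how large the screenshots are.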

Comment thread tests/smoke-chain.test.js
Comment on lines +19 to +21
const locator = {
  count: mock.fn(async () => options.locatorCount || 1),
  first: mock.fn(() => locator),

Copilot AI Apr 18, 2026

In the mock locator, count uses options.locatorCount || 1, which makes it impossible to simulate a real 0 match count (0 becomes 1). Using nullish coalescing (??) would allow tests to cover the zero-match path accurately.
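
The difference is easy to see in isolation. The two helper names below are hypothetical; only the `options.locatorCount` expression comes from the reviewed test file.

```javascript
// `||` replaces every falsy value, so an explicit 0 is lost.
function countWithOr(options) {
  return options.locatorCount || 1;
}

// `??` falls back only when the value is null or undefined.
function countWithNullish(options) {
  return options.locatorCount ?? 1;
}

console.log(countWithOr({ locatorCount: 0 }));      // 1
console.log(countWithNullish({ locatorCount: 0 })); // 0
console.log(countWithNullish({}));                  // 1
```

With `??`, a test can pass `locatorCount: 0` to exercise the zero-match path while omitted options still default to one match.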


3 participants