feat: agent-powered E2E test subset selection for PRs by malhotra5 · Pull Request #1267 · OpenHands/agent-canvas

malhotra5 · 2026-06-08T21:20:08Z

A human has tested these changes.

Why

Running the full mock-LLM E2E suite (~14 specs) on every PR push takes significant CI time. Many PRs only touch a narrow area of the codebase and don't need the full suite. There was no way to parameterize the E2E workflow or intelligently select a subset of tests.

Summary

Modified mock-llm-e2e.yml: Added workflow_dispatch inputs (test_specs, test_grep, pr_number) so the workflow can be triggered with a filtered subset. Fully backward-compatible — pull_request events still run the full suite. All PR-number references now use EFFECTIVE_PR_NUMBER to support both trigger types.
New scripts/select-e2e-tests.py: Python script that uses the OpenHands SDK LLM class to intelligently map changed files to relevant test specs. Includes a deterministic heuristic fallback when the LLM is unavailable. Outputs JSON with the selected specs, reason, and mode (llm/heuristic/full).
New agent-e2e-selector.yml: Workflow triggered by the smart-e2e label or manual dispatch. Analyzes PR changed files, runs the selector script, dispatches Mock-LLM E2E Tests with the chosen subset, and posts a summary comment to the PR.
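
As a rough illustration of how the new inputs could translate into a test run, here is a minimal Python sketch. The function name `resolve_run` and the exact Playwright invocation are assumptions made for this example; the real logic lives in the workflow YAML and may differ. The spec directory is the one named in a later commit on this PR.

```python
from typing import Optional

SPEC_DIR = "tests/e2e/mock-llm"  # directory named in a later commit on this PR


def resolve_run(event_name: str, inputs: dict, pr_event_number: Optional[int]) -> dict:
    """Return the effective PR number and test command for one run (hypothetical helper)."""
    if event_name == "pull_request":
        # pull_request events ignore the inputs and run the full suite.
        pr_number = pr_event_number
        specs, grep = [], ""
    else:
        # workflow_dispatch: every input is an optional string.
        raw_pr = str(inputs.get("pr_number", "")).strip()
        pr_number = int(raw_pr) if raw_pr else None
        specs = [s.strip() for s in inputs.get("test_specs", "").split(",") if s.strip()]
        grep = inputs.get("test_grep", "").strip()

    # Assumed invocation: spec files as positional args, --grep for a title filter.
    command = ["npx", "playwright", "test"]
    command += [f"{SPEC_DIR}/{name}" for name in specs]
    if grep:
        command += ["--grep", grep]
    return {"effective_pr_number": pr_number, "command": command}
```

With no specs and no grep pattern the command has no filter arguments, which is how this sketch keeps the full-suite behavior for pull_request events.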

How to Test

Manual dispatch (direct E2E subset):

gh workflow run "Mock-LLM E2E Tests" \
  --ref <branch> \
  -f test_specs="mock-llm-conversation.spec.ts,mock-llm-automation.spec.ts" \
  -f pr_number=<pr_number>

Agent selector (manual dispatch):

gh workflow run "Agent E2E Selector" -f pr_number=<pr_number>

Agent selector (label-triggered):
Add the smart-e2e label to any same-repo PR.

Test the Python script locally (heuristic mode, no LLM needed):

echo -e "src/components/features/onboarding/onboarding-modal.tsx" | python3 scripts/select-e2e-tests.py
# → selects onboarding specs

echo -e "src/api/automation-service/automation-service.api.ts" | python3 scripts/select-e2e-tests.py
# → selects automation specs

echo -e "README.md\n.github/workflows/ci.yml" | python3 scripts/select-e2e-tests.py
# → mode=full, no E2E needed (docs/CI only)

Notes

LLM credentials: The workflow uses secrets.LLM_API_KEY (already available in this repo for live E2E) with the LLM proxy at https://llm-proxy.app.all-hands.dev and gpt-4.1-mini for cost-efficient selection.
Backward compatible: The pull_request trigger on mock-llm-e2e.yml is unchanged — the full suite still runs on every PR push as a merge gate. The agent selector is an opt-in faster-feedback path.
Security: The agent selector only runs for same-repo PRs (fork PRs are excluded to prevent secret exposure).
Fallback chain: LLM → heuristic path-prefix mapping → full suite. If the LLM call fails for any reason, the deterministic heuristic kicks in automatically.
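
The fallback chain described above could look roughly like the sketch below. The prefix map, the function names, and the rule that a single unmapped file forces the full suite are assumptions for illustration; the script's actual mapping is not shown on this page. Only the spec filenames and the three mode names come from this PR.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical prefix map; the real script's mapping is not shown here.
PREFIX_TO_SPECS = {
    "src/components/features/onboarding/": [
        "mock-llm-onboarding-happy-path.spec.ts",
        "mock-llm-onboarding-regressions.spec.ts",
    ],
    "src/api/automation-service/": [
        "mock-llm-automation.spec.ts",
        "mock-llm-preset-automation.spec.ts",
    ],
}


def heuristic_select(changed: Iterable[str]) -> Optional[List[str]]:
    """Map changed paths to specs; return None if any path has no known prefix."""
    selected = set()
    for path in changed:
        matches = [specs for prefix, specs in PREFIX_TO_SPECS.items() if path.startswith(prefix)]
        if not matches:
            return None  # cannot narrow safely
        for specs in matches:
            selected.update(specs)
    return sorted(selected)


def select(changed: List[str], llm_select: Optional[Callable[[List[str]], List[str]]] = None) -> dict:
    """Try the LLM first, then the heuristic, then fall back to the full suite."""
    if llm_select is not None:
        try:
            return {"mode": "llm", "specs": llm_select(changed), "reason": "LLM selection"}
        except Exception:
            pass  # any LLM failure falls through to the heuristic
    specs = heuristic_select(changed)
    if specs is not None:
        return {"mode": "heuristic", "specs": specs, "reason": "path-prefix match"}
    return {"mode": "full", "specs": [], "reason": "could not narrow"}
```

In this sketch an empty `specs` list under `mode: full` means "run everything", so the caller has to check the mode rather than the list length.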

This PR was created by an AI agent (OpenHands) on behalf of the user.


🐳 Docker images for this PR

• GHCR package: https://github.com/OpenHands/agent-canvas/pkgs/container/agent-canvas

Component	Value
Image	`ghcr.io/openhands/agent-canvas`
Architectures	amd64, arm64
Agent Server	`ghcr.io/openhands/agent-server:1.26.0-python`
Automation	`openhands-automation==1.0.0a6`
Commit	`8d1976bd6bee4810375f52737fc310951b9b03bf`

Pull (multi-arch manifest)

# Multi-arch manifest — Docker automatically pulls the correct architecture
docker pull ghcr.io/openhands/agent-canvas:sha-8d1976b

Run

docker run -it --rm \
  -p 8000:8000 \
  ghcr.io/openhands/agent-canvas:sha-8d1976b

All tags pushed for this build

ghcr.io/openhands/agent-canvas:sha-8d1976b-amd64
ghcr.io/openhands/agent-canvas:agent-e2e-test-selector-amd64
ghcr.io/openhands/agent-canvas:pr-1267-amd64
ghcr.io/openhands/agent-canvas:sha-8d1976b-arm64
ghcr.io/openhands/agent-canvas:agent-e2e-test-selector-arm64
ghcr.io/openhands/agent-canvas:pr-1267-arm64
ghcr.io/openhands/agent-canvas:sha-8d1976b
ghcr.io/openhands/agent-canvas:agent-e2e-test-selector
ghcr.io/openhands/agent-canvas:pr-1267

About Multi-Architecture Support

Each tag (e.g., sha-8d1976b) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., sha-8d1976b-amd64) are also available if needed

Add an LLM-based CI workflow that analyzes PR file changes and selects the most relevant mock-LLM E2E test specs to run, then triggers the E2E workflow with only that subset.

Changes:
- .github/workflows/mock-llm-e2e.yml: Add workflow_dispatch inputs (test_specs, test_grep, pr_number) so the workflow can be triggered with a filtered subset. Fully backward-compatible: pull_request events still run the full suite. Use EFFECTIVE_PR_NUMBER for comment/artifact steps to support both trigger types.
- scripts/select-e2e-tests.py: Python script using OpenHands SDK LLM to intelligently map changed files to relevant specs, with a deterministic heuristic fallback when LLM is unavailable.
- .github/workflows/agent-e2e-selector.yml: New workflow triggered by the 'smart-e2e' label or manual dispatch. Analyzes PR files, runs the selector, and dispatches mock-llm-e2e with the chosen subset. Posts a summary comment to the PR.

Co-authored-by: openhands <openhands@all-hands.dev>

vercel · 2026-06-08T21:20:14Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agent-canvas	Ready	Preview, Comment	Jun 8, 2026 9:57pm

github-actions · 2026-06-08T21:20:18Z

PR Artifacts Notice

This PR contains a .pr/ directory with PR-specific artifacts. This directory will be automatically removed when the PR is approved.

Fork PRs require manual cleanup before merging.

github-actions · 2026-06-08T21:28:14Z

✅ Mock-LLM E2E Tests

43/43 passed

Commit: 37310ea6 · Workflow run · Test artifacts

Status	Test	Duration
✅	mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 1: configure ACP agent via Settings → Agent UI	13.7s
✅	mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 2: reload and verify ACP settings are persisted in UI	5.6s
✅	mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 3: start ACP conversation and verify agent reply	6.2s
✅	mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 4: resume ACP conversation from sidebar after navigating away	5.7s
✅	mock-llm-auth-modes.spec.ts › auth mode: fresh install with runtime-injected key › reaches the onboarding modal without pre-seeded localStorage	1.3s
✅	mock-llm-auth-modes.spec.ts › auth mode: non-public key rotation › recovers when localStorage has a stale session API key	5.4s
✅	mock-llm-auth-modes.spec.ts › auth mode: public gate › shows the auth screen when no key is configured	1.1s
✅	mock-llm-auth-modes.spec.ts › auth mode: public gate › rejects an incorrect key with an inline error	1.3s
✅	mock-llm-auth-modes.spec.ts › auth mode: public gate › allows access after pasting the correct key	1.6s
✅	mock-llm-auth-modes.spec.ts › auth mode: public gate › skips auth screen for returning user with valid stored key	709ms
✅	mock-llm-auth-modes.spec.ts › auth mode: public gate › re-prompts when the server rotates its key (stale localStorage)	1.4s
✅	mock-llm-automation.spec.ts › mock-LLM automation lifecycle › step 1: setup LLM profile and register automation trajectory	7.7s
✅	mock-llm-automation.spec.ts › mock-LLM automation lifecycle › step 2: create automation and dispatch run via the UI	28.3s
✅	mock-llm-automation.spec.ts › mock-LLM automation lifecycle › step 3: verify automation and run on the automations page	6.1s
✅	mock-llm-conversation.spec.ts › mock-LLM agent-server conversation › step 1: create an LLM profile pointing at the mock LLM server	6.2s
✅	mock-llm-conversation.spec.ts › mock-LLM agent-server conversation › step 2: activate the mock-llm profile and verify settings API	6.1s
✅	mock-llm-conversation.spec.ts › mock-LLM agent-server conversation › step 3: run a conversation with the mock LLM	6.4s
✅	mock-llm-conversation.spec.ts › mock-LLM agent-server conversation › step 4: resume conversation from sidebar after navigating away	5.7s
✅	mock-llm-cross-connect.spec.ts › cross-connect: frontend-only → backend-only › frontend-only connects to a separate backend-only instance	15.8s
✅	mock-llm-cross-connect.spec.ts › cross-connect: frontend-only → multiple backends › connects to two separate backends and switches between them	20.5s
✅	mock-llm-image-upload.spec.ts › mock-LLM image upload › attaching an image embeds it as base64 in the LLM completion call	13.4s
✅	mock-llm-model-switch.spec.ts › mock-LLM /model slash command › step 1: configure LLM, create switch-target profile, register trajectory	12.9s
✅	mock-llm-model-switch.spec.ts › mock-LLM /model slash command › step 2: start conversation, switch profile via /model, verify switch	6.8s
✅	mock-llm-onboarding-happy-path.spec.ts › onboarding happy path › completes the full onboarding flow and launches a conversation	4.4s
✅	mock-llm-onboarding-regressions.spec.ts › onboarding recent regressions › keeps the modal open on backdrop click and Escape	1.3s
✅	mock-llm-onboarding-regressions.spec.ts › onboarding recent regressions › defaults the LLM setup step to OpenAI GPT-5.5	1.6s
✅	mock-llm-partial-stack.spec.ts › partial stack: --frontend-only › serves the frontend but returns 503 for backend routes	7.4s
✅	mock-llm-partial-stack.spec.ts › partial stack: --backend-only › serves backend APIs but returns 503 for the frontend root	13.1s
✅	mock-llm-partial-stack.spec.ts › partial stack: port conflict › fails with a clear error when the ingress port is occupied	122ms
✅	mock-llm-partial-stack.spec.ts › partial stack: port conflict › starts successfully on a free port after a conflict	6.0s
✅	mock-llm-preset-automation.spec.ts › preset automation → slash command conversation › automation card sends the correct slash command to a conversation	15.9s
✅	mock-llm-preset-automation.spec.ts › preset automation → slash command conversation › direct slash command from home page triggers skill activation	13.3s
✅	mock-llm-profile-management.spec.ts › active profile deletion + reconciliation › active profile is deletable and reconciliation activates another profile	8.4s
✅	mock-llm-profile-management.spec.ts › same-model profile identity › chat header shows the correct profile when two profiles share the same model	15.0s
✅	mock-llm-profile-management.spec.ts › litellm_proxy proxy base_url preservation › re-saving a litellm_proxy profile from Basic view preserves the proxy base_url	7.7s
✅	mock-llm-skills.spec.ts › skill loading: project, user, and deletion › project skill in workspace/.agents/skills/ triggers on matching keyword	14.3s
✅	mock-llm-skills.spec.ts › skill loading: project, user, and deletion › user skill in ~/.openhands/skills/ triggers on matching keyword	13.3s
✅	mock-llm-skills.spec.ts › skill loading: project, user, and deletion › deleting a user skill removes it from subsequent conversations	13.2s
✅	mock-llm-ui-regressions.spec.ts › UI regressions › scopes standalone styles to the agent-server-ui shell	1.3s
✅	mock-llm-ui-regressions.spec.ts › UI regressions › renders critic results on agent messages and finish actions	1.4s
✅	mock-llm-ui-regressions.spec.ts › UI regressions › loads older events when scrolling up	1.5s
✅	mock-llm-ui-regressions.spec.ts › UI regressions › selected workspace persists after navigating away and returning	2.0s
✅	mock-llm-ui-regressions.spec.ts › UI regressions › cleared sessionStorage yields empty workspace selection	1.3s

Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses)

…/gpt-5.1

- mock-llm-e2e.yml: Add workflow_call trigger with matching inputs so the workflow can be called as a reusable workflow (no PAT needed).
- agent-e2e-selector.yml: Replace gh workflow run dispatch (which requires a PAT with actions:write scope) with a workflow_call job. The select job outputs feed directly into the run-e2e job.
- select-e2e-tests.py: Change default model from litellm_proxy/openai/gpt-4.1-mini to openhands/gpt-5.1.

Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2026-06-08T21:28:41Z

🛑 Mock-LLM Docker E2E Test Results

12/12 passed · ⚠️ 31 not run (process killed at 12/43)

Commit: 37310ea6 · Workflow run · Test artifacts

Status	Test	Duration
✅	chromium › mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 1: configure ACP agent via Settings → Agent UI	13.8s
✅	chromium › mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 2: reload and verify ACP settings are persisted in UI	5.6s
✅	chromium › mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 3: start ACP conversation and verify agent reply	6.7s
✅	chromium › mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 4: resume ACP conversation from sidebar after navigating away	5.8s
✅	chromium › mock-llm-auth-modes.spec.ts › auth mode: fresh install with runtime-injected key › reaches the onboarding modal without pre-seeded localStorage	1.4s
✅	chromium › mock-llm-auth-modes.spec.ts › auth mode: non-public key rotation › recovers when localStorage has a stale session API key	5.3s
✅	chromium › mock-llm-auth-modes.spec.ts › auth mode: public gate › shows the auth screen when no key is configured	1.2s
✅	chromium › mock-llm-auth-modes.spec.ts › auth mode: public gate › rejects an incorrect key with an inline error	1.3s
✅	chromium › mock-llm-auth-modes.spec.ts › auth mode: public gate › allows access after pasting the correct key	4.7s
✅	chromium › mock-llm-auth-modes.spec.ts › auth mode: public gate › skips auth screen for returning user with valid stored key	785ms
✅	chromium › mock-llm-auth-modes.spec.ts › auth mode: public gate › re-prompts when the server rotates its key (stale localStorage)	1.5s
✅	chromium › mock-llm-automation.spec.ts › mock-LLM automation lifecycle › step 1: setup LLM profile and register automation trajectory	7.5s


github-actions · 2026-06-08T21:30:10Z

🤖 Agent E2E Test Selector


Mode	`full`
Files analyzed	3
Reason	Heuristic could not narrow: 1 source files changed.

Selected specs:
Full suite (all specs)

Running via: Mock-LLM E2E Tests

github-actions · 2026-06-08T21:30:29Z

⚠️ Mock-LLM E2E Tests

0/0 passed

Commit: 6368db89 · Workflow run

Status	Test	Duration


… point

- select-e2e-tests.py: Remove heuristic_select() and all static path-prefix mapping. Always use the LLM; if LLM_API_KEY is missing, the script fails loudly instead of silently degrading.
- mock-llm-e2e.yml: Remove pull_request trigger. E2E tests no longer run the full suite on every PR commit. They are now invoked only via workflow_call (from agent-e2e-selector) or workflow_dispatch.
- agent-e2e-selector.yml: Trigger on pull_request [opened, synchronize, reopened]. This is now the primary entry point for PR-driven E2E. The LLM picks the relevant subset and the E2E workflow runs only those specs.

Co-authored-by: openhands <openhands@all-hands.dev>
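
A "fail loudly" check of the kind this commit describes might look like the following. The helper name `require_api_key` and the error wording are hypothetical; only the `LLM_API_KEY` variable name comes from this PR.

```python
import os
import sys
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return LLM_API_KEY, or exit with a non-zero status if it is missing."""
    env = os.environ if env is None else env
    key = env.get("LLM_API_KEY", "").strip()
    if not key:
        # sys.exit with a string prints it to stderr and exits with status 1,
        # so the CI step fails visibly instead of silently degrading.
        sys.exit("LLM_API_KEY is not set; refusing to select tests without the LLM.")
    return key
```

Passing the environment as a parameter is only there to make the sketch easy to exercise without touching the real process environment.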

github-actions · 2026-06-08T21:34:43Z

🛑 Mock-LLM Docker E2E Test Results

3/3 passed · ⚠️ 40 not run (process killed at 3/43)

Commit: 6368db89 · Workflow run · Test artifacts

Status	Test	Duration
✅	chromium › mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 1: configure ACP agent via Settings → Agent UI	14.8s
✅	chromium › mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 2: reload and verify ACP settings are persisted in UI	5.6s
✅	chromium › mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 3: start ACP conversation and verify agent reply	6.8s


github-actions · 2026-06-08T21:34:45Z

🛑 Mock-LLM E2E Tests

18/18 passed · ⚠️ 25 not run (process killed at 18/43)

Commit: 6368db89 · Workflow run · Test artifacts

Status	Test	Duration
✅	chromium › mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 1: configure ACP agent via Settings → Agent UI	13.3s
✅	chromium › mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 2: reload and verify ACP settings are persisted in UI	5.5s
✅	chromium › mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 3: start ACP conversation and verify agent reply	6.2s
✅	chromium › mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 4: resume ACP conversation from sidebar after navigating away	5.6s
✅	chromium › mock-llm-auth-modes.spec.ts › auth mode: fresh install with runtime-injected key › reaches the onboarding modal without pre-seeded localStorage	1.3s
✅	chromium › mock-llm-auth-modes.spec.ts › auth mode: non-public key rotation › recovers when localStorage has a stale session API key	5.3s
✅	chromium › mock-llm-auth-modes.spec.ts › auth mode: public gate › shows the auth screen when no key is configured	1.1s
✅	chromium › mock-llm-auth-modes.spec.ts › auth mode: public gate › rejects an incorrect key with an inline error	1.4s
✅	chromium › mock-llm-auth-modes.spec.ts › auth mode: public gate › allows access after pasting the correct key	1.6s
✅	chromium › mock-llm-auth-modes.spec.ts › auth mode: public gate › skips auth screen for returning user with valid stored key	755ms
✅	chromium › mock-llm-auth-modes.spec.ts › auth mode: public gate › re-prompts when the server rotates its key (stale localStorage)	1.4s
✅	chromium › mock-llm-automation.spec.ts › mock-LLM automation lifecycle › step 1: setup LLM profile and register automation trajectory	7.0s
✅	chromium › mock-llm-automation.spec.ts › mock-LLM automation lifecycle › step 2: create automation and dispatch run via the UI	30.2s
✅	chromium › mock-llm-automation.spec.ts › mock-LLM automation lifecycle › step 3: verify automation and run on the automations page	6.0s
✅	chromium › mock-llm-conversation.spec.ts › mock-LLM agent-server conversation › step 1: create an LLM profile pointing at the mock LLM server	6.0s
✅	chromium › mock-llm-conversation.spec.ts › mock-LLM agent-server conversation › step 2: activate the mock-llm profile and verify settings API	6.0s
✅	chromium › mock-llm-conversation.spec.ts › mock-LLM agent-server conversation › step 3: run a conversation with the mock LLM	6.3s
✅	chromium › mock-llm-conversation.spec.ts › mock-LLM agent-server conversation › step 4: resume conversation from sidebar after navigating away	5.6s


- select-e2e-tests.py: Rewritten to use Agent + Conversation instead of raw LLM.completion(). Adds a CIVisualizer (streams every event to stderr for CI log visibility) and a capture_event callback (collects agent messages for parsing). The agent outputs a structured <TEST_SELECTION> block that gets parsed for the spec list. Empty specs now means 'skip E2E', not 'run full suite'.
- agent-e2e-selector.yml: run-e2e job now has an if-guard that skips mock-llm-e2e entirely when the agent returns no specs.

Co-authored-by: openhands <openhands@all-hands.dev>
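
One way the "empty specs means skip" decision could be handed to the workflow is through step outputs, sketched below. The output names `test_specs` and `should_run` and the helper `write_outputs` are assumptions for this example; the commit only states that an if-guard skips the E2E job when no specs are returned. GitHub Actions reads `name=value` lines appended to the file named by the `GITHUB_OUTPUT` environment variable.

```python
def write_outputs(selection: dict, output_path: str) -> bool:
    """Append step outputs to output_path; return True when E2E should run."""
    specs = selection.get("specs", [])
    should_run = bool(specs)  # empty list means "skip E2E", not "run everything"
    with open(output_path, "a", encoding="utf-8") as fh:
        fh.write(f"test_specs={','.join(specs)}\n")
        fh.write(f"should_run={'true' if should_run else 'false'}\n")
    return should_run
```

In a workflow, `output_path` would be `os.environ["GITHUB_OUTPUT"]`, and a downstream job could guard on the hypothetical `should_run` output being the string `true`.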

github-actions · 2026-06-08T21:39:42Z

⚠️ Mock-LLM Docker E2E Test Results

0/0 passed

Commit: 7a8ab8af · Workflow run

Status	Test	Duration


The capture_event callback collects model_dump_json() from every event. The agent's response text lives at event.llm_message.content[].text, not at event.message or event.text. Added extract_text_from_dumps() to properly walk the JSON structure and pull out all text fragments before searching for the <TEST_SELECTION> tag. Also improved the ValueError message to include the extracted text for easier debugging if parsing still fails.

Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2026-06-08T21:41:56Z

⚠️ Mock-LLM Docker E2E Test Results

0/0 passed

Commit: fffa8572 · Workflow run

Status	Test	Duration


- select-e2e-tests.py: Agent now has terminal + file_editor tools and runs against the repo checkout (WORKSPACE env var, defaults to cwd). It can read source files and test specs to make informed decisions. Fixed two parsing bugs: (1) only collect agent-source events (skip user prompt to avoid matching the template), (2) use re.findall and take the last match as an extra safety net. Increased visualizer dump to 800 chars for better CI log visibility.
- agent-e2e-selector.yml: Install openhands-tools alongside openhands-sdk (needed for TerminalTool/FileEditorTool). Suppress SDK banner. Bumped timeout to 10 min for agent tool-call iterations.

Co-authored-by: openhands <openhands@all-hands.dev>
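
The "re.findall and take the last match" safety net could be sketched as follows. This assumes the block is closed with `</TEST_SELECTION>` and that its body is JSON; neither detail is confirmed on this page, and the function shown is an illustration rather than the script's actual parse_selection().

```python
import json
import re

OUTPUT_TAG = "TEST_SELECTION"
# Non-greedy match so each block is captured separately; DOTALL lets it span lines.
_BLOCK = re.compile(rf"<{OUTPUT_TAG}>(.*?)</{OUTPUT_TAG}>", re.DOTALL)


def parse_selection(text: str) -> dict:
    """Parse the last <TEST_SELECTION> block found in text (body assumed to be JSON)."""
    blocks = _BLOCK.findall(text)
    if not blocks:
        # Include a slice of the input to make CI failures easier to debug.
        raise ValueError(f"no <{OUTPUT_TAG}> block found in: {text[:200]!r}")
    # The prompt template may contain an example block, so the last one wins.
    return json.loads(blocks[-1])
```

Taking the last match matters when the captured text includes the prompt: the template's example block appears first and the agent's real answer appears after it.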

github-actions · 2026-06-08T21:45:53Z

⚠️ Mock-LLM Docker E2E Test Results

0/0 passed

Commit: 2b1add15 · Workflow run

Status	Test	Duration


Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2026-06-08T21:47:24Z

⚠️ Mock-LLM Docker E2E Test Results

0/0 passed

Commit: 81495093 · Workflow run

Status	Test	Duration


The TEST_SELECTION block lands in the FinishObservation event (source='environment'), not in agent-source events. Removed the source filter from the callback and replaced the flat key-lookup extractor with a recursive _collect_text() that walks the entire JSON tree collecting every 'text' and 'message' string value. parse_selection() already uses re.findall + last-match to avoid matching the prompt template.

Co-authored-by: openhands <openhands@all-hands.dev>
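
A recursive walker of the kind described here might look like this. It is a sketch under the assumption that each captured event is a JSON object string; the event shapes in the usage below are made up for illustration and do not reflect the SDK's real schema.

```python
import json
from typing import Any, List

TEXT_KEYS = ("text", "message")


def _collect_text(node: Any, out: List[str]) -> None:
    """Append every string stored under a 'text' or 'message' key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in TEXT_KEYS and isinstance(value, str):
                out.append(value)
            else:
                _collect_text(value, out)  # keep descending into nested values
    elif isinstance(node, list):
        for item in node:
            _collect_text(item, out)


def extract_all_text(event_dumps: List[str]) -> str:
    """Join all text fragments from a list of JSON event dumps, in order."""
    out: List[str] = []
    for dump in event_dumps:
        _collect_text(json.loads(dump), out)
    return "\n".join(out)
```

Because the walker ignores the event's source and nesting depth, it picks up text from both `llm_message.content[].text` style paths and top-level `message` fields.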

github-actions · 2026-06-08T21:49:21Z

⚠️ Mock-LLM Docker E2E Test Results

0/0 passed

Commit: 64702ac3 · Workflow run

Status	Test	Duration


github-actions · 2026-06-08T21:49:52Z

🤖 Agent E2E Test Selector


Mode	`llm`
Files analyzed	3
Reason	Only CI workflows and test-selection script changed, not app behavior.

Selected specs:

(none)

Running via: Mock-LLM E2E Tests

Instead of wrangling agent event dumps / regex / recursive JSON walking, the agent now writes its result to a temp JSON file. After the conversation finishes, we just read that file. Removed: OUTPUT_TAG, capture_event callback, _collect_text, extract_all_text, parse_selection, re import. The CIVisualizer still streams all events to stderr for CI log visibility.

Co-authored-by: openhands <openhands@all-hands.dev>
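
Reading the agent's result file could be as small as the sketch below. The helper name `read_selection` and the `specs`/`reason` field names are assumptions based on the selector comments in this PR, not the script's confirmed schema.

```python
import json
from pathlib import Path


def read_selection(result_path: str) -> dict:
    """Load and validate the JSON file the agent is asked to write."""
    path = Path(result_path)
    if not path.is_file():
        raise FileNotFoundError(f"agent did not write {result_path}")
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("result file must contain a JSON object")
    specs = data.get("specs")
    if not isinstance(specs, list) or not all(isinstance(s, str) for s in specs):
        raise ValueError("'specs' must be a list of spec filenames")
    return {"specs": specs, "reason": str(data.get("reason", ""))}
```

Validating the shape up front keeps a malformed file from turning into an empty selection, which under the newer semantics would silently skip E2E.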

github-actions · 2026-06-08T21:52:05Z

⚠️ Mock-LLM Docker E2E Test Results

0/0 passed

Commit: dee20cfd · Workflow run

Status	Test	Duration


github-actions · 2026-06-08T21:53:06Z

🤖 Agent E2E Test Selector


Mode	`llm`
Files analyzed	3
Reason	Only CI workflows and E2E selection script changed; no product behavior affected.

Selected specs:

(none)

Running via: Mock-LLM E2E Tests

Replaced the 14-entry SPEC_CATALOG dict with discover_specs() which globs tests/e2e/mock-llm/*.spec.ts at runtime. The agent gets the list of available spec filenames and is told to read their source to understand what each one tests. New specs are picked up automatically without any script changes.

Co-authored-by: openhands <openhands@all-hands.dev>
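
Runtime discovery along these lines can be sketched with `pathlib`. The signature below (a `workspace` argument plus a default `spec_dir`) is an assumption; only the function name and the glob pattern come from the commit message.

```python
from pathlib import Path
from typing import List


def discover_specs(workspace: str, spec_dir: str = "tests/e2e/mock-llm") -> List[str]:
    """Return the sorted filenames of all *.spec.ts files in the spec directory."""
    root = Path(workspace) / spec_dir
    # glob() on a missing directory yields nothing, so the result is just empty.
    return sorted(p.name for p in root.glob("*.spec.ts"))
```

Sorting keeps the list stable between runs, which makes the prompt given to the agent reproducible for the same checkout.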

github-actions · 2026-06-08T21:56:33Z

⚠️ Mock-LLM Docker E2E Test Results

0/0 passed

Commit: d6581db6 · Workflow run

Status	Test	Duration


github-actions · 2026-06-08T21:57:18Z

🤖 Agent E2E Test Selector


Mode	`llm`
Files analyzed	3
Reason	Only E2E selector workflows and selection script changed; app behavior under test specs is unaffected.

Selected specs:

(none)

Running via: Mock-LLM E2E Tests

github-actions · 2026-06-08T22:06:58Z

📸 Snapshot Test Report

Warning

Snapshot comparison step crashed (timeout, OOM, or runner error) — diff results below may be incomplete or absent.
Check the CI logs for the full error output (look for the "Run snapshot comparison" step).

❌ 1 snapshot differs from the main branch baseline. Add the update-snapshots label to acknowledge intentional changes.

Category	Count
🔴 Changed	1
🆕 New	0
✅ Unchanged	73
Total	74

How to resolve:

Unintentional diffs — the baselines on main may have moved since this branch was created. Merge the latest main into this branch and re-run CI.

Intentional changes — add the update-snapshots label. CI will pass and the new screenshots become the baseline when this PR merges.

🔴 Changed snapshots (1)

`backends-extended`

backend-dropdown-two-backends

Expected (main)	Actual (PR)	Diff

✅ Unchanged snapshots (73)

archived-conversation

conversation-panel-with-archived-badges
conversation-view-archived
conversation-view-sandbox-error

automations

automations-delete-modal
automations-list-active-inactive
automations-no-automations
automations-search-no-results

backends-extended

backend-add-blank-disabled
backend-add-cloud-advanced-open
backend-add-cloud-no-key-disabled
backend-add-cloud-with-key-enabled
backend-add-form-partially-filled
backend-add-invalid-url-disabled
backend-add-local-ready
backend-add-name-only-disabled
backend-add-two-column-layout
backend-add-whitespace-host-disabled
backend-after-switch
backend-cancel-nothing-saved
backend-edit-prefilled
backend-manage-after-removal
backend-manage-two-listed
backend-remove-cancelled
backend-remove-confirmation
backend-switch-overlay

backends

backend-add-modal
backend-manage-modal
backend-selector-open

changes-tab

changes-deleted-file
changes-diff-viewer
changes-empty

collapsible-thinking

reasoning-content-collapsed
reasoning-content-expanded
think-action-collapsed
think-action-expanded

mcp-page

mcp-custom-server-1-editor-open
mcp-custom-server-2-url-filled
mcp-custom-server-3-all-filled
mcp-custom-server-4-installed
mcp-custom-server-editor
mcp-empty-installed
mcp-search-filtered
mcp-slack-install-1-marketplace
mcp-slack-install-2-modal
mcp-slack-install-3-filled
mcp-slack-install-4-installed

onboarding

onboarding-step-0-check-backend
onboarding-step-1-choose-agent
onboarding-step-2-setup-llm
onboarding-step-3-say-hello

projects-workspace-browser

projects-workspace-browser

settings-page

add-backend-modal
analytics-consent-modal
home-screen
settings-app-page
settings-page

settings-secrets

secrets-add-form-filled
secrets-add-form
secrets-after-save
secrets-delete-confirm
secrets-list

settings-verification

condenser-settings
verification-settings-critic-enabled
verification-settings-off
verification-settings-on

sidebar

sidebar-collapsed
sidebar-conversation-panel
sidebar-filter-menu

skills-page

skills-empty
skills-loaded
skills-no-match
skills-search-filtered
skills-type-filter

Generated by the Snapshot Tests workflow. This comment was created by an AI agent (OpenHands) on behalf of the repo maintainers.

github-actions · 2026-06-08T22:08:15Z

🔶 Mock-LLM Docker E2E Test Results

38/43 passed · 5 skipped

Commit: 8d1976bd · Workflow run · Test artifacts

Status	Test	Duration
✅	mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 1: configure ACP agent via Settings → Agent UI	15.3s
✅	mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 2: reload and verify ACP settings are persisted in UI	5.5s
✅	mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 3: start ACP conversation and verify agent reply	6.7s
✅	mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 4: resume ACP conversation from sidebar after navigating away	5.7s
✅	mock-llm-auth-modes.spec.ts › auth mode: fresh install with runtime-injected key › reaches the onboarding modal without pre-seeded localStorage	1.3s
✅	mock-llm-auth-modes.spec.ts › auth mode: non-public key rotation › recovers when localStorage has a stale session API key	5.3s
✅	mock-llm-auth-modes.spec.ts › auth mode: public gate › shows the auth screen when no key is configured	1.1s
✅	mock-llm-auth-modes.spec.ts › auth mode: public gate › rejects an incorrect key with an inline error	1.4s
✅	mock-llm-auth-modes.spec.ts › auth mode: public gate › allows access after pasting the correct key	1.7s
✅	mock-llm-auth-modes.spec.ts › auth mode: public gate › skips auth screen for returning user with valid stored key	731ms
✅	mock-llm-auth-modes.spec.ts › auth mode: public gate › re-prompts when the server rotates its key (stale localStorage)	1.4s
✅	mock-llm-automation.spec.ts › mock-LLM automation lifecycle › step 1: setup LLM profile and register automation trajectory	7.3s
✅	mock-llm-automation.spec.ts › mock-LLM automation lifecycle › step 2: create automation and dispatch run via the UI	32.4s
✅	mock-llm-automation.spec.ts › mock-LLM automation lifecycle › step 3: verify automation and run on the automations page	6.1s
✅	mock-llm-conversation.spec.ts › mock-LLM agent-server conversation › step 1: create an LLM profile pointing at the mock LLM server	8.6s
✅	mock-llm-conversation.spec.ts › mock-LLM agent-server conversation › step 2: activate the mock-llm profile and verify settings API	6.1s
✅	mock-llm-conversation.spec.ts › mock-LLM agent-server conversation › step 3: run a conversation with the mock LLM	6.3s
✅	mock-llm-conversation.spec.ts › mock-LLM agent-server conversation › step 4: resume conversation from sidebar after navigating away	5.7s
⏭️	mock-llm-cross-connect.spec.ts › cross-connect: frontend-only → backend-only › frontend-only connects to a separate backend-only instance	188ms
⏭️	mock-llm-cross-connect.spec.ts › cross-connect: frontend-only → multiple backends › connects to two separate backends and switches between them	182ms
✅	mock-llm-image-upload.spec.ts › mock-LLM image upload › attaching an image embeds it as base64 in the LLM completion call	14.8s
✅	mock-llm-model-switch.spec.ts › mock-LLM /model slash command › step 1: configure LLM, create switch-target profile, register trajectory	14.7s
✅	mock-llm-model-switch.spec.ts › mock-LLM /model slash command › step 2: start conversation, switch profile via /model, verify switch	6.6s
✅	mock-llm-onboarding-happy-path.spec.ts › onboarding happy path › completes the full onboarding flow and launches a conversation	3.4s
✅	mock-llm-onboarding-regressions.spec.ts › onboarding recent regressions › keeps the modal open on backdrop click and Escape	1.3s
✅	mock-llm-onboarding-regressions.spec.ts › onboarding recent regressions › defaults the LLM setup step to OpenAI GPT-5.5	1.6s
⏭️	mock-llm-partial-stack.spec.ts › partial stack: --frontend-only › serves the frontend but returns 503 for backend routes	187ms
✅	mock-llm-partial-stack.spec.ts › partial stack: --backend-only › serves backend APIs but returns 503 for the frontend root	25.1s
⏭️	mock-llm-partial-stack.spec.ts › partial stack: port conflict › fails with a clear error when the ingress port is occupied	1ms
⏭️	mock-llm-partial-stack.spec.ts › partial stack: port conflict › starts successfully on a free port after a conflict	7ms
✅	mock-llm-preset-automation.spec.ts › preset automation → slash command conversation › automation card sends the correct slash command to a conversation	16.0s
✅	mock-llm-preset-automation.spec.ts › preset automation → slash command conversation › direct slash command from home page triggers skill activation	13.2s
✅	mock-llm-profile-management.spec.ts › active profile deletion + reconciliation › active profile is deletable and reconciliation activates another profile	8.4s
✅	mock-llm-profile-management.spec.ts › same-model profile identity › chat header shows the correct profile when two profiles share the same model	14.9s
✅	mock-llm-profile-management.spec.ts › litellm_proxy proxy base_url preservation › re-saving a litellm_proxy profile from Basic view preserves the proxy base_url	7.8s
✅	mock-llm-skills.spec.ts › skill loading: project, user, and deletion › project skill in workspace/.agents/skills/ triggers on matching keyword	16.6s
✅	mock-llm-skills.spec.ts › skill loading: project, user, and deletion › user skill in ~/.openhands/skills/ triggers on matching keyword	13.2s
✅	mock-llm-skills.spec.ts › skill loading: project, user, and deletion › deleting a user skill removes it from subsequent conversations	13.1s
✅	mock-llm-ui-regressions.spec.ts › UI regressions › scopes standalone styles to the agent-server-ui shell	945ms
✅	mock-llm-ui-regressions.spec.ts › UI regressions › renders critic results on agent messages and finish actions	1.4s
✅	mock-llm-ui-regressions.spec.ts › UI regressions › loads older events when scrolling up	1.6s
✅	mock-llm-ui-regressions.spec.ts › UI regressions › selected workspace persists after navigating away and returning	2.5s
✅	mock-llm-ui-regressions.spec.ts › UI regressions › cleared sessionStorage yields empty workspace selection	1.1s

Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses)
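For readers who want a quick pass/skip count from a results comment like the one above, here is a minimal sketch. It assumes each result row is tab-separated as `status icon`, `spec › suite › test`, `duration`, as the rows appear here. The names `tally_results` and `STATUS_NAMES` are hypothetical and are not part of this PR or the workflow.

```python
from collections import Counter

# Status icons seen in the results rows, written as escapes so the lookup
# does not depend on invisible variation selectors.
# "\u2705" is the check mark (passed), "\u23ed" is the skip icon (skipped).
STATUS_NAMES = {"\u2705": "passed", "\u23ed": "skipped"}


def tally_results(rows: str) -> dict:
    """Count statuses and sum durations from tab-separated result rows."""
    counts = Counter()
    total_seconds = 0.0
    for line in rows.splitlines():
        parts = [p.strip() for p in line.split("\t")]
        if len(parts) != 3:
            continue  # skip blank lines and anything that is not a result row
        icon, _name, duration = parts
        # Drop the emoji variation selector before looking up the icon.
        icon = icon.replace("\ufe0f", "")
        counts[STATUS_NAMES.get(icon, "other")] += 1
        # Durations appear as "14.8s" or "187ms"; normalize both to seconds.
        if duration.endswith("ms"):
            total_seconds += float(duration[:-2]) / 1000
        elif duration.endswith("s"):
            total_seconds += float(duration[:-1])
    return {"counts": dict(counts), "total_seconds": round(total_seconds, 3)}
```

Pasting the rows from the comment into a string and calling `tally_results` on it gives a per-status count and the summed wall time, which can help when comparing a subset run against a full run.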

vercel Bot deployed to Preview June 8, 2026 21:20

malhotra5 added the smart-e2e label ("Have an agent choose which e2e tests to run") Jun 8, 2026

vercel Bot deployed to Preview June 8, 2026 21:29

malhotra5 added and removed the smart-e2e label ("Have an agent choose which e2e tests to run") Jun 8, 2026

github-actions Bot added a commit that referenced this pull request Jun 8, 2026

snapshot images for PR #1267 run 27168080821

aea2a1a

vercel Bot deployed to Preview June 8, 2026 21:35

github-actions Bot added a commit that referenced this pull request Jun 8, 2026

snapshot images for PR #1267 run 27168388153

f617dd3

vercel Bot deployed to Preview June 8, 2026 21:40

vercel Bot deployed to Preview June 8, 2026 21:42

vercel Bot deployed to Preview June 8, 2026 21:46

fix: use correct SDK param name max_iteration_per_run

64702ac

Co-authored-by: openhands <openhands@all-hands.dev>

vercel Bot deployed to Preview June 8, 2026 21:47

vercel Bot deployed to Preview June 8, 2026 21:49

vercel Bot deployed to Preview June 8, 2026 21:52

vercel Bot deployed to Preview June 8, 2026 21:57

github-actions Bot added a commit that referenced this pull request Jun 8, 2026

snapshot images for PR #1267 run 27169512066

8338a15


