Skip to content

feat: agent-powered E2E test subset selection for PRs#1267

Draft
malhotra5 wants to merge 10 commits into
mainfrom
agent-e2e-test-selector
Draft

feat: agent-powered E2E test subset selection for PRs#1267
malhotra5 wants to merge 10 commits into
mainfrom
agent-e2e-test-selector

Conversation

@malhotra5

@malhotra5 malhotra5 commented Jun 8, 2026

Copy link
Copy Markdown
Member
  • A human has tested these changes.

Why

Running the full mock-LLM E2E suite (~14 specs) on every PR push takes significant CI time. Many PRs only touch a narrow area of the codebase and don't need the full suite. There was no way to parameterize the E2E workflow or intelligently select a subset of tests.

Summary

  • Modified mock-llm-e2e.yml: Added workflow_dispatch inputs (test_specs, test_grep, pr_number) so the workflow can be triggered with a filtered subset. Fully backward-compatible — pull_request events still run the full suite. All PR-number references now use EFFECTIVE_PR_NUMBER to support both trigger types.
  • New scripts/select-e2e-tests.py: Python script that uses the OpenHands SDK LLM class to intelligently map changed files to relevant test specs. Includes a deterministic heuristic fallback when the LLM is unavailable. Outputs JSON with the selected specs, reason, and mode (llm/heuristic/full).
  • New agent-e2e-selector.yml: Workflow triggered by the smart-e2e label or manual dispatch. Analyzes PR changed files, runs the selector script, dispatches Mock-LLM E2E Tests with the chosen subset, and posts a summary comment to the PR.

How to Test

Manual dispatch (direct E2E subset):

gh workflow run "Mock-LLM E2E Tests" \
  --ref <branch> \
  -f test_specs="mock-llm-conversation.spec.ts,mock-llm-automation.spec.ts" \
  -f pr_number=<pr_number>

Agent selector (manual dispatch):

gh workflow run "Agent E2E Selector" -f pr_number=<pr_number>

Agent selector (label-triggered):
Add the smart-e2e label to any same-repo PR.

Test the Python script locally (heuristic mode, no LLM needed):

echo -e "src/components/features/onboarding/onboarding-modal.tsx" | python3 scripts/select-e2e-tests.py
# → selects onboarding specs

echo -e "src/api/automation-service/automation-service.api.ts" | python3 scripts/select-e2e-tests.py
# → selects automation specs

echo -e "README.md\n.github/workflows/ci.yml" | python3 scripts/select-e2e-tests.py
# → mode=full, no E2E needed (docs/CI only)

Type

  • Bug fix
  • Feature
  • Refactor
  • Breaking change
  • Docs / chore

Notes

  • LLM credentials: The workflow uses secrets.LLM_API_KEY (already available in this repo for live E2E) with the LLM proxy at https://llm-proxy.app.all-hands.dev and gpt-4.1-mini for cost-efficient selection.
  • Backward compatible: The pull_request trigger on mock-llm-e2e.yml is unchanged — the full suite still runs on every PR push as a merge gate. The agent selector is an opt-in faster-feedback path.
  • Security: The agent selector only runs for same-repo PRs (fork PRs are excluded to prevent secret exposure).
  • Fallback chain: LLM → heuristic path-prefix mapping → full suite. If the LLM call fails for any reason, the deterministic heuristic kicks in automatically.

This PR was created by an AI agent (OpenHands) on behalf of the user.

@malhotra5 can click here to continue refining the PR


🐳 Docker images for this PR

GHCR package: https://github.com/OpenHands/agent-canvas/pkgs/container/agent-canvas

Component Value
Image ghcr.io/openhands/agent-canvas
Architectures amd64, arm64
Agent Server ghcr.io/openhands/agent-server:1.26.0-python
Automation openhands-automation==1.0.0a6
Commit 8d1976bd6bee4810375f52737fc310951b9b03bf

Pull (multi-arch manifest)

# Multi-arch manifest — Docker automatically pulls the correct architecture
docker pull ghcr.io/openhands/agent-canvas:sha-8d1976b

Run

docker run -it --rm \
  -p 8000:8000 \
  ghcr.io/openhands/agent-canvas:sha-8d1976b

All tags pushed for this build

ghcr.io/openhands/agent-canvas:sha-8d1976b-amd64
ghcr.io/openhands/agent-canvas:agent-e2e-test-selector-amd64
ghcr.io/openhands/agent-canvas:pr-1267-amd64
ghcr.io/openhands/agent-canvas:sha-8d1976b-arm64
ghcr.io/openhands/agent-canvas:agent-e2e-test-selector-arm64
ghcr.io/openhands/agent-canvas:pr-1267-arm64
ghcr.io/openhands/agent-canvas:sha-8d1976b
ghcr.io/openhands/agent-canvas:agent-e2e-test-selector
ghcr.io/openhands/agent-canvas:pr-1267

About Multi-Architecture Support

  • Each tag (e.g., sha-8d1976b) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., sha-8d1976b-amd64) are also available if needed

Add an LLM-based CI workflow that analyzes PR file changes and
selects the most relevant mock-LLM E2E test specs to run, then
triggers the E2E workflow with only that subset.

Changes:
- .github/workflows/mock-llm-e2e.yml: Add workflow_dispatch inputs
  (test_specs, test_grep, pr_number) so the workflow can be triggered
  with a filtered subset. Fully backward-compatible — pull_request
  events still run the full suite. Use EFFECTIVE_PR_NUMBER for
  comment/artifact steps to support both trigger types.
- scripts/select-e2e-tests.py: Python script using OpenHands SDK LLM
  to intelligently map changed files to relevant specs, with a
  deterministic heuristic fallback when LLM is unavailable.
- .github/workflows/agent-e2e-selector.yml: New workflow triggered by
  the 'smart-e2e' label or manual dispatch. Analyzes PR files, runs
  the selector, and dispatches mock-llm-e2e with the chosen subset.
  Posts a summary comment to the PR.

Co-authored-by: openhands <openhands@all-hands.dev>
@vercel

vercel Bot commented Jun 8, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agent-canvas Ready Ready Preview, Comment Jun 8, 2026 9:57pm

Request Review

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

PR Artifacts Notice

This PR contains a .pr/ directory with PR-specific artifacts. This directory will be automatically removed when the PR is approved.

Fork PRs require manual cleanup before merging.

@malhotra5 malhotra5 added the smart-e2e Have an agent choose which e2e tests to run label Jun 8, 2026
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

✅ Mock-LLM E2E Tests

43/43 passed

Commit: 37310ea6 · Workflow run · Test artifacts

Status Test Duration
mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 1: configure ACP agent via Settings → Agent UI 13.7s
mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 2: reload and verify ACP settings are persisted in UI 5.6s
mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 3: start ACP conversation and verify agent reply 6.2s
mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 4: resume ACP conversation from sidebar after navigating away 5.7s
mock-llm-auth-modes.spec.ts › auth mode: fresh install with runtime-injected key › reaches the onboarding modal without pre-seeded localStorage 1.3s
mock-llm-auth-modes.spec.ts › auth mode: non-public key rotation › recovers when localStorage has a stale session API key 5.4s
mock-llm-auth-modes.spec.ts › auth mode: public gate › shows the auth screen when no key is configured 1.1s
mock-llm-auth-modes.spec.ts › auth mode: public gate › rejects an incorrect key with an inline error 1.3s
mock-llm-auth-modes.spec.ts › auth mode: public gate › allows access after pasting the correct key 1.6s
mock-llm-auth-modes.spec.ts › auth mode: public gate › skips auth screen for returning user with valid stored key 709ms
mock-llm-auth-modes.spec.ts › auth mode: public gate › re-prompts when the server rotates its key (stale localStorage) 1.4s
mock-llm-automation.spec.ts › mock-LLM automation lifecycle › step 1: setup LLM profile and register automation trajectory 7.7s
mock-llm-automation.spec.ts › mock-LLM automation lifecycle › step 2: create automation and dispatch run via the UI 28.3s
mock-llm-automation.spec.ts › mock-LLM automation lifecycle › step 3: verify automation and run on the automations page 6.1s
mock-llm-conversation.spec.ts › mock-LLM agent-server conversation › step 1: create an LLM profile pointing at the mock LLM server 6.2s
mock-llm-conversation.spec.ts › mock-LLM agent-server conversation › step 2: activate the mock-llm profile and verify settings API 6.1s
mock-llm-conversation.spec.ts › mock-LLM agent-server conversation › step 3: run a conversation with the mock LLM 6.4s
mock-llm-conversation.spec.ts › mock-LLM agent-server conversation › step 4: resume conversation from sidebar after navigating away 5.7s
mock-llm-cross-connect.spec.ts › cross-connect: frontend-only → backend-only › frontend-only connects to a separate backend-only instance 15.8s
mock-llm-cross-connect.spec.ts › cross-connect: frontend-only → multiple backends › connects to two separate backends and switches between them 20.5s
mock-llm-image-upload.spec.ts › mock-LLM image upload › attaching an image embeds it as base64 in the LLM completion call 13.4s
mock-llm-model-switch.spec.ts › mock-LLM /model slash command › step 1: configure LLM, create switch-target profile, register trajectory 12.9s
mock-llm-model-switch.spec.ts › mock-LLM /model slash command › step 2: start conversation, switch profile via /model, verify switch 6.8s
mock-llm-onboarding-happy-path.spec.ts › onboarding happy path › completes the full onboarding flow and launches a conversation 4.4s
mock-llm-onboarding-regressions.spec.ts › onboarding recent regressions › keeps the modal open on backdrop click and Escape 1.3s
mock-llm-onboarding-regressions.spec.ts › onboarding recent regressions › defaults the LLM setup step to OpenAI GPT-5.5 1.6s
mock-llm-partial-stack.spec.ts › partial stack: --frontend-only › serves the frontend but returns 503 for backend routes 7.4s
mock-llm-partial-stack.spec.ts › partial stack: --backend-only › serves backend APIs but returns 503 for the frontend root 13.1s
mock-llm-partial-stack.spec.ts › partial stack: port conflict › fails with a clear error when the ingress port is occupied 122ms
mock-llm-partial-stack.spec.ts › partial stack: port conflict › starts successfully on a free port after a conflict 6.0s
mock-llm-preset-automation.spec.ts › preset automation → slash command conversation › automation card sends the correct slash command to a conversation 15.9s
mock-llm-preset-automation.spec.ts › preset automation → slash command conversation › direct slash command from home page triggers skill activation 13.3s
mock-llm-profile-management.spec.ts › active profile deletion + reconciliation › active profile is deletable and reconciliation activates another profile 8.4s
mock-llm-profile-management.spec.ts › same-model profile identity › chat header shows the correct profile when two profiles share the same model 15.0s
mock-llm-profile-management.spec.ts › litellm_proxy proxy base_url preservation › re-saving a litellm_proxy profile from Basic view preserves the proxy base_url 7.7s
mock-llm-skills.spec.ts › skill loading: project, user, and deletion › project skill in workspace/.agents/skills/ triggers on matching keyword 14.3s
mock-llm-skills.spec.ts › skill loading: project, user, and deletion › user skill in ~/.openhands/skills/ triggers on matching keyword 13.3s
mock-llm-skills.spec.ts › skill loading: project, user, and deletion › deleting a user skill removes it from subsequent conversations 13.2s
mock-llm-ui-regressions.spec.ts › UI regressions › scopes standalone styles to the agent-server-ui shell 1.3s
mock-llm-ui-regressions.spec.ts › UI regressions › renders critic results on agent messages and finish actions 1.4s
mock-llm-ui-regressions.spec.ts › UI regressions › loads older events when scrolling up 1.5s
mock-llm-ui-regressions.spec.ts › UI regressions › selected workspace persists after navigating away and returning 2.0s
mock-llm-ui-regressions.spec.ts › UI regressions › cleared sessionStorage yields empty workspace selection 1.3s

Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses)

…/gpt-5.1

- mock-llm-e2e.yml: Add workflow_call trigger with matching inputs so
  the workflow can be called as a reusable workflow (no PAT needed).
- agent-e2e-selector.yml: Replace gh workflow run dispatch (which
  requires a PAT with actions:write scope) with a workflow_call job.
  The select job outputs feed directly into the run-e2e job.
- select-e2e-tests.py: Change default model from litellm_proxy/openai/
  gpt-4.1-mini to openhands/gpt-5.1.

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

🛑 Mock-LLM Docker E2E Test Results

12/12 passed · ⚠️ 31 not run (process killed at 12/43)

Commit: 37310ea6 · Workflow run · Test artifacts

Status Test Duration
chromium › mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 1: configure ACP agent via Settings → Agent UI 13.8s
chromium › mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 2: reload and verify ACP settings are persisted in UI 5.6s
chromium › mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 3: start ACP conversation and verify agent reply 6.7s
chromium › mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 4: resume ACP conversation from sidebar after navigating away 5.8s
chromium › mock-llm-auth-modes.spec.ts › auth mode: fresh install with runtime-injected key › reaches the onboarding modal without pre-seeded localStorage 1.4s
chromium › mock-llm-auth-modes.spec.ts › auth mode: non-public key rotation › recovers when localStorage has a stale session API key 5.3s
chromium › mock-llm-auth-modes.spec.ts › auth mode: public gate › shows the auth screen when no key is configured 1.2s
chromium › mock-llm-auth-modes.spec.ts › auth mode: public gate › rejects an incorrect key with an inline error 1.3s
chromium › mock-llm-auth-modes.spec.ts › auth mode: public gate › allows access after pasting the correct key 4.7s
chromium › mock-llm-auth-modes.spec.ts › auth mode: public gate › skips auth screen for returning user with valid stored key 785ms
chromium › mock-llm-auth-modes.spec.ts › auth mode: public gate › re-prompts when the server rotates its key (stale localStorage) 1.5s
chromium › mock-llm-automation.spec.ts › mock-LLM automation lifecycle › step 1: setup LLM profile and register automation trajectory 7.5s

Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses)

@malhotra5 malhotra5 added smart-e2e Have an agent choose which e2e tests to run and removed smart-e2e Have an agent choose which e2e tests to run labels Jun 8, 2026
github-actions Bot added a commit that referenced this pull request Jun 8, 2026
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

🤖 Agent E2E Test Selector

Mode full
Files analyzed 3
Reason Heuristic could not narrow: 1 source files changed.

Selected specs:
Full suite (all specs)

Running via: Mock-LLM E2E Tests

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

⚠️ Mock-LLM E2E Tests

0/0 passed

Commit: 6368db89 · Workflow run

Status Test Duration

Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses)

… point

- select-e2e-tests.py: Remove heuristic_select() and all static
  path-prefix mapping. Always use the LLM — if LLM_API_KEY is missing
  the script fails loudly instead of silently degrading.
- mock-llm-e2e.yml: Remove pull_request trigger. E2E tests no longer
  run the full suite on every PR commit. They are now invoked only
  via workflow_call (from agent-e2e-selector) or workflow_dispatch.
- agent-e2e-selector.yml: Trigger on pull_request [opened, synchronize,
  reopened] — this is now the primary entry point for PR-driven E2E.
  The LLM picks the relevant subset and the E2E workflow runs only
  those specs.

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

🛑 Mock-LLM Docker E2E Test Results

3/3 passed · ⚠️ 40 not run (process killed at 3/43)

Commit: 6368db89 · Workflow run · Test artifacts

Status Test Duration
chromium › mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 1: configure ACP agent via Settings → Agent UI 14.8s
chromium › mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 2: reload and verify ACP settings are persisted in UI 5.6s
chromium › mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 3: start ACP conversation and verify agent reply 6.8s

Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses)

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

🛑 Mock-LLM E2E Tests

18/18 passed · ⚠️ 25 not run (process killed at 18/43)

Commit: 6368db89 · Workflow run · Test artifacts

Status Test Duration
chromium › mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 1: configure ACP agent via Settings → Agent UI 13.3s
chromium › mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 2: reload and verify ACP settings are persisted in UI 5.5s
chromium › mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 3: start ACP conversation and verify agent reply 6.2s
chromium › mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 4: resume ACP conversation from sidebar after navigating away 5.6s
chromium › mock-llm-auth-modes.spec.ts › auth mode: fresh install with runtime-injected key › reaches the onboarding modal without pre-seeded localStorage 1.3s
chromium › mock-llm-auth-modes.spec.ts › auth mode: non-public key rotation › recovers when localStorage has a stale session API key 5.3s
chromium › mock-llm-auth-modes.spec.ts › auth mode: public gate › shows the auth screen when no key is configured 1.1s
chromium › mock-llm-auth-modes.spec.ts › auth mode: public gate › rejects an incorrect key with an inline error 1.4s
chromium › mock-llm-auth-modes.spec.ts › auth mode: public gate › allows access after pasting the correct key 1.6s
chromium › mock-llm-auth-modes.spec.ts › auth mode: public gate › skips auth screen for returning user with valid stored key 755ms
chromium › mock-llm-auth-modes.spec.ts › auth mode: public gate › re-prompts when the server rotates its key (stale localStorage) 1.4s
chromium › mock-llm-automation.spec.ts › mock-LLM automation lifecycle › step 1: setup LLM profile and register automation trajectory 7.0s
chromium › mock-llm-automation.spec.ts › mock-LLM automation lifecycle › step 2: create automation and dispatch run via the UI 30.2s
chromium › mock-llm-automation.spec.ts › mock-LLM automation lifecycle › step 3: verify automation and run on the automations page 6.0s
chromium › mock-llm-conversation.spec.ts › mock-LLM agent-server conversation › step 1: create an LLM profile pointing at the mock LLM server 6.0s
chromium › mock-llm-conversation.spec.ts › mock-LLM agent-server conversation › step 2: activate the mock-llm profile and verify settings API 6.0s
chromium › mock-llm-conversation.spec.ts › mock-LLM agent-server conversation › step 3: run a conversation with the mock LLM 6.3s
chromium › mock-llm-conversation.spec.ts › mock-LLM agent-server conversation › step 4: resume conversation from sidebar after navigating away 5.6s

Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses)

- select-e2e-tests.py: Rewritten to use Agent + Conversation instead
  of raw LLM.completion(). Adds a CIVisualizer (streams every event
  to stderr for CI log visibility) and a capture_event callback
  (collects agent messages for parsing). The agent outputs a
  structured <TEST_SELECTION> block that gets parsed for the spec
  list. Empty specs now means 'skip E2E' not 'run full suite'.
- agent-e2e-selector.yml: run-e2e job now has an if-guard that skips
  mock-llm-e2e entirely when the agent returns no specs.

Co-authored-by: openhands <openhands@all-hands.dev>
github-actions Bot added a commit that referenced this pull request Jun 8, 2026
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

⚠️ Mock-LLM Docker E2E Test Results

0/0 passed

Commit: 7a8ab8af · Workflow run

Status Test Duration

Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses)

The capture_event callback collects model_dump_json() from every event.
The agent's response text lives at event.llm_message.content[].text,
not at event.message or event.text. Added extract_text_from_dumps()
to properly walk the JSON structure and pull out all text fragments
before searching for the <TEST_SELECTION> tag.

Also improved the ValueError message to include the extracted text
for easier debugging if parsing still fails.

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

⚠️ Mock-LLM Docker E2E Test Results

0/0 passed

Commit: fffa8572 · Workflow run

Status Test Duration

Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses)

- select-e2e-tests.py: Agent now has terminal + file_editor tools and
  runs against the repo checkout (WORKSPACE env var, defaults to cwd).
  It can read source files and test specs to make informed decisions.
  Fixed two parsing bugs: (1) only collect agent-source events (skip
  user prompt to avoid matching the template), (2) use re.findall and
  take the last match as an extra safety net. Increased visualizer
  dump to 800 chars for better CI log visibility.
- agent-e2e-selector.yml: Install openhands-tools alongside openhands-sdk
  (needed for TerminalTool/FileEditorTool). Suppress SDK banner.
  Bumped timeout to 10 min for agent tool-call iterations.

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

⚠️ Mock-LLM Docker E2E Test Results

0/0 passed

Commit: 2b1add15 · Workflow run

Status Test Duration

Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses)

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

⚠️ Mock-LLM Docker E2E Test Results

0/0 passed

Commit: 81495093 · Workflow run

Status Test Duration

Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses)

The TEST_SELECTION block lands in the FinishObservation event
(source='environment'), not in agent-source events. Removed the
source filter from the callback and replaced the flat key-lookup
extractor with a recursive _collect_text() that walks the entire
JSON tree collecting every 'text' and 'message' string value.
parse_selection() already uses re.findall + last-match to avoid
matching the prompt template.

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

⚠️ Mock-LLM Docker E2E Test Results

0/0 passed

Commit: 64702ac3 · Workflow run

Status Test Duration

Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses)

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

🤖 Agent E2E Test Selector

Mode llm
Files analyzed 3
Reason Only CI workflows and test-selection script changed, not app behavior.

Selected specs:

  • ``

Running via: Mock-LLM E2E Tests

Instead of wrangling agent event dumps / regex / recursive JSON
walking, the agent now writes its result to a temp JSON file.
After the conversation finishes, we just read that file.

Removed: OUTPUT_TAG, capture_event callback, _collect_text,
extract_all_text, parse_selection, re import.

The CIVisualizer still streams all events to stderr for CI log
visibility.

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

⚠️ Mock-LLM Docker E2E Test Results

0/0 passed

Commit: dee20cfd · Workflow run

Status Test Duration

Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses)

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

🤖 Agent E2E Test Selector

Mode llm
Files analyzed 3
Reason Only CI workflows and E2E selection script changed; no product behavior affected.

Selected specs:

  • ``

Running via: Mock-LLM E2E Tests

Replaced the 14-entry SPEC_CATALOG dict with discover_specs() which
globs tests/e2e/mock-llm/*.spec.ts at runtime. The agent gets the
list of available spec filenames and is told to read their source to
understand what each one tests. New specs are picked up automatically
without any script changes.

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

⚠️ Mock-LLM Docker E2E Test Results

0/0 passed

Commit: d6581db6 · Workflow run

Status Test Duration

Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses)

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

🤖 Agent E2E Test Selector

Mode llm
Files analyzed 3
Reason Only E2E selector workflows and selection script changed; app behavior under test specs is unaffected.

Selected specs:

  • ``

Running via: Mock-LLM E2E Tests

github-actions Bot added a commit that referenced this pull request Jun 8, 2026
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

📸 Snapshot Test Report

Warning

Snapshot comparison step crashed (timeout, OOM, or runner error) — diff results below may be incomplete or absent.
Check the CI logs for the full error output (look for the "Run snapshot comparison" step).

❌ 1 snapshot differ from the main branch baseline. Add the update-snapshots label to acknowledge intentional changes.

Category Count
🔴 Changed 1
🆕 New 0
✅ Unchanged 73
Total 74

How to resolve:

  • Unintentional diffs — the baselines on main may have moved since this branch was created. Merge the latest main into this branch and re-run CI.
  • Intentional changes — add the update-snapshots label. CI will pass and the new screenshots become the baseline when this PR merges.
🔴 Changed snapshots (1)

backends-extended

backend-dropdown-two-backends

Expected (main) Actual (PR) Diff
expected actual diff
✅ Unchanged snapshots (73)

archived-conversation

  • conversation-panel-with-archived-badges
  • conversation-view-archived
  • conversation-view-sandbox-error

automations

  • automations-delete-modal
  • automations-list-active-inactive
  • automations-no-automations
  • automations-search-no-results

backends-extended

  • backend-add-blank-disabled
  • backend-add-cloud-advanced-open
  • backend-add-cloud-no-key-disabled
  • backend-add-cloud-with-key-enabled
  • backend-add-form-partially-filled
  • backend-add-invalid-url-disabled
  • backend-add-local-ready
  • backend-add-name-only-disabled
  • backend-add-two-column-layout
  • backend-add-whitespace-host-disabled
  • backend-after-switch
  • backend-cancel-nothing-saved
  • backend-edit-prefilled
  • backend-manage-after-removal
  • backend-manage-two-listed
  • backend-remove-cancelled
  • backend-remove-confirmation
  • backend-switch-overlay

backends

  • backend-add-modal
  • backend-manage-modal
  • backend-selector-open

changes-tab

  • changes-deleted-file
  • changes-diff-viewer
  • changes-empty

collapsible-thinking

  • reasoning-content-collapsed
  • reasoning-content-expanded
  • think-action-collapsed
  • think-action-expanded

mcp-page

  • mcp-custom-server-1-editor-open
  • mcp-custom-server-2-url-filled
  • mcp-custom-server-3-all-filled
  • mcp-custom-server-4-installed
  • mcp-custom-server-editor
  • mcp-empty-installed
  • mcp-search-filtered
  • mcp-slack-install-1-marketplace
  • mcp-slack-install-2-modal
  • mcp-slack-install-3-filled
  • mcp-slack-install-4-installed

onboarding

  • onboarding-step-0-check-backend
  • onboarding-step-1-choose-agent
  • onboarding-step-2-setup-llm
  • onboarding-step-3-say-hello

projects-workspace-browser

  • projects-workspace-browser

settings-page

  • add-backend-modal
  • analytics-consent-modal
  • home-screen
  • settings-app-page
  • settings-page

settings-secrets

  • secrets-add-form-filled
  • secrets-add-form
  • secrets-after-save
  • secrets-delete-confirm
  • secrets-list

settings-verification

  • condenser-settings
  • verification-settings-critic-enabled
  • verification-settings-off
  • verification-settings-on

sidebar

  • sidebar-collapsed
  • sidebar-conversation-panel
  • sidebar-filter-menu

skills-page

  • skills-empty
  • skills-loaded
  • skills-no-match
  • skills-search-filtered
  • skills-type-filter

Generated by the Snapshot Tests workflow. This comment was created by an AI agent (OpenHands) on behalf of the repo maintainers.

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

🔶 Mock-LLM Docker E2E Test Results

38/43 passed · 5 skipped

Commit: 8d1976bd · Workflow run · Test artifacts

Status Test Duration
mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 1: configure ACP agent via Settings → Agent UI 15.3s
mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 2: reload and verify ACP settings are persisted in UI 5.5s
mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 3: start ACP conversation and verify agent reply 6.7s
mock-llm-acp-agent.spec.ts › mock-LLM ACP agent conversation › step 4: resume ACP conversation from sidebar after navigating away 5.7s
mock-llm-auth-modes.spec.ts › auth mode: fresh install with runtime-injected key › reaches the onboarding modal without pre-seeded localStorage 1.3s
mock-llm-auth-modes.spec.ts › auth mode: non-public key rotation › recovers when localStorage has a stale session API key 5.3s
mock-llm-auth-modes.spec.ts › auth mode: public gate › shows the auth screen when no key is configured 1.1s
mock-llm-auth-modes.spec.ts › auth mode: public gate › rejects an incorrect key with an inline error 1.4s
mock-llm-auth-modes.spec.ts › auth mode: public gate › allows access after pasting the correct key 1.7s
mock-llm-auth-modes.spec.ts › auth mode: public gate › skips auth screen for returning user with valid stored key 731ms
mock-llm-auth-modes.spec.ts › auth mode: public gate › re-prompts when the server rotates its key (stale localStorage) 1.4s
mock-llm-automation.spec.ts › mock-LLM automation lifecycle › step 1: setup LLM profile and register automation trajectory 7.3s
mock-llm-automation.spec.ts › mock-LLM automation lifecycle › step 2: create automation and dispatch run via the UI 32.4s
mock-llm-automation.spec.ts › mock-LLM automation lifecycle › step 3: verify automation and run on the automations page 6.1s
mock-llm-conversation.spec.ts › mock-LLM agent-server conversation › step 1: create an LLM profile pointing at the mock LLM server 8.6s
mock-llm-conversation.spec.ts › mock-LLM agent-server conversation › step 2: activate the mock-llm profile and verify settings API 6.1s
mock-llm-conversation.spec.ts › mock-LLM agent-server conversation › step 3: run a conversation with the mock LLM 6.3s
mock-llm-conversation.spec.ts › mock-LLM agent-server conversation › step 4: resume conversation from sidebar after navigating away 5.7s
⏭️ mock-llm-cross-connect.spec.ts › cross-connect: frontend-only → backend-only › frontend-only connects to a separate backend-only instance 188ms
⏭️ mock-llm-cross-connect.spec.ts › cross-connect: frontend-only → multiple backends › connects to two separate backends and switches between them 182ms
mock-llm-image-upload.spec.ts › mock-LLM image upload › attaching an image embeds it as base64 in the LLM completion call 14.8s
mock-llm-model-switch.spec.ts › mock-LLM /model slash command › step 1: configure LLM, create switch-target profile, register trajectory 14.7s
mock-llm-model-switch.spec.ts › mock-LLM /model slash command › step 2: start conversation, switch profile via /model, verify switch 6.6s
mock-llm-onboarding-happy-path.spec.ts › onboarding happy path › completes the full onboarding flow and launches a conversation 3.4s
mock-llm-onboarding-regressions.spec.ts › onboarding recent regressions › keeps the modal open on backdrop click and Escape 1.3s
mock-llm-onboarding-regressions.spec.ts › onboarding recent regressions › defaults the LLM setup step to OpenAI GPT-5.5 1.6s
⏭️ mock-llm-partial-stack.spec.ts › partial stack: --frontend-only › serves the frontend but returns 503 for backend routes 187ms
mock-llm-partial-stack.spec.ts › partial stack: --backend-only › serves backend APIs but returns 503 for the frontend root 25.1s
⏭️ mock-llm-partial-stack.spec.ts › partial stack: port conflict › fails with a clear error when the ingress port is occupied 1ms
⏭️ mock-llm-partial-stack.spec.ts › partial stack: port conflict › starts successfully on a free port after a conflict 7ms
mock-llm-preset-automation.spec.ts › preset automation → slash command conversation › automation card sends the correct slash command to a conversation 16.0s
mock-llm-preset-automation.spec.ts › preset automation → slash command conversation › direct slash command from home page triggers skill activation 13.2s
mock-llm-profile-management.spec.ts › active profile deletion + reconciliation › active profile is deletable and reconciliation activates another profile 8.4s
mock-llm-profile-management.spec.ts › same-model profile identity › chat header shows the correct profile when two profiles share the same model 14.9s
mock-llm-profile-management.spec.ts › litellm_proxy proxy base_url preservation › re-saving a litellm_proxy profile from Basic view preserves the proxy base_url 7.8s
mock-llm-skills.spec.ts › skill loading: project, user, and deletion › project skill in workspace/.agents/skills/ triggers on matching keyword 16.6s
mock-llm-skills.spec.ts › skill loading: project, user, and deletion › user skill in ~/.openhands/skills/ triggers on matching keyword 13.2s
mock-llm-skills.spec.ts › skill loading: project, user, and deletion › deleting a user skill removes it from subsequent conversations 13.1s
mock-llm-ui-regressions.spec.ts › UI regressions › scopes standalone styles to the agent-server-ui shell 945ms
mock-llm-ui-regressions.spec.ts › UI regressions › renders critic results on agent messages and finish actions 1.4s
mock-llm-ui-regressions.spec.ts › UI regressions › loads older events when scrolling up 1.6s
mock-llm-ui-regressions.spec.ts › UI regressions › selected workspace persists after navigating away and returning 2.5s
mock-llm-ui-regressions.spec.ts › UI regressions › cleared sessionStorage yields empty workspace selection 1.1s

Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

smart-e2e Have an agent choose which e2e tests to run

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants