test(e2e): add mock-LLM coverage for backend recovery and ACP single-save#1266
test(e2e): add mock-LLM coverage for backend recovery and ACP single-save#1266malhotra5 wants to merge 8 commits into
Conversation
…save Add two new E2E test specs for features/fixes that shipped without end-to-end coverage: 1. mock-llm-backend-recovery.spec.ts (PR #1205): - Recovery modal renders with recoveryMode semantics (no close button, no Done button, no backdrop dismiss) - Adding a reachable backend through the recovery modal boots the app - Editing the broken backend host to a reachable one boots the app 2. mock-llm-acp-settings.spec.ts (PR #1251): - Only one Save button on Settings → Agent (no separate credentials save) - Auth status banner appears for built-in ACP providers (Codex) - Credentials section hidden for Custom preset - Single Save persists both agent spec and credential together - Save button is disabled when no changes have been made Co-authored-by: openhands <openhands@all-hands.dev>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
PR Artifacts Notice This PR contains a
|
❌ Mock-LLM E2E Tests44/51 passed · 2 failed · 5 skipped · 🆕 8 new Commit:
🔍 Failure details (2)❌ mock-llm-acp-settings.spec.ts › ACP settings: single save + auth banner › shows auth status banner in credentials section❌ mock-llm-backend-recovery.spec.ts › backend recovery flow › shows recovery modal without dismiss controls when backend is unreachablePosted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
- Backend recovery: use .not.toBe('Connected') instead of
.not.toContain('connected') — 'Disconnected' contains the substring
'connected', causing a false negative.
- ACP settings: verify credential fields render instead of the auth
status banner — the banner only appears when a local login is
detected, which the mock-LLM environment does not provide.
Co-authored-by: openhands <openhands@all-hands.dev>
🛑 Mock-LLM Docker E2E Test Results26/43 passed · 4 failed · 13 skipped · Commit:
🔍 Failure details (4)❌ chromium › mock-llm-acp-settings.spec.ts › ACP settings: single save + auth banner › shows auth status banner in credentials section❌ chromium › mock-llm-acp-settings.spec.ts › ACP settings: single save + auth banner › shows auth status banner in credentials section❌ chromium › mock-llm-backend-recovery.spec.ts › backend recovery flow › shows recovery modal without dismiss controls when backend is unreachable❌ chromium › mock-llm-backend-recovery.spec.ts › backend recovery flow › shows recovery modal without dismiss controls when backend is unreachablePosted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
❌ Mock-LLM E2E Tests49/51 passed · 1 failed · 1 skipped · 🆕 8 new Commit:
🔍 Failure details (1)❌ mock-llm-acp-settings.spec.ts › ACP settings: single save + auth banner › single Save persists agent spec and credential togetherPosted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
🔶 Mock-LLM Docker E2E Test Results46/51 passed · 5 skipped · 🆕 8 new Commit:
Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
…ent_kind The combined save test was failing because the credential save may trigger a settings refetch that resets isDirty before the agent spec save fires. Restructure the test to verify what PR #1251 actually changed: the credential portion of the single-save flow persists the secret to the secrets store (GET /api/settings/secrets). The agent_kind persistence path is already covered by the existing mock-llm-acp-agent.spec.ts test. Co-authored-by: openhands <openhands@all-hands.dev>
❌ Mock-LLM E2E Tests49/51 passed · 1 failed · 1 skipped · 🆕 8 new Commit:
🔍 Failure details (1)❌ mock-llm-acp-settings.spec.ts › ACP settings: single save + auth banner › single Save persists ACP credential alongside agent specPosted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
The GET /api/settings/secrets endpoint returns { secrets: [...] },
not a plain array.
Co-authored-by: openhands <openhands@all-hands.dev>
🛑 Mock-LLM Docker E2E Test Results26/30 passed · 2 failed · 2 skipped · Commit:
🔍 Failure details (2)❌ chromium › mock-llm-acp-settings.spec.ts › ACP settings: single save + auth banner › single Save persists ACP credential alongside agent spec❌ chromium › mock-llm-acp-settings.spec.ts › ACP settings: single save + auth banner › single Save persists ACP credential alongside agent specPosted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
✅ Mock-LLM E2E Tests51/51 passed · 🆕 8 new Commit:
Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
🔶 Mock-LLM Docker E2E Test Results46/51 passed · 5 skipped · 🆕 8 new Commit:
Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
🟢 Good taste — Clean, well-structured e2e tests with good coverage.
[IMPROVEMENT OPPORTUNITIES] (Worth considering — not blocking)
-
[tests/e2e/mock-llm/mock-llm-backend-recovery.spec.ts, line 44] DRY:
seedBrokenBackendis copy-pasted into everytest()body. Consider moving it to a sharedtest.beforeEachat the describe level, or extracting it into a test fixture that all three scenarios share. -
[tests/e2e/mock-llm/mock-llm-acp-settings.spec.ts, line 53] Hardcoded wait:
timeout: 10_000for credential fields. Consider whether the element should be gated on an async data fetch — if so, the test might be checking behavior rather than rendering speed, and a more deterministic signal (e.g., a dedicateddata-testid) could replace the arbitrary timeout. -
[tests/e2e/mock-llm/mock-llm-backend-recovery.spec.ts, line 108] Repetitive form-fill pattern: The add-backend form fill is nearly identical in two tests. A shared helper like
addBackend(page, name, host, apiKey)would reduce duplication and make the intent clearer.
[TESTING GAPS]
- ✅ No mocks asserting only function calls — these tests exercise real UI + API paths end-to-end. Well done.
- ✅ CI is green: 51/51 passed on the latest commit.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW
New test files only; no production code changes. Tests run against the mock-LLM harness and clean up viaafterAllhooks. No breaking changes to existing behavior.
VERDICT:
✅ Worth merging — Tests are well-structured and CI is green.
KEY INSIGHT:
The two new spec files cover complementary flows (ACP settings UX and backend recovery UX) with clean, deterministic assertions. The afterAll reset pattern is a good safeguard for serial test isolation.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
…outs Address improvement suggestions from PR review: 1. DRY seedBrokenBackend: moved to test.beforeEach instead of repeating in every test body. 2. Hardcoded wait on credential fields: reduced from 10s to 5s with a comment explaining credential fields render synchronously once the preset is selected. 3. Repetitive form-fill pattern: extracted fillAndSubmitBackendForm() helper (parameterised by add/edit prefix), expectAppRecovered() helper, navigateToAgentSettings() helper, selectAcpPreset() helper, and credentialFields() locator factory. Co-authored-by: openhands <openhands@all-hands.dev>
✅ Mock-LLM E2E Tests51/51 passed · 🆕 8 new Commit:
Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
🔶 Mock-LLM Docker E2E Test Results46/51 passed · 5 skipped · 🆕 8 new Commit:
Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Code Review
🟡 Acceptable — Well-structured tests with a few opportunities for improvement.
Tests Overview
This PR adds E2E coverage for two separate features:
- Backend Recovery Flow (
mock-llm-backend-recovery.spec.ts) — 3 tests for the recovery modal when the active backend is unreachable - ACP Settings Single-Save (
mock-llm-acp-settings.spec.ts) — 5 tests for the consolidated save button and auth banner
[IMPROVEMENT OPPORTUNITIES]
🟡 Suggestion — Test 5 (ACP Settings): Missing ensureMockLLMProfile call
In mock-llm-acp-settings.spec.ts, the test "Save button is disabled when no changes have been made" does not call ensureMockLLMProfile(page) before navigating to settings. The other tests in the same describe block call it, suggesting it may be needed for proper test isolation. Without it, the test might pass or fail depending on prior test state.
// Line ~163: missing seed
test("Save button is disabled when no changes have been made", async ({
page,
}) => {
// Consider adding: await ensureMockLLMProfile(page);
await navigateToAgentSettings(page);
// ...
});🟡 Suggestion — afterAll in ACP Settings uses best-effort cleanup
The afterAll hook catches exceptions silently, which is reasonable for cleanup, but the comment says "best-effort" without logging. Consider adding a console.warn so test output shows when cleanup fails, which could help debug intermittent failures.
// Line ~48
} catch (e) {
// best-effort — add visibility: console.warn("Cleanup failed:", e);
}🟡 Suggestion — poll in backend-recovery test could use a shorter timeout
In mock-llm-backend-recovery.spec.ts, the .poll() call uses a 15-second timeout to check that status is not "Connected". Since the host is localhost:19999 (non-existent), the probe should fail almost immediately. A shorter timeout (5s) would make the test faster without reducing reliability.
// Line ~157: .poll({ timeout: 15_000, ... }) could be 5_000[STYLE NOTES]
🟢 No blocking style issues. The code follows established conventions in the codebase.
[TESTING GAPS]
✅ ACP Settings test 4: Good use of API verification (/api/settings/secrets) to confirm persistence.
✅ Backend Recovery tests: Using a non-existent host is a clever way to avoid process spawning. The test structure is clean.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW
These are new test files that don't modify production code. The tests cover existing functionality with no breaking changes. The afterAll cleanup in ACP settings properly resets state to avoid affecting subsequent specs.
[VERDICT]
✅ Worth merging — Tests provide valuable coverage for two features that lacked E2E tests. Minor improvements suggested above, but none are blocking.
[KEY INSIGHT]
The use of a non-existent localhost port (19999) to trigger probe failure is a pragmatic approach that avoids test infrastructure complexity while still exercising the recovery flow end-to-end.
Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:
- Add a
.agents/skills/custom-codereview-guide.mdfile to your branch (or edit it if one already exists) with the/codereviewtrigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.- Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
- When your PR is merged, the guideline file goes through normal code review by repository maintainers.
Resolve with AI? Install the iterate skill in your agent and run
/iterateto automatically drive this PR through CI, review, and QA until it's merge-ready.Was this review helpful? React with 👍 or 👎 to give feedback.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
…ster poll 1. Add missing ensureMockLLMProfile() call to test 5 (Save button disabled) for consistent test isolation across all tests in the describe block. 2. Add console.warn to afterAll catch blocks so cleanup failures are visible in test output instead of being silently swallowed. 3. Reduce poll timeout from 15s to 5s for the non-existent host probe — localhost:19999 connection-refuses immediately, so 5s is plenty. Co-authored-by: openhands <openhands@all-hands.dev>
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Code Review: test(e2e): add mock-LLM coverage for backend recovery and ACP single-save
🟡 Acceptable — Tests are functional and follow established patterns, but verbose comments reduce signal.
[IMPROVEMENT OPPORTUNITIES]
-
[mock-llm-backend-recovery.spec.ts, Lines 1-9] Verbose Header: The 9-line block comment restates the PR description verbatim. Code should be self-documenting; PR context belongs in git blame or the PR description, not the source. Consider trimming to a single line:
/** E2E tests for backend recovery flow (PR #1205). */ -
[mock-llm-backend-recovery.spec.ts, Lines 46-53] Redundant Comment: The comment "Seeds localStorage with a backend pointing at a non-existent host..." restates exactly what the function body does. Remove or make it explain why we seed a broken backend (to test recovery without spawning separate processes).
-
[mock-llm-acp-settings.spec.ts, Lines 1-16] Verbose Header: Same issue — 16 lines of PR-description paraphrase. Trim to:
/** Mock-LLM E2E tests for ACP Settings single-save + auth banner (PR #1251). */ -
[mock-llm-acp-settings.spec.ts, Lines 118-119] Unnecessary Inline Comment: "// Fill the first credential field" above
await fields.first().click()adds no value — the code already says this.
[TESTING QUALITY] ✅
- Tests use
serialmode per conventions ✓ - Proper
afterAllcleanup resets state ✓ - Real API verification in
single Save persists ACP credentialtest (/api/settings/secretscall) ✓ - Pragmatic approach: non-existent
localhost:19999trips probe failure without needing separate processes ✓ - Helper functions (
fillAndSubmitBackendForm,navigateToAgentSettings) are well-named and reusable ✓
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW
Test-only changes with no production code modifications. Proper cleanup inafterAllhooks prevents cross-test contamination. The mock-LLM approach avoids external dependencies.
VERDICT:
✅ Worth merging — Core test logic is sound. The inline comment cleanup is non-blocking but would improve signal-to-noise ratio.
KEY INSIGHT:
The test coverage is solid and follows established patterns; the main improvement opportunity is removing comments that paraphrase the code rather than explain non-obvious intent.
Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:
- Add a
.agents/skills/custom-codereview-guide.mdfile to your branch (or edit it if one already exists) with the/codereviewtrigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.- Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
- When your PR is merged, the guideline file goes through normal code review by repository maintainers.
Resolve with AI? Install the iterate skill in your agent and run
/iterateto automatically drive this PR through CI, review, and QA until it's merge-ready.Was this review helpful? React with 👍 or 👎 to give feedback.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
❌ Mock-LLM E2E Tests51/52 passed · 1 failed · 🆕 8 new Commit:
🔍 Failure details (1)❌ mock-llm-acp-settings.spec.ts › ACP settings: single save + auth banner › Save button is disabled when no changes have been madePosted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
Address third review — reduce comment noise: - Trim 9-line and 16-line file headers to single-line JSDoc. - Replace redundant seedBrokenBackend docblock with intent-focused one-liner. - Remove 'Fill the first credential field' comment that restates code. - Revert ensureMockLLMProfile in test 5 — it caused a timeout after test 4 saves ACP settings (the LLM settings page can't find the profile button in that state). Test 5 only checks the save button's initial disabled state and doesn't need an LLM profile. Co-authored-by: openhands <openhands@all-hands.dev>
🛑 Mock-LLM Docker E2E Test Results36/41 passed · 5 skipped · Commit:
Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
✅ Mock-LLM E2E Tests52/52 passed · 🆕 8 new Commit:
Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
🔶 Mock-LLM Docker E2E Test Results46/51 passed · 5 skipped · 🆕 8 new Commit:
Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
✅ Mock-LLM E2E Tests52/52 passed · 🆕 8 new Commit:
Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
📸 Snapshot Test ReportWarning Snapshot comparison step crashed (timeout, OOM, or runner error) — diff results below may be incomplete or absent. ❌ 1 snapshot differ from the main branch baseline. Add the
🔴 Changed snapshots (1)
|
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
✅ Unchanged snapshots (73)
archived-conversation
- conversation-panel-with-archived-badges
- conversation-view-archived
- conversation-view-sandbox-error
automations
- automations-delete-modal
- automations-list-active-inactive
- automations-no-automations
- automations-search-no-results
backends-extended
- backend-add-blank-disabled
- backend-add-cloud-advanced-open
- backend-add-cloud-no-key-disabled
- backend-add-cloud-with-key-enabled
- backend-add-form-partially-filled
- backend-add-invalid-url-disabled
- backend-add-local-ready
- backend-add-name-only-disabled
- backend-add-two-column-layout
- backend-add-whitespace-host-disabled
- backend-after-switch
- backend-cancel-nothing-saved
- backend-edit-prefilled
- backend-manage-after-removal
- backend-manage-two-listed
- backend-remove-cancelled
- backend-remove-confirmation
- backend-switch-overlay
backends
- backend-add-modal
- backend-manage-modal
- backend-selector-open
changes-tab
- changes-deleted-file
- changes-diff-viewer
- changes-empty
collapsible-thinking
- reasoning-content-collapsed
- reasoning-content-expanded
- think-action-collapsed
- think-action-expanded
mcp-page
- mcp-custom-server-1-editor-open
- mcp-custom-server-2-url-filled
- mcp-custom-server-3-all-filled
- mcp-custom-server-4-installed
- mcp-custom-server-editor
- mcp-empty-installed
- mcp-search-filtered
- mcp-slack-install-1-marketplace
- mcp-slack-install-2-modal
- mcp-slack-install-3-filled
- mcp-slack-install-4-installed
onboarding
- onboarding-step-0-check-backend
- onboarding-step-1-choose-agent
- onboarding-step-2-setup-llm
- onboarding-step-3-say-hello
projects-workspace-browser
- projects-workspace-browser
settings-page
- add-backend-modal
- analytics-consent-modal
- home-screen
- settings-app-page
- settings-page
settings-secrets
- secrets-add-form-filled
- secrets-add-form
- secrets-after-save
- secrets-delete-confirm
- secrets-list
settings-verification
- condenser-settings
- verification-settings-critic-enabled
- verification-settings-off
- verification-settings-on
sidebar
- sidebar-collapsed
- sidebar-conversation-panel
- sidebar-filter-menu
skills-page
- skills-empty
- skills-loaded
- skills-no-match
- skills-search-filtered
- skills-type-filter
Generated by the Snapshot Tests workflow. This comment was created by an AI agent (OpenHands) on behalf of the repo maintainers.
🔶 Mock-LLM Docker E2E Test Results47/52 passed · 5 skipped · 🆕 8 new Commit:
Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |



Why
Two recent PRs shipped features/bug fixes without E2E test coverage:
recoveryModetoManageBackendsModal, clickable backend rows, andretryBootstrapProbe()— with zero tests (no unit, no E2E).AcpAuthStatusBanner, and deduplicated toasts — with 50 unit tests but no E2E coverage for the multi-component interaction flows.Summary
mock-llm-backend-recovery.spec.ts— 3 tests covering the recovery modal semantics, recovery via adding a backend, and recovery via editing a broken backend.mock-llm-acp-settings.spec.ts— 5 tests covering single Save button, auth status banner, credentials section visibility by preset, combined spec+credential save, and disabled-when-clean state.Issue Number
Covers gaps identified in PRs #1205 and #1251.
How to Test
Or run the full mock-LLM suite:
Type
Notes
serialmode,seedLocalStorage,routeSessionApiKey,dismissAnalyticsModal).afterAllcleanup in the ACP settings spec resets back toagent_kind: "openhands"so alphabetically-subsequent specs are not affected.http://localhost:19999) to trip the/server_infoprobe failure, avoiding the need to spawn separate processes.This PR was created by an AI agent (OpenHands) on behalf of the user.
@malhotra5 can click here to continue refining the PR
🐳 Docker images for this PR
• GHCR package: https://github.com/OpenHands/agent-canvas/pkgs/container/agent-canvas
ghcr.io/openhands/agent-canvasghcr.io/openhands/agent-server:1.26.0-pythonopenhands-automation==1.0.0a62a5cc332d8bce092d3e7392c15f3c840fb2882f6Pull (multi-arch manifest)
# Multi-arch manifest — Docker automatically pulls the correct architecture docker pull ghcr.io/openhands/agent-canvas:sha-2a5cc33Run
All tags pushed for this build
About Multi-Architecture Support
sha-2a5cc33) is a multi-arch manifest supporting both amd64 and arm64sha-2a5cc33-amd64) are also available if needed