feat(chat): per-tool visualizers for tool calls in the conversation UI#1246
feat(chat): per-tool visualizers for tool calls in the conversation UI#1246VascoSch92 wants to merge 9 commits into
Conversation
…ool-visualizers-for-tool-calls-in-the-conversation-ui # Conflicts: # package-lock.json
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Code Review: Per-Tool Visualizers for Tool Calls
Taste Rating
🟡 Acceptable — Solid foundation with room for minor polish.
Summary
This PR introduces a pluggable visualizer system that renders tool call cards (action/observation pairs) as React components instead of markdown. Four visualizers are implemented: bash, file-editor, search, and task. The architecture is type-safe, the fallback to markdown is preserved for unregistered tools, and tests cover the new components.
[CRITICAL ISSUES]
None identified. The architecture is sound and backward-compatible.
[IMPROVEMENT OPPORTUNITIES]
-
src/components/features/chat/tool-visualizers/primitives/diff-view.tsx, Line 120-128 React Key: Using array index as key (
key={index}) can cause rendering issues when diff rows contain identical text. The comment acknowledges this but consider using a stable composite key likerow.type + indexfor identical-text rows to remain distinguishable during updates. -
src/components/features/chat/tool-visualizers/bash/bash.tsx, Line 100-104 Simplification: The
bash.tsxaction body has three nested ternaries that could be flattened to a switch statement for readability. This is minor since the function is small.
[STYLE NOTES]
- The comment at
diff-view.tsxline 122-123 is appropriate — it explains a known tradeoff. - The
isServerFallbackSummarycomment inget-event-content.tsxis a good example of explaining non-local behavior. Keep this pattern.
[TESTING GAPS]
- No visual evidence: This is a UI change that replaces markdown rendering with React components. The PR description includes unit tests and snapshots, but no screenshots or videos demonstrating the actual visual result. For a change this visible, consider adding:
- Before/after screenshots of the chat UI
- Video walkthrough of the visualizers in action
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW
This is a purely additive change that preserves existing behavior via the markdown fallback for unregistered tools. The TypeScript types provide compile-time safety, and the test suite covers the new components. No breaking changes, no new dependencies, no security implications.
VERDICT:
✅ Worth merging — Well-structured architecture with appropriate test coverage. The incremental migration approach (visualizer → markdown fallback) is a good pattern for gradual rollout.
KEY INSIGHT:
The defineVisualizer factory with TypeScript inference is the right abstraction — it enforces at compile time that visualizers handle the exact action/observation kinds they declare, eliminating a whole class of runtime errors.
This review was generated by an AI agent (OpenHands) on behalf of VascoSch92 through OpenHands Automation. View conversation
✅ Mock-LLM E2E Tests42/42 passed Commit:
Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
🔶 Mock-LLM Docker E2E Test Results37/42 passed · 5 skipped Commit:
Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
…l-calls-in-the-conversation-ui
|
| Status | Test | Duration |
|---|
Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses)
|
| Status | Test | Duration |
|---|
Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses)
✅ Mock-LLM E2E Tests42/42 passed Commit:
Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
🔶 Mock-LLM Docker E2E Test Results37/42 passed · 5 skipped Commit:
Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
…l-calls-in-the-conversation-ui
✅ Mock-LLM E2E Tests42/42 passed Commit:
Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
| "tr": "Results truncated", | ||
| "uk": "Results truncated", | ||
| "zh-CN": "Results truncated", | ||
| "zh-TW": "Results truncated" |
There was a problem hiding this comment.
suggestion (blocking): We need to support multiple languages. Thank you! 🙏
🔶 Mock-LLM Docker E2E Test Results37/42 passed · 5 skipped Commit:
Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
|
@OpenHands Do a /codereview on this PR. Understand the goal and implications. Post your feedback as a review in the PR. Don’t be too mean on nits. |
|
I'm on it! enyst can track my progress at all-hands.dev |
|
@VascoSch92 Please feel free to consider or have your agent consider my agent’s feedback or ignore it as you see fit. I’m just trying to get GPT-5.5 to surface tricky stuff if there is any, but sometimes it exagerates! |
enyst
left a comment
There was a problem hiding this comment.
🟡 Taste Rating: Acceptable — the visualizer registry is a reasonable incremental seam and the tests/CI coverage are solid, but I found a couple of correctness gaps in the newly added rendering/history behavior.
This review was generated by an AI agent (OpenHands) on behalf of the requester.
[CRITICAL ISSUES]
- None.
[IMPROVEMENT OPPORTUNITIES]
src/components/features/chat/tool-visualizers/file-editor/file-editor.tsx:41Correctness: the file-editor diff path uses truthiness (obs.old_content && obs.new_content), so valid edits where one side is an empty string skip the diff entirely. That includes clearing a file (new_content: "") or inserting into an empty file (old_content: ""). The fallback then either shows only the output text or nothing useful, which undercuts the purpose of this visualizer for real edit cases. Use nullish checks (!= null) instead, and add tests for empty old/new content.src/components/features/chat/tool-visualizers/file-editor/file-editor.tsx:68In-flight insert rendering: pendinginsertactions requireold_str != null, but normal insert calls do not have anold_str; they havenew_strplusinsert_line. As a result, in-flight insert actions render only the path and hide the inserted text. Render inserts fromnew_strwithout requiringold_str(for example as an insertion diff or labeled code block) and cover it in the visualizer tests.src/contexts/conversation-websocket-context.tsx:580/src/stores/model-store.ts:105State consistency: successfulSwitchLLMObservations now get hidden and replaced with inline model messages, but the agent-triggered path only records the visual message and updates the model cache. It does not persistactive_profilethe way the user-driven switch path does, so after reload the header can fall back to model-based matching and show the wrong profile when multiple profiles share a model. Also,seedSwitchesonly runs for the initial REST preload; if an older page loaded byuseLoadOlderEventscontains a successful switch, the underlying observation remains hidden and no model message is seeded for it. Please persist the profile on successful agent-triggered switches and either reseed/update switch messages after older pages are merged or derive the anchors from currentuiEventsat render time.
[TESTING GAPS]
- Add focused tests for the cases above: file-editor empty-string diffs, pending insert actions, and a model-switch observation that arrives through older-history pagination. The existing tests cover the happy path but would not catch these regressions.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟡 MEDIUM. This is frontend-only, behind existing event data, and CI/e2e are green. The risk is still medium because it changes the chat's core event rendering path and hides some events in favor of reconstructed UI state; losing history messages or hiding edit details is user-visible and can make debugging conversations harder.
VERDICT:
❌ Needs follow-up: The architecture is sound, but I would address the empty edit/insert rendering and model-switch history consistency before merging.
KEY INSIGHT:
The visualizer seam is good, but once a raw event is hidden in favor of derived UI, every history-loading path has to rebuild that derived UI or the transcript silently loses information.
Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:
- Add a
.agents/skills/custom-codereview-guide.mdfile to your branch (or edit it if one already exists) with the/codereviewtrigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.- Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
- When your PR is merged, the guideline file goes through normal code review by repository maintainers.
Resolve with AI? Install the iterate skill in your agent and run
/iterateto automatically drive this PR through CI, review, and QA until it's merge-ready.Was this review helpful? React with 👍 or 👎 to give feedback.
|
Since my last summary, no additional work or changes were made. Checklist:
|
|
| Status | Test | Duration |
|---|
Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses)
|
| Status | Test | Duration |
|---|
Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses)
✅ Mock-LLM E2E Tests44/44 passed Commit:
Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
📸 Snapshot Test ReportWarning Snapshot comparison step crashed (timeout, OOM, or runner error) — diff results below may be incomplete or absent. ✅ 1 snapshot changed — acknowledged via the
🔴 Changed snapshots (1)
|
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
✅ Unchanged snapshots (73)
archived-conversation
- conversation-panel-with-archived-badges
- conversation-view-archived
- conversation-view-sandbox-error
automations
- automations-delete-modal
- automations-list-active-inactive
- automations-no-automations
- automations-search-no-results
backends-extended
- backend-add-blank-disabled
- backend-add-cloud-advanced-open
- backend-add-cloud-no-key-disabled
- backend-add-cloud-with-key-enabled
- backend-add-form-partially-filled
- backend-add-invalid-url-disabled
- backend-add-local-ready
- backend-add-name-only-disabled
- backend-add-two-column-layout
- backend-add-whitespace-host-disabled
- backend-after-switch
- backend-cancel-nothing-saved
- backend-edit-prefilled
- backend-manage-after-removal
- backend-manage-two-listed
- backend-remove-cancelled
- backend-remove-confirmation
- backend-switch-overlay
backends
- backend-add-modal
- backend-manage-modal
- backend-selector-open
changes-tab
- changes-deleted-file
- changes-diff-viewer
- changes-empty
collapsible-thinking
- reasoning-content-collapsed
- reasoning-content-expanded
- think-action-collapsed
- think-action-expanded
mcp-page
- mcp-custom-server-1-editor-open
- mcp-custom-server-2-url-filled
- mcp-custom-server-3-all-filled
- mcp-custom-server-4-installed
- mcp-custom-server-editor
- mcp-empty-installed
- mcp-search-filtered
- mcp-slack-install-1-marketplace
- mcp-slack-install-2-modal
- mcp-slack-install-3-filled
- mcp-slack-install-4-installed
onboarding
- onboarding-step-0-check-backend
- onboarding-step-1-choose-agent
- onboarding-step-2-setup-llm
- onboarding-step-3-say-hello
projects-workspace-browser
- projects-workspace-browser
settings-page
- add-backend-modal
- analytics-consent-modal
- home-screen
- settings-app-page
- settings-page
settings-secrets
- secrets-add-form-filled
- secrets-add-form
- secrets-after-save
- secrets-delete-confirm
- secrets-list
settings-verification
- condenser-settings
- verification-settings-critic-enabled
- verification-settings-off
- verification-settings-on
sidebar
- sidebar-collapsed
- sidebar-conversation-panel
- sidebar-filter-menu
skills-page
- skills-empty
- skills-loaded
- skills-no-match
- skills-search-filtered
- skills-type-filter
Generated by the Snapshot Tests workflow. This comment was created by an AI agent (OpenHands) on behalf of the repo maintainers.
🔶 Mock-LLM Docker E2E Test Results39/44 passed · 5 skipped Commit:
Posted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
|
@OpenHands Understand the PR and help us visualize and understand how it works when an event is rendered this way: clone yourself enyst.github.io and add a new page under arch to show the flow/design and make sure you explain and show in diagrams how it works to render and event; make sure the page renders correctly. (push it out for us to see and give us a link in a github comment) Add another section to that page that answers the question: how does a user define a different way to render an event or more? If they cannot do it without editing their fork, then investigate and propose a simple way for a followup arch that will allow people to add addons that define new visualizers and plug them in without editing code. Find the open issue posted on agent-canvas by devin, where he proposes addons; understand all comments in that proposal and post a comment there where you link these and propose how to add custom visualizers. |
|
I'm on it! enyst can track my progress at all-hands.dev |
|
Done.
The page explains the current render flow with diagrams, including:
This comment was created by an AI agent (OpenHands) on behalf of the user. |



Why
Tool calls were all rendered through one generic markdown pipeline, so every
tool looked alike and tools the frontend didn't model (e.g.
taskandcanvas_ui) fell back to dumping the raw event JSON. This adds purpose-built,per-tool visualizers that render a tool call's action/observation card as React.
The system is incremental: unregistered tools keep the markdown fallback, so
visualizers can be added one tool at a time.
Summary
defineVisualizer+ dispatcher)with shared primitives (code block, diff view, key/value grid, output pane),
plus visualizers for bash/terminal, file editor, and search.
parent agent sent, and the subagent's markdown result. While the task is
in-flight (no observation yet) it shows just the query.
list under an "Invoked Skill Knowledge" header (reuses
SkillReadyContentList).canvas_uiobservations now show their acknowledgement text instead of araw JSON dump.
the decorative icons from the bash/terminal card. Success/error dots now honor
is_error/statusfor the invoke-skill, task, and canvas observations.Issue Number
Closes #570
How to Test
Manual:
npm run dev(ornpm run dev:mock), open a conversation and triggerbash/grep/file edits and a subagent
task, then confirm each tool renders adedicated card, the copy buttons work on command/output/result, and an in-flight
taskshows the query before its result arrives.Video/Screenshots
Type
Notes
TaskAction/TaskObservationandCanvasUIObservation.TaskAction's schema (prompt,subagent_type, …) was verified against the SDK source, not guessed.mainmerge; the only conflict waspackage-lock.json, resolved tothe
@openhands/extensions0.3.0 npm release (matching the mergedpackage.json).GenericEventMessageWrapperfor theinvoke-skill toggle (its parts are unit-tested individually).
🐳 Docker images for this PR
• GHCR package: https://github.com/OpenHands/agent-canvas/pkgs/container/agent-canvas
ghcr.io/openhands/agent-canvasghcr.io/openhands/agent-server:1.26.0-pythonopenhands-automation==1.0.0a67e81981c42b2d04c18fd618278acb874dfd8a534Pull (multi-arch manifest)
# Multi-arch manifest — Docker automatically pulls the correct architecture docker pull ghcr.io/openhands/agent-canvas:sha-7e81981Run
All tags pushed for this build
About Multi-Architecture Support
sha-7e81981) is a multi-arch manifest supporting both amd64 and arm64sha-7e81981-amd64) are also available if needed