Skip to content

feat: slash commands and skills (/help, /clear, /skills, /<skill>) with autocomplete + reasoning UX#235

Open
illeatmyhat wants to merge 4 commits into
mainfrom
slash-commands-backend
Open

feat: slash commands and skills (/help, /clear, /skills, /<skill>) with autocomplete + reasoning UX#235
illeatmyhat wants to merge 4 commits into
mainfrom
slash-commands-backend

Conversation

@illeatmyhat

@illeatmyhat illeatmyhat commented May 17, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds first-class slash-command support to CUGA: three built-ins (/help, /clear, /skills), invocation of repo-defined skills (e.g. /echo), an autocomplete dropdown, embedding-based suggestions for typos, and an allowed-tools whitelist enforced at the HITL approval gate. The skill invocation surfaces as a "Skill invoked: /" step inside the assistant message's reasoning panel — toggle relabelled "Show details" — instead of a separate chip bubble.

For human reviewers

Tell your agent to read this PR and execute the instructions in the "For your agent reviewer" section below.

Screenshots

Autocomplete dropdown on / /help
01-dropdown 02-help
/skills /echo hello world — reasoning panel expanded
03-skills 04-echo
Unknown /sumarize → suggestion chips /clear
06-suggestions 07-clear

What's in this PR

  • Backendsrc/cuga/backend/slash_commands/ (parser, registry, dispatcher, message synthesis, $ARGUMENTS substitution, command resolver, allowed-tools enforcement, three built-ins). Stream events SlashSkillInvoked, SlashSuggestions, ThreadIdChanged. ChatHistoryMessage discriminated-union rehydration so AIMessage.tool_calls / ToolMessage.tool_call_id survive LangGraph checkpoints. SSE envelope now always wraps with event: + data: — fixes a bug where non-Answer events under OutputFormat.DEFAULT leaked raw JSON glued to the next event.
  • Frontend — Carbon AI Chat composer hardened against contenteditable + shadow-DOM + post-submit node-replacement (composerTextarea.ts). Autocomplete dropdown (SlashCommandDropdown.tsx) reopens after submit and yields Enter when args are typed. Suggestion chips (SlashChips.tsx). Skill invocation rendered as a reasoning step (customSendMessage.ts + customLoadHistory.ts). Carbon's reasoning toggle relabelled to "Show details" via MutationObserver in CarbonChat.tsx.
  • Tests — Python unit suites for stream envelope, ChatHistoryMessage rehydration, every slash module, and the Langfuse span contract. Playwright e2e (tests/e2e/slash-chips.spec.ts) for both chip types, history replay, /clear rotation, and the "Show details" label.

For your agent reviewer

You're being asked to verify that the slash-command pipeline works end-to-end on the user's machine. Do not trust the unit tests alone — the failure modes here are visual (the chip leaking to its own bubble, the dropdown not reopening, the SSE wire format leaking into the chat bubble), and they only show up when a real browser drives a real server.

0. Environment prereqs

  1. LiteLLM endpoint reachable. CUGA's planner uses LiteLLM (Azure/gpt-4.1 by default). If you're behind a corporate VPN that fronts LiteLLM, connect it before doing anything else — without VPN, calls hang silently rather than failing. Confirm with curl -sf $OPENAI_BASE_URL/health || echo "VPN?" if you have a health route, or just try a one-shot agent.invoke("hello") and watch for a response within ~20s.
  2. uv installed. Run uv sync once.
  3. Frontend bundle built. Run bash src/frontend_workspaces/frontend/build.shpnpm run build alone does not deploy. The script does pnpm run build and copies dist/ into src/cuga/frontend/dist/, which is what the FastAPI server actually serves.
  4. Echo skill on disk. Create .agents/skills/echo/SKILL.md (the .agents/ directory is gitignored, so it won't be in your checkout) with the following:
    ---
    name: echo
    description: Echoes its arguments back verbatim. A minimal skill for verifying the slash-command pipeline end-to-end.
    ---
    
    You are an echo. Respond with the user's arguments verbatim — exactly as they wrote them, nothing else.
    
    Do not add a confirmation. Do not add a preamble. Do not add a timestamp. Do not quote the args. Do not call any tools. Do not plan further steps.
    
    If the user passed no arguments, respond with the single word: `(empty)`.
  5. Server up. .venv/bin/cuga start demo_skills — wait until curl -sf http://127.0.0.1:7860/api/commands returns 200.

1. Wire format sanity check (curl)

curl -sN -X POST http://127.0.0.1:7860/stream \
  -H "Content-Type: application/json" \
  -H "X-Thread-ID: $(uuidgen)" \
  -d '{"query":"/echo hello world"}' | head -40

Expected: each event is one event: <Name>\n followed by one data: ...\n followed by a blank line. If you see data: event: or data: data: anywhere in the output, the SSE envelope is being double-wrapped (regression of main.py:1771).

2. Playwright live suite

Mirror the patterns in src/frontend_workspaces/frontend/tests/e2e/slash-chips.spec.ts (stubbed) but point the new spec at the live server. Critical patterns:

  • Carbon AI Chat 1.6 ships the composer as a contenteditable host with role="textbox", not a <textarea>. Read with textContent, set with the patterns in composerTextarea.ts.
  • Multiple composers can co-exist in the shadow DOM after a submit; prefer the visible one (getBoundingClientRect().width > 0).
  • Carbon's submit clears the composer; the autocomplete dropdown's MutationObserver must re-resolve when the live composer node changes identity. Test this by submitting /skills then typing / again and asserting the dropdown reopens.
  • The slash dropdown intercepts Enter to "accept" the highlighted command. When the user has already typed args (^/<name>\s+), Enter should fall through to Carbon's submit. Test /echo hello world with a single Enter press and assert the outbound /stream payload contains "hello world".

The minimum acceptance suite:

Scenario Assertion
Dropdown opens on / .cuga-slash-dropdown visible, lists /clear, /help, /skills, /echo
/help Response text contains /help, /clear, /skills
/skills Response text contains "Installed skills" + "echo" + the skill description
/clear X-Thread-ID on the next /stream request differs from the prior one (capture via page.on("response", ...))
/echo hello world (single Enter) (a) no .cuga-slash-skill-chip element anywhere; (b) at least one assistant bubble equals exactly "hello world" (filter the user message out first); (c) the rendered text does not contain the raw SSE wire format event: or data:
/sumarize .cuga-slash-suggestions visible with at least one chip; no bubble text matches /Unknown command:/ (the suppression contract — chip's own header text is Unknown command <name>. Did you mean: without colon, so the colon-anchored regex distinguishes them)
Dropdown re-opens after /skills Submit /skills, wait for response, type /, assert dropdown visible again

Sample spec scaffolding lives in /tmp/full-suite.spec.ts, /tmp/dropdown-reopen.spec.ts, and /tmp/echo-args.spec.ts on the originating dev machine — recreate them in a /tmp/ directory of your own. Use pnpm exec playwright test --config <your-config>.ts --reporter=line from src/frontend_workspaces/frontend/.

3. Known pre-existing failures (not regressions)

Both fail identically on main; do not chase:

  • tests/unit/test_skill_loader.py::test_skill_registry_load_skill_emits_install_steps_for_requirements
  • tests/unit/test_find_tools_exception.py::test_find_tools_composes_query_with_initial_user_message

4. Report

Run the live suite, paste the pass/fail tally, and flag any of the following that you observe — these were the bugs the development session caught and would be the most informative regressions:

  • Dropdown does not reopen after submit
  • /echo hello world arrives at the server with empty raw_args
  • /clear does not rotate the thread id
  • Suggestion chips never render for an unknown command
  • Skill invocation appears as a separate ⚡ /<name> chip bubble (should be a reasoning step)
  • "Show Reasoning" appears instead of "Show details"
  • Raw event: … / data: … text appears inside any assistant bubble

Test plan

  • Python unit suites green (uv run pytest tests/unit/)
  • Frontend e2e stubbed suite green (pnpm exec playwright test tests/e2e)
  • Live-server Playwright suite green (9/9 against cuga start demo_skills)
  • Curl-confirmed SSE envelope shape
  • Reviewer's agent re-ran the live suite on their machine

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Full slash-command subsystem: parsing/dispatch, built-ins (/help, /clear, /skills), skill loading with argument substitution, allowed-tools whitelist/enforcement, embedding-based suggestions, GET /api/commands, SDK + frontend autocomplete, suggestion chips, and thread-id rotation handling
    • Frontend: composer-aware slash autocomplete dropdown, clickable suggestion chips, skill-invocation rendering, CSS for slash UI, and exported setThreadId()
  • Bug Fixes

    • Robust SSE parsing/formatting to preserve multi-line event payloads
    • Improved chat-history typing/rehydration and streaming stability
  • Tests

    • Playwright e2e suite and extensive unit tests for parser, substitution, enforcement, resolver, dispatcher, message synthesis, and stream formatting

Review Change Stack

@coderabbitai

coderabbitai Bot commented May 17, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds an end-to-end slash-command system: parser, types/registry, dispatch with builtin/skill/unknown flows, argument substitution, allowed-tools enforcement and approval, message synthesis, server/SDK wiring, frontend autocomplete/chips and SSE handling, plus comprehensive unit and Playwright e2e tests.

Changes

Slash Command System

Layer / File(s) Summary
Chat Message Typing & State Foundation
src/cuga/backend/cuga_graph/state/agent_state.py, src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
Adds ChatHistoryMessage discriminated union; updates AgentState and CugaLite state to preserve subclass fields (tool_calls/tool_call_id) and reserialize history into model payloads.
SSE & AgentLoop
src/cuga/backend/cuga_graph/utils/agent_loop.py
Rewrite StreamEvent.parse/format to preserve multi-line SSE data: lines and always emit SSE blocks; add AgentLoop.skill_allowed_tools plumbing injected into config.
Tool Approval Whitelist Gate
src/cuga/backend/cuga_graph/nodes/cuga_lite/tool_approval_handler.py, src/cuga/backend/slash_commands/allowed_tools_enforcement.py
Add AST-based bare-call detection and find_disallowed_calls; implement _check_skill_allowed_tools whitelist gate that creates approval interrupts with metadata when disallowed calls are found.
Skill Loader & Registry
src/cuga/backend/skills/loader.py, src/cuga/backend/skills/registry.py, src/cuga/backend/skills/tools.py
Normalize arguments and allowed-tools frontmatter, extend SkillEntry with arguments/allowed_tools, add SkillRegistry.entries()/entry(), and make load_skill(name, args="") with StructuredTool support.
Argument Substitution Engine
src/cuga/backend/slash_commands/arg_substitution.py
Implement validate_arg_names, split_args, and substitute with precedence for escaped dollars, indexed/bare/named/positional placeholders and fallback append-if-no-placeholder.
Message Synthesis
src/cuga/backend/slash_commands/message_synthesis.py
Add new_tool_call_id() and synthesize_skill_invocation() to produce injected Human→AI(tool call)→Tool→[Human] message sequences for skill invocations.
Parser, Types & Registry
src/cuga/backend/slash_commands/parser.py, src/cuga/backend/slash_commands/types.py, src/cuga/backend/slash_commands/registry.py, src/cuga/backend/slash_commands/__init__.py, src/cuga/backend/slash_commands/builtins/*
Add leading-slash parser, dispatch types, unified SlashRegistry merging builtins+skills with shadow suppression, builtin discovery, and builtins /help, /clear, /skills.
Dispatch & Resolver
src/cuga/backend/slash_commands/dispatcher.py, src/cuga/backend/slash_commands/command_resolver.py
Add parse_and_dispatch() entrypoint with builtin/skill/unknown routing, optional embedding-backed CommandResolver with size-1 cache and lock, suggestion enrichment, and best-effort telemetry including Langfuse spans.
Server event stream & API
src/cuga/backend/server/main.py
Prefetch embedding function at startup; route stream inputs through slash dispatch; emit SlashSuggestions/SlashSkillInvoked/ThreadIdChanged, inject synthesized messages for skills, persist rotated-thread history, wire skill_allowed_tools into AgentLoop, harden SSE parsing, and add GET /api/commands.
Python SDK integration
src/cuga/sdk.py
Add _dispatch_slash() and invoke-time slash dispatch: builtin/unknown short-circuit returns, use injected_messages for skills, propagate allowed_tools into run_config, and make InvokeResult.thread_id optional.
Frontend composer & utilities
src/frontend_workspaces/frontend/src/carbon-chat/composerTextarea.ts
Add shadow-DOM-aware utilities to discover composer input, read/write across textarea/input/contenteditable, detect staleness, and manage ARIA combobox attributes.
Frontend UI: Dropdown, Chips, SSE, History
src/frontend_workspaces/frontend/src/carbon-chat/SlashCommandDropdown.tsx, src/frontend_workspaces/frontend/src/carbon-chat/SlashChips.tsx, src/frontend_workspaces/frontend/src/carbon-chat/CarbonChat.tsx, src/frontend_workspaces/frontend/src/carbon-chat/customSendMessage.ts, src/frontend_workspaces/frontend/src/carbon-chat/customLoadHistory.ts, src/frontend_workspaces/frontend/src/api.ts, CSS, Playwright config, tests
Add SlashCommandDropdown with debounced fetch, keyboard navigation and ARIA; SlashSuggestions chip renderer; CarbonChat wiring (setThreadId, portal wrapper, toggle relabel); SSE multi-line parsing and Slash* event handling; history replay semantics and Playwright e2e tests with a static server.
Tests & CI script
tests/unit/*, src/frontend_workspaces/frontend/tests/e2e/*, src/scripts/run_tests.sh
Extensive unit tests for parser, resolver, substitution, allowed-tools enforcement, message synthesis, Langfuse telemetry, SSE formatting, agent-state rehydration, builtins; Playwright e2e suite and static server; test script updated to run slash tests.
Misc
.gitignore, package.json changes
Ignore Playwright artifacts; add Playwright devDependency and test:e2e script in frontend package; clear top-level frontend workspace devDependencies; add static server for e2e.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

priority: critical

Suggested reviewers

  • sami-marreed
  • offerakrabi

"A rabbit hops through slashes bright,
With /help, /clear, commands in sight!
Skills dance in args—$1, $name
Suggestions glow when typos came.
From chat to code, a unified thread,
The command realm now widely spread! 🐇"

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch slash-commands-backend

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (1)
src/cuga/backend/server/main.py (1)

3342-3344: 💤 Low value

Use raise ... from None to suppress exception chaining in API response.

When re-raising as HTTPException, the original exception chain is preserved in logs but shouldn't leak to clients. Using from None makes the intent explicit.

Proposed fix
     except Exception:
         logger.exception("Failed to build slash command registry")
-        raise HTTPException(status_code=500, detail="Failed to build slash command registry")
+        raise HTTPException(status_code=500, detail="Failed to build slash command registry") from None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/backend/server/main.py` around lines 3342 - 3344, The except block
that catches exceptions during building the slash command registry currently
re-raises an HTTPException while preserving the original exception chain; update
the re-raise to suppress exception chaining by using "raise
HTTPException(status_code=500, detail='Failed to build slash command registry')
from None" in the except handler where logger.exception("Failed to build slash
command registry") is called so the original traceback is not attached to the
API response.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 2055-2063: Tool approval interrupts are being created
unconditionally via ToolApprovalHandler.check_and_create_approval_interrupt
which can dead-end because the resumption/denial paths are still gated by
settings.policy.enabled; only create the approval interrupt when policy handling
is enabled (i.e., wrap the call to
ToolApprovalHandler.check_and_create_approval_interrupt in a check of
settings.policy.enabled) or alternatively make the approval resumption/denial
handling unconditional—modify the code around the approval creation so it only
invokes ToolApprovalHandler.check_and_create_approval_interrupt when
settings.policy.enabled is true (or update the resume/deny handlers to remove
the settings.policy.enabled guard) to ensure created interrupts can be
resumed/denied.
- Around line 1931-1940: The ToolMessage branch currently coerces missing
tool_call_id to an empty string which some providers reject; update the branch
handling ToolMessage (involving symbols ToolMessage, modified_chat_messages,
messages_for_model) to always append the original msg to modified_chat_messages
but only append a dict to messages_for_model if msg.tool_call_id is present and
not empty (and treat sentinel like "unknown" as invalid), e.g., check
getattr(msg, "tool_call_id", None) and skip adding to messages_for_model when
falsy or equals "unknown"; this ensures invalid tool entries are not sent to the
model while preserving the original message history.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/tool_approval_handler.py`:
- Around line 275-277: The except block currently swallows errors and returns
None, which silently allows tools; instead, on exception invoke the existing
interrupt/approval path so failures require approval. Replace the bare except in
tool_approval_handler (the try/except that logs "skill allowed-tools enforcement
failed; skipping whitelist check") to call the same interrupt creation routine
used elsewhere in this module (the function/path that constructs and returns an
approval interrupt when a tool is disallowed) and return that interrupt (or
raise it) rather than returning None; keep the logger.exception call but remove
the auto-allow return so enforcement errors trigger the approval flow.

In `@src/cuga/backend/slash_commands/command_resolver.py`:
- Around line 20-44: The module currently imports numpy and uses it only in the
_cosine function; change this to avoid the implicit dependency by replacing the
numpy-based implementation in _cosine with a pure-Python version that computes
dot product, norms (using math.sqrt and sum(x*x)), and returns 0.0 for zero-norm
inputs, then remove the top-level "import numpy as np"; alternatively, if you
prefer to keep NumPy, add numpy to project dependencies in pyproject.toml—make
the change around the _cosine function and the module import so callers of
CommandSuggestion remain unaffected.

In `@src/cuga/backend/slash_commands/dispatcher.py`:
- Around line 84-86: The cache key generator _registry_key currently hashes only
(c.name, c.kind) from slash_registry.list_commands(), so update it to include
each command's description as well (e.g., use (c.name, c.kind, c.description or
"") for each command) before sorting and hashing; ensure you normalize None
descriptions to an empty string so edits to a command's description will change
the key and invalidate the resolver cache.

In `@src/frontend_workspaces/frontend/tests/e2e/README.md`:
- Line 27: The code span in the README (`/<name> `) contains a trailing space
that violates markdownlint MD038; edit the README line with the chips example
and remove the trailing space inside the backticks so the snippet reads
`/<name>` (locate the occurrence of the literal backticked string `/<name> ` and
update it).

In `@src/frontend_workspaces/frontend/tests/e2e/static-server.cjs`:
- Around line 37-44: In resolveFile, replace the brittle startsWith(ROOT) check
with canonical path resolution and a boundary check: compute the resolved
candidate via path.resolve(ROOT, p) (or resolve both ROOT and candidate), then
use path.relative(resolvedRoot, resolvedCandidate) to ensure it does not start
with '..' (or equal '..') before serving the file; if the relative path
indicates traversal or the resolved candidate is not a file, fall back to
index.html. This ensures sibling-prefix paths like "dist-evil" cannot bypass the
guard while keeping existing decode/exists checks (refer to resolveFile,
candidate, ROOT).

In `@tests/unit/test_slash_command_resolver.py`:
- Around line 53-68: Rename the parameter named id to item_id in the stub store
methods add, get, and delete to avoid shadowing the built-in; update each method
signature (add(self, item_id: str, ...), get(self, item_id: str), delete(self,
item_id: str)) and change all internal references from id to item_id (e.g.,
self.rows[item_id], self.rows.pop(item_id, None)); also update any test code or
callers that invoke these methods to use the new parameter name where they pass
or reference the identifier.

---

Nitpick comments:
In `@src/cuga/backend/server/main.py`:
- Around line 3342-3344: The except block that catches exceptions during
building the slash command registry currently re-raises an HTTPException while
preserving the original exception chain; update the re-raise to suppress
exception chaining by using "raise HTTPException(status_code=500, detail='Failed
to build slash command registry') from None" in the except handler where
logger.exception("Failed to build slash command registry") is called so the
original traceback is not attached to the API response.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f41c0b2f-223b-4316-81bd-1120c3beb1c0

📥 Commits

Reviewing files that changed from the base of the PR and between 5ef97b1 and 8105ffe.

⛔ Files ignored due to path filters (4)
  • src/cuga/frontend/dist/index.html is excluded by !**/dist/**
  • src/cuga/frontend/dist/main.49fce4fa506d84bf86b2.js is excluded by !**/dist/**
  • src/cuga/frontend/dist/main.76f3bae552155a4125f1.js is excluded by !**/dist/**
  • src/frontend_workspaces/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (50)
  • .gitignore
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/tool_approval_handler.py
  • src/cuga/backend/cuga_graph/state/agent_state.py
  • src/cuga/backend/cuga_graph/utils/agent_loop.py
  • src/cuga/backend/server/main.py
  • src/cuga/backend/skills/loader.py
  • src/cuga/backend/skills/registry.py
  • src/cuga/backend/skills/tools.py
  • src/cuga/backend/slash_commands/__init__.py
  • src/cuga/backend/slash_commands/allowed_tools_enforcement.py
  • src/cuga/backend/slash_commands/arg_substitution.py
  • src/cuga/backend/slash_commands/builtins/__init__.py
  • src/cuga/backend/slash_commands/builtins/clear.py
  • src/cuga/backend/slash_commands/builtins/help.py
  • src/cuga/backend/slash_commands/builtins/skills.py
  • src/cuga/backend/slash_commands/command_resolver.py
  • src/cuga/backend/slash_commands/dispatcher.py
  • src/cuga/backend/slash_commands/message_synthesis.py
  • src/cuga/backend/slash_commands/parser.py
  • src/cuga/backend/slash_commands/registry.py
  • src/cuga/backend/slash_commands/types.py
  • src/cuga/sdk.py
  • src/frontend_workspaces/frontend/package.json
  • src/frontend_workspaces/frontend/playwright.config.ts
  • src/frontend_workspaces/frontend/src/api.ts
  • src/frontend_workspaces/frontend/src/carbon-chat/CarbonChat.css
  • src/frontend_workspaces/frontend/src/carbon-chat/CarbonChat.tsx
  • src/frontend_workspaces/frontend/src/carbon-chat/SlashChips.tsx
  • src/frontend_workspaces/frontend/src/carbon-chat/SlashCommandDropdown.tsx
  • src/frontend_workspaces/frontend/src/carbon-chat/composerTextarea.ts
  • src/frontend_workspaces/frontend/src/carbon-chat/customLoadHistory.ts
  • src/frontend_workspaces/frontend/src/carbon-chat/customSendMessage.ts
  • src/frontend_workspaces/frontend/tests/e2e/README.md
  • src/frontend_workspaces/frontend/tests/e2e/slash-chips.spec.ts
  • src/frontend_workspaces/frontend/tests/e2e/static-server.cjs
  • src/frontend_workspaces/package.json
  • src/scripts/run_tests.sh
  • tests/unit/test_agent_state_message_rehydration.py
  • tests/unit/test_skill_loader.py
  • tests/unit/test_slash_allowed_tools_enforcement.py
  • tests/unit/test_slash_arg_substitution.py
  • tests/unit/test_slash_builtin_discovery.py
  • tests/unit/test_slash_clear_builtin.py
  • tests/unit/test_slash_command_resolver.py
  • tests/unit/test_slash_langfuse_span.py
  • tests/unit/test_slash_message_synthesis.py
  • tests/unit/test_slash_parser.py
  • tests/unit/test_slash_skills_builtin.py
  • tests/unit/test_stream_event_format.py

Comment thread src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
Comment thread src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
Comment thread src/cuga/backend/slash_commands/command_resolver.py Outdated
Comment thread src/cuga/backend/slash_commands/dispatcher.py
Comment thread src/frontend_workspaces/frontend/tests/e2e/README.md Outdated
Comment thread src/frontend_workspaces/frontend/tests/e2e/static-server.cjs
Comment thread tests/unit/test_slash_command_resolver.py
illeatmyhat added a commit that referenced this pull request May 17, 2026
- dispatcher.py: include description in _registry_key so the size-1
  resolver cache invalidates when a skill's SKILL.md description is
  edited (suggestion ranking is embedding-based on description text).
  Caught by CodeRabbit on PR #235.
- e2e/README.md: drop trailing space inside backticks (MD038).
@illeatmyhat

Copy link
Copy Markdown
Collaborator Author

Closing to force fresh pull_request workflow triggers — will reopen immediately. The push of 0e93727 didn't re-trigger Tests/Stability/Smoke because GitHub dedup'd by check_run name. Workflow_dispatch runs proved secrets and tests both work on this SHA; this round will populate the PR check rollup.

@illeatmyhat illeatmyhat reopened this May 17, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/cuga/backend/slash_commands/dispatcher.py (1)

84-85: ⚡ Quick win

Use a collision-free cache key for resolver invalidation.

_registry_key currently compresses command identity into hash(...); while rare, collisions can incorrectly reuse a stale resolver. Keep the normalized sorted tuple as the cache key instead of hashing it.

Suggested change
-_resolver_cache: Optional[Tuple[int, CommandResolver]] = None
+_resolver_cache: Optional[Tuple[tuple[tuple[str, str, str], ...], CommandResolver]] = None
@@
-def _registry_key(slash_registry: SlashRegistry) -> int:
-    return hash(tuple(sorted((c.name, c.kind, c.description or "") for c in slash_registry.list_commands())))
+def _registry_key(slash_registry: SlashRegistry) -> tuple[tuple[str, str, str], ...]:
+    return tuple(
+        sorted((c.name, c.kind, c.description or "") for c in slash_registry.list_commands())
+    )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/backend/slash_commands/dispatcher.py` around lines 84 - 85, The
_registry_key function should return a collision-free cache key by returning the
normalized sorted tuple of command identities instead of a hashed integer:
replace the current hash(...) return in _registry_key(slash_registry:
SlashRegistry) with the actual tuple constructed from sorted((c.name, c.kind,
c.description or "") for c in slash_registry.list_commands()), ensuring you
preserve the normalization/sorting used today so the key uniquely represents the
registry for resolver invalidation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/cuga/backend/slash_commands/dispatcher.py`:
- Around line 84-85: The _registry_key function should return a collision-free
cache key by returning the normalized sorted tuple of command identities instead
of a hashed integer: replace the current hash(...) return in
_registry_key(slash_registry: SlashRegistry) with the actual tuple constructed
from sorted((c.name, c.kind, c.description or "") for c in
slash_registry.list_commands()), ensuring you preserve the normalization/sorting
used today so the key uniquely represents the registry for resolver
invalidation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ac70bd8b-b035-4809-9c7d-fda561076fab

📥 Commits

Reviewing files that changed from the base of the PR and between 8105ffe and 0e93727.

📒 Files selected for processing (2)
  • src/cuga/backend/slash_commands/dispatcher.py
  • src/frontend_workspaces/frontend/tests/e2e/README.md
✅ Files skipped from review due to trivial changes (1)
  • src/frontend_workspaces/frontend/tests/e2e/README.md

@illeatmyhat illeatmyhat force-pushed the slash-commands-backend branch from 0e93727 to 91d8489 Compare May 17, 2026 19:34

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cuga/sdk.py`:
- Around line 1688-1693: The except block that swallows exceptions during slash
dispatch must return a deterministic slash-error result instead of None so
invoke() won’t treat a slash-like input as planner text; in the except handler
surrounding the slash dispatch (the block that computes command_name from
message and calls logger.exception), detect whether
message.lstrip().startswith("/") and return a specific failure sentinel or error
response that denotes a slash-dispatch error (so downstream invoke() can enforce
slash-specific gating like allowed_tools) rather than returning None; keep the
existing logger.exception call and only change the return path for the slash
case so non-slash messages can continue to return None as before.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7d4dce9b-cfc4-4214-95d9-4e6e8d05f85a

📥 Commits

Reviewing files that changed from the base of the PR and between 0e93727 and 91d8489.

⛔ Files ignored due to path filters (3)
  • src/cuga/frontend/dist/index.html is excluded by !**/dist/**
  • src/cuga/frontend/dist/main.0954ca8b7ceb2f820c77.js is excluded by !**/dist/**
  • src/frontend_workspaces/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (50)
  • .gitignore
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/tool_approval_handler.py
  • src/cuga/backend/cuga_graph/state/agent_state.py
  • src/cuga/backend/cuga_graph/utils/agent_loop.py
  • src/cuga/backend/server/main.py
  • src/cuga/backend/skills/loader.py
  • src/cuga/backend/skills/registry.py
  • src/cuga/backend/skills/tools.py
  • src/cuga/backend/slash_commands/__init__.py
  • src/cuga/backend/slash_commands/allowed_tools_enforcement.py
  • src/cuga/backend/slash_commands/arg_substitution.py
  • src/cuga/backend/slash_commands/builtins/__init__.py
  • src/cuga/backend/slash_commands/builtins/clear.py
  • src/cuga/backend/slash_commands/builtins/help.py
  • src/cuga/backend/slash_commands/builtins/skills.py
  • src/cuga/backend/slash_commands/command_resolver.py
  • src/cuga/backend/slash_commands/dispatcher.py
  • src/cuga/backend/slash_commands/message_synthesis.py
  • src/cuga/backend/slash_commands/parser.py
  • src/cuga/backend/slash_commands/registry.py
  • src/cuga/backend/slash_commands/types.py
  • src/cuga/sdk.py
  • src/frontend_workspaces/frontend/package.json
  • src/frontend_workspaces/frontend/playwright.config.ts
  • src/frontend_workspaces/frontend/src/api.ts
  • src/frontend_workspaces/frontend/src/carbon-chat/CarbonChat.css
  • src/frontend_workspaces/frontend/src/carbon-chat/CarbonChat.tsx
  • src/frontend_workspaces/frontend/src/carbon-chat/SlashChips.tsx
  • src/frontend_workspaces/frontend/src/carbon-chat/SlashCommandDropdown.tsx
  • src/frontend_workspaces/frontend/src/carbon-chat/composerTextarea.ts
  • src/frontend_workspaces/frontend/src/carbon-chat/customLoadHistory.ts
  • src/frontend_workspaces/frontend/src/carbon-chat/customSendMessage.ts
  • src/frontend_workspaces/frontend/tests/e2e/README.md
  • src/frontend_workspaces/frontend/tests/e2e/slash-chips.spec.ts
  • src/frontend_workspaces/frontend/tests/e2e/static-server.cjs
  • src/frontend_workspaces/package.json
  • src/scripts/run_tests.sh
  • tests/unit/test_agent_state_message_rehydration.py
  • tests/unit/test_skill_loader.py
  • tests/unit/test_slash_allowed_tools_enforcement.py
  • tests/unit/test_slash_arg_substitution.py
  • tests/unit/test_slash_builtin_discovery.py
  • tests/unit/test_slash_clear_builtin.py
  • tests/unit/test_slash_command_resolver.py
  • tests/unit/test_slash_langfuse_span.py
  • tests/unit/test_slash_message_synthesis.py
  • tests/unit/test_slash_parser.py
  • tests/unit/test_slash_skills_builtin.py
  • tests/unit/test_stream_event_format.py
✅ Files skipped from review due to trivial changes (2)
  • src/frontend_workspaces/frontend/tests/e2e/README.md
  • .gitignore
🚧 Files skipped from review as they are similar to previous changes (35)
  • src/frontend_workspaces/package.json
  • src/frontend_workspaces/frontend/src/api.ts
  • src/cuga/backend/slash_commands/builtins/skills.py
  • src/cuga/backend/slash_commands/builtins/clear.py
  • tests/unit/test_agent_state_message_rehydration.py
  • src/frontend_workspaces/frontend/src/carbon-chat/CarbonChat.css
  • src/frontend_workspaces/frontend/playwright.config.ts
  • src/cuga/backend/cuga_graph/utils/agent_loop.py
  • src/cuga/backend/slash_commands/allowed_tools_enforcement.py
  • src/frontend_workspaces/frontend/tests/e2e/static-server.cjs
  • src/cuga/backend/slash_commands/builtins/help.py
  • src/cuga/backend/slash_commands/parser.py
  • src/cuga/backend/slash_commands/types.py
  • tests/unit/test_slash_skills_builtin.py
  • src/cuga/backend/slash_commands/init.py
  • src/frontend_workspaces/frontend/src/carbon-chat/composerTextarea.ts
  • src/cuga/backend/cuga_graph/state/agent_state.py
  • src/cuga/backend/skills/loader.py
  • tests/unit/test_slash_langfuse_span.py
  • src/cuga/backend/slash_commands/message_synthesis.py
  • src/cuga/backend/slash_commands/arg_substitution.py
  • src/cuga/backend/skills/registry.py
  • tests/unit/test_stream_event_format.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
  • src/frontend_workspaces/frontend/src/carbon-chat/CarbonChat.tsx
  • tests/unit/test_slash_builtin_discovery.py
  • tests/unit/test_slash_parser.py
  • src/frontend_workspaces/frontend/tests/e2e/slash-chips.spec.ts
  • tests/unit/test_slash_clear_builtin.py
  • src/frontend_workspaces/frontend/src/carbon-chat/SlashChips.tsx
  • src/cuga/backend/slash_commands/command_resolver.py
  • tests/unit/test_slash_allowed_tools_enforcement.py
  • src/cuga/backend/slash_commands/registry.py
  • tests/unit/test_slash_message_synthesis.py
  • tests/unit/test_slash_arg_substitution.py

Comment thread src/cuga/sdk.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/cuga/sdk.py (2)

1805-1835: ⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Persist skill_allowed_tools across approval resume.

The whitelist is only injected in the normal invocation path at Lines 1960-1968. Once a slash skill pauses for HITL approval and the caller resumes with invoke(None, ...), this branch never restores it, so subsequent planner turns run without the skill's tool boundary.

Also applies to: 1960-1968

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/sdk.py` around lines 1805 - 1835, When resuming execution in the
invoke/resume branch (where message is None or action_response is provided) you
need to restore the skill whitelist into the run_config so tool boundaries
persist across HITL resumes; set
run_config["configurable"]["skill_allowed_tools"] the same way it's injected in
the normal invocation path (the code that injects skill_allowed_tools during
regular invoke) before calling self.graph.ainvoke(None, config=run_config),
ensuring the whitelist is present whether or not action_response is provided so
subsequent planner turns respect the skill's allowed tools.

2140-2185: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

stream() bypasses slash dispatch entirely.

Unlike invoke(), the streaming path wraps raw /... text in HumanMessage and never injects synthesized skill messages or skill_allowed_tools. Streaming clients will therefore miss built-ins/unknown-command UX and can bypass slash-specific gating.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/sdk.py` around lines 2140 - 2185, The stream() path currently wraps
raw "/..." strings into HumanMessage and skips the slash-dispatch logic used by
invoke(), so streaming bypasses synthesized skill messages and skill gating;
update stream() to replicate invoke()'s slash dispatch flow by detecting
messages that begin with "/" (or reusing the same helper used by invoke()),
synthesize and insert the corresponding skill/system messages into the messages
list (or replace the HumanMessage where appropriate), and ensure
run_config["configurable"]["skill_allowed_tools"] (and any other
dispatch-related configurable keys) are set the same way invoke() does before
building initial_state and calling graph.astream; reference the stream()
function, initial_state, run_config, and invoke() so you add/ reuse the dispatch
logic there.
src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py (1)

2108-2140: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pass response.tool_calls to AIMessage to preserve tool-call context across checkpoint cycles.

Both branches (lines 2108–2140 and 2146–2207) create AIMessage with only the content parameter, discarding response.tool_calls. On the next turn, the message replay cannot restore the tool-call context to the model, since the stored AIMessage has no tool_calls attribute. This orphans any following ToolMessages that reference the missing tool-call IDs.

Include tool_calls in the AIMessage constructor:

Example fix
# Line 2109 (code branch):
updated_messages = modified_chat_messages + [AIMessage(content=content, tool_calls=_resp_tool_calls)]

# Line 2147 (NL branch):  
updated_messages = modified_chat_messages + [AIMessage(content=planning_response, tool_calls=_resp_tool_calls)]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py` around lines
2108 - 2140, The AIMessage instances created in the sandbox-transition branches
(the AIMessage constructed from content and the one from planning_response) are
losing tool-call context because they only pass content; update both uses of
AIMessage (the one assigned to updated_messages and the NL/planning branch) to
include the tool_calls from the response (e.g., pass response.tool_calls or the
existing _resp_tool_calls variable) so that tool-call IDs are preserved across
checkpoint cycles; also ensure any subsequent AIMessage created for error
reporting similarly includes the same tool_calls if appropriate.
♻️ Duplicate comments (2)
src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py (1)

1972-1983: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Still treat placeholder tool_call_id values as invalid during replay.

This now skips empty IDs, but replayed ToolMessages can still carry sentinels like "unknown". Sending those as role="tool" messages leaves them unmatched to any assistant tool call and can still trigger provider-side 400s on history replay.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py` around lines
1972 - 1983, The ToolMessage branch currently appends the message then only
skips empty tool_call_id values, but sentinel placeholders like "unknown" still
pass through; update the ToolMessage handling in cuga_lite_graph (the block that
touches modified_chat_messages and messages_for_model) to treat common
placeholder IDs (e.g., "unknown", "null", "none", case-insensitive) as invalid:
first read tool_call_id = getattr(msg, "tool_call_id", "") or "" and normalize
(strip().lower()), and if it is empty or in a set of sentinel values then do not
append to modified_chat_messages or messages_for_model (continue); otherwise
append as before with the real tool_call_id. Ensure checks reference
ToolMessage, modified_chat_messages, and messages_for_model so the change is
easy to locate.
src/cuga/sdk.py (1)

1660-1671: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Slash import failures still fail open into the planner.

If importing slash_commands blows up, this helper returns None, and invoke() treats that as “no slash result”. A broken slash subsystem would therefore execute /... as normal planner input, bypassing built-in handling and skill allowed_tools gating again.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/sdk.py` around lines 1660 - 1671, The except block in
_dispatch_slash currently swallows import errors and returns None so invoke
treats a broken slash subsystem as "no slash result" and falls back to the
planner; change the behavior so import failures do not silently allow planner
fallback by re-raising the exception instead of returning None. Update the
except block in async def _dispatch_slash(...) (the import of SkillRegistry,
discover_skills, build_command_resolver, build_slash_registry,
parse_and_dispatch) to log the exception (logger.exception is fine) and then
raise the caught exception (or a wrapped exception) so invoke sees the failure
and does not treat it as a normal "no result" case.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cuga/backend/slash_commands/command_resolver.py`:
- Around line 34-43: The _cosine(a, b) scorer must detect mismatched embedding
dimensions and fail closed: add an explicit length check (if len(a) != len(b)
return 0.0) before computing the dot product, and use zip(a, b, strict=True)
when summing products to ensure any unexpected length drift is caught
deterministically; keep the existing zero-norm guards (na/nb == 0.0) and return
0.0 on mismatch rather than silently truncating.

---

Outside diff comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 2108-2140: The AIMessage instances created in the
sandbox-transition branches (the AIMessage constructed from content and the one
from planning_response) are losing tool-call context because they only pass
content; update both uses of AIMessage (the one assigned to updated_messages and
the NL/planning branch) to include the tool_calls from the response (e.g., pass
response.tool_calls or the existing _resp_tool_calls variable) so that tool-call
IDs are preserved across checkpoint cycles; also ensure any subsequent AIMessage
created for error reporting similarly includes the same tool_calls if
appropriate.

In `@src/cuga/sdk.py`:
- Around line 1805-1835: When resuming execution in the invoke/resume branch
(where message is None or action_response is provided) you need to restore the
skill whitelist into the run_config so tool boundaries persist across HITL
resumes; set run_config["configurable"]["skill_allowed_tools"] the same way it's
injected in the normal invocation path (the code that injects
skill_allowed_tools during regular invoke) before calling
self.graph.ainvoke(None, config=run_config), ensuring the whitelist is present
whether or not action_response is provided so subsequent planner turns respect
the skill's allowed tools.
- Around line 2140-2185: The stream() path currently wraps raw "/..." strings
into HumanMessage and skips the slash-dispatch logic used by invoke(), so
streaming bypasses synthesized skill messages and skill gating; update stream()
to replicate invoke()'s slash dispatch flow by detecting messages that begin
with "/" (or reusing the same helper used by invoke()), synthesize and insert
the corresponding skill/system messages into the messages list (or replace the
HumanMessage where appropriate), and ensure
run_config["configurable"]["skill_allowed_tools"] (and any other
dispatch-related configurable keys) are set the same way invoke() does before
building initial_state and calling graph.astream; reference the stream()
function, initial_state, run_config, and invoke() so you add/ reuse the dispatch
logic there.

---

Duplicate comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 1972-1983: The ToolMessage branch currently appends the message
then only skips empty tool_call_id values, but sentinel placeholders like
"unknown" still pass through; update the ToolMessage handling in cuga_lite_graph
(the block that touches modified_chat_messages and messages_for_model) to treat
common placeholder IDs (e.g., "unknown", "null", "none", case-insensitive) as
invalid: first read tool_call_id = getattr(msg, "tool_call_id", "") or "" and
normalize (strip().lower()), and if it is empty or in a set of sentinel values
then do not append to modified_chat_messages or messages_for_model (continue);
otherwise append as before with the real tool_call_id. Ensure checks reference
ToolMessage, modified_chat_messages, and messages_for_model so the change is
easy to locate.

In `@src/cuga/sdk.py`:
- Around line 1660-1671: The except block in _dispatch_slash currently swallows
import errors and returns None so invoke treats a broken slash subsystem as "no
slash result" and falls back to the planner; change the behavior so import
failures do not silently allow planner fallback by re-raising the exception
instead of returning None. Update the except block in async def
_dispatch_slash(...) (the import of SkillRegistry, discover_skills,
build_command_resolver, build_slash_registry, parse_and_dispatch) to log the
exception (logger.exception is fine) and then raise the caught exception (or a
wrapped exception) so invoke sees the failure and does not treat it as a normal
"no result" case.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 568da263-b354-448d-a141-7917df2a2669

📥 Commits

Reviewing files that changed from the base of the PR and between 3b9dad7 and 720f43c.

📒 Files selected for processing (4)
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
  • src/cuga/backend/slash_commands/command_resolver.py
  • src/cuga/sdk.py
  • src/frontend_workspaces/frontend/tests/e2e/static-server.cjs
✅ Files skipped from review due to trivial changes (1)
  • src/frontend_workspaces/frontend/tests/e2e/static-server.cjs

Comment on lines +34 to +43
def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
"""Cosine similarity between two vectors; 0.0 for any zero-norm input."""
if not a or not b:
return 0.0
dot = sum(x * y for x, y in zip(a, b))
na = sum(x * x for x in a) ** 0.5
nb = sum(x * x for x in b) ** 0.5
if na == 0.0 or nb == 0.0:
return 0.0
return dot / (na * nb)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "*.py" | grep -E "(setup|pyproject|requirements)" | head -20

Repository: cuga-project/cuga-agent

Length of output: 153


🏁 Script executed:

git ls-files | grep -E "(setup\.py|pyproject\.toml|requirements)" | head -20

Repository: cuga-project/cuga-agent

Length of output: 266


🏁 Script executed:

cat -n src/cuga/backend/slash_commands/command_resolver.py 2>/dev/null | head -60

Repository: cuga-project/cuga-agent

Length of output: 2515


🏁 Script executed:

cat pyproject.toml | grep -A 20 "python"

Repository: cuga-project/cuga-agent

Length of output: 4729


🏁 Script executed:

cat -n src/cuga/backend/slash_commands/command_resolver.py | tail -n +45 | head -80

Repository: cuga-project/cuga-agent

Length of output: 3752


Guard _cosine against mismatched embedding dimensions.

The embedding function is injected and external, so dimensions could drift (version updates, stale embeddings, implementation changes). Currently, plain zip() silently truncates, allowing partial vectors to rank commands instead of failing closed. Add a length check and zip(..., strict=True) to keep the scorer deterministic.

Suggested patch
 def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
     """Cosine similarity between two vectors; 0.0 for any zero-norm input."""
     if not a or not b:
         return 0.0
-    dot = sum(x * y for x, y in zip(a, b))
+    if len(a) != len(b):
+        return 0.0
+    dot = sum(x * y for x, y in zip(a, b, strict=True))
     na = sum(x * x for x in a) ** 0.5
     nb = sum(x * x for x in b) ** 0.5
     if na == 0.0 or nb == 0.0:
         return 0.0
🧰 Tools
🪛 Ruff (0.15.12)

[warning] 38-38: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/backend/slash_commands/command_resolver.py` around lines 34 - 43,
The _cosine(a, b) scorer must detect mismatched embedding dimensions and fail
closed: add an explicit length check (if len(a) != len(b) return 0.0) before
computing the dot product, and use zip(a, b, strict=True) when summing products
to ensure any unexpected length drift is caught deterministically; keep the
existing zero-norm guards (na/nb == 0.0) and return 0.0 on mismatch rather than
silently truncating.

@illeatmyhat illeatmyhat requested a review from sami-marreed May 18, 2026 00:58
First-class slash-command support: three built-ins (/help, /clear, /skills),
invocation of repo-defined skills (e.g. /echo), an autocomplete dropdown,
embedding-based suggestions for typos, and an allowed-tools whitelist
enforced at the HITL approval gate. Skill invocation surfaces as a "Skill
invoked: /<name>" step inside the assistant message's reasoning panel
(toggle relabelled "Show details") instead of a separate chip bubble.

Backend:
- Slash dispatcher, embedding-backed command resolver, allowed-tools gate
  wired into the HITL approval flow, semantic suggestions, GET /api/commands.
- ChatHistoryMessage discriminated-union rehydration so ToolMessage /
  AIMessage.tool_calls round-trip correctly across SSE.
- Per-skill allowed_tools enforced at approval; resume-path inherits the
  same trust boundary.

Frontend:
- Autocomplete dropdown + clickable suggestion chips integrated with
  Carbon's composer textarea (ARIA combobox wiring, thread-rotation
  support, dark mode).
- /clear UX: reset thread state without page reload.
- Reasoning-panel-as-history: slash invocations replay as steps, no
  separate SlashSkillChip element.

Tests:
- 116 unit tests covering resolver, args substitution, allowed-tools
  enforcement, message synthesis.
- Playwright e2e: dropdown, help, skills, echo, suggestions, /clear.

Closes #N/A. Supersedes the fork PR (#230). Re-hosted on the org repo
so CI receives org secrets.
CodeQL js/incomplete-sanitization (CWE-116) flagged the escapeBackticks
helper on both the live (customSendMessage.ts) and history-replay
(customLoadHistory.ts) skill-chip paths: backslashes in resolved_name /
raw_input / raw_args were not escaped before backticks, so `\\\`` could
still close the inline code span. Escape `\\` first, then `` ` ``, on both
paths so they stay in lockstep.
- cuga_lite_graph: drop ToolMessage entries with empty tool_call_id from
  the model payload so strict providers don't 400 (history copy keeps
  them for replay).
- sdk: when /-prefixed slash dispatch raises, surface a structured
  error instead of falling back to the planner, which would bypass
  allowed_tools gating.
- command_resolver: drop the implicit numpy import; pure-Python cosine
  is faster than numpy for the realistic N<=8 vector size anyway.
- static-server (e2e): replace startsWith(ROOT) traversal guard with
  path.resolve + path.relative boundary check.
@illeatmyhat illeatmyhat force-pushed the slash-commands-backend branch from 720f43c to e7723be Compare May 20, 2026 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants