fix: prevent interactive workers from getting stuck in idle after outcome #551
marcmantei wants to merge 31 commits into spacedriveapp:main from
Conversation
Add a SQLite-backed registry that auto-discovers GitHub repositories via `gh repo list`, syncs periodically in the background, and optionally auto-clones new repos. Per-repo config overrides (model, enabled) are preserved across syncs.
New modules:
- registry/store.rs — RegistryStore CRUD (upsert, list, overrides, link)
- registry/sync.rs — discovery via gh CLI, reconciliation, auto-clone
- api/registry.rs — 5 REST endpoints for repos, overrides, sync, status
Config: [defaults.registry] section with github_owners, clone_base_dir, sync_interval_secs, auto_clone, exclude_patterns. Wired into RuntimeConfig (hot-reloadable), AgentDeps, ApiState, and the main agent init loop with a background sync task.
9 new tests; all 633 tests pass.
Refs: #1
Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
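The `[defaults.registry]` section described above might look like this; the key names come from the commit message, while the values are purely illustrative:

```toml
[defaults.registry]
github_owners = ["spacedriveapp"]  # owners whose repos are auto-discovered
clone_base_dir = "/srv/repos"      # where auto-clones are placed
sync_interval_secs = 300           # background sync period
auto_clone = true                  # clone newly discovered repos
exclude_patterns = ["*-archive"]   # skip repos matching these names
```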
Send a summary message via MessagingManager when registry sync discovers new repos or archives removed ones. Configurable via `notification_target` in [defaults.registry] (e.g. "telegram:1285309093").
The sync loop was spawned before messaging_manager was set on AgentDeps, so notifications were never sent. Now spawns after init, reading the sync status from ApiState.
1. Fix cortex profile 404: the default routing model was "anthropic/claude-sonnet-4" (doesn't exist); now uses "anthropic/claude-sonnet-4-20250514".
2. Fix worker outcome reporting: when a webhook-triggered worker completes and Lira relays the result via send_message_to_another_channel (e.g. to Telegram), the retrigger system didn't recognize it as a successful relay because only the reply tool set the replied_flag. Now SendMessageTool also sets the flag on successful delivery.
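A hypothetical sketch of the relay-flag fix: both delivery paths share a single flag, so the retrigger system recognizes a successful relay regardless of which tool performed it. `ReplyState` and the function names are illustrative, not the project's real types.

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};

// Stand-in for the per-message state the retrigger system inspects.
struct ReplyState {
    replied: AtomicBool,
}

// Previously only the reply tool set the flag; the fix makes the send tool
// (send_message_to_another_channel) call this on successful delivery too.
fn mark_delivered(state: &Arc<ReplyState>) {
    state.replied.store(true, Ordering::SeqCst);
}

fn was_relayed(state: &Arc<ReplyState>) -> bool {
    state.replied.load(Ordering::SeqCst)
}
```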
The agent now uses the registry as its dynamic project list:
1. System prompt injection: enabled registry repos are appended to the project context so the agent knows about all discovered repos (name, path, language, model override, default branch).
2. Webhook event filtering: events from repos not in the registry or with enabled=false are dropped before reaching the agent.
This replaces the need for a hardcoded project table in ROLE.md. Combined with the GitHub App (all repos) and registry sync (auto-discover), this completes the zero-config flow: new repo → auto-discovered → webhooks delivered → agent reacts.
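The webhook filter in point 2 can be sketched as a simple lookup; the real implementation goes through the SQLite registry store, so the `HashMap` here is only a stand-in (key: "owner/repo" full name, value: the repo's `enabled` flag):

```rust
use std::collections::HashMap;

// Events from unknown repos or repos with enabled=false are dropped
// before they ever reach the agent.
fn should_deliver(registry: &HashMap<String, bool>, repo_full_name: &str) -> bool {
    registry.get(repo_full_name).copied().unwrap_or(false)
}
```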
…-app feat: Dynamic Project Registry + GitHub App Migration
Browser tools fail to launch Chromium because libxkbcommon.so.0 is not installed in the container. Closes spacedriveapp#425
Every developer worker now receives the full CUPID framework (cupid.dev) as non-negotiable coding principles: Composable, Unix philosophy, Predictable, Idiomatic, Domain-based — with detailed actionable guidance for each.
Automates the source-based upgrade flow: git fetch → check for new commits → git pull → cargo build --release → stop daemon → atomic binary swap → systemctl restart. Supports --check-only for dry runs and --no-restart for manual service management.
Also updates Deployment::Native detection in update.rs to report can_apply=true when a source directory is found.
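The "atomic binary swap" step can be sketched as stage-then-rename, assuming the staged file and the installed binary live on the same filesystem (rename(2) is only atomic in that case). Paths and names here are illustrative, not the tool's real layout:

```rust
use std::fs;
use std::io;
use std::path::Path;

// Stage the new binary next to the target so the final rename is atomic:
// a crash mid-copy leaves the old binary untouched.
fn atomic_swap(new_binary: &Path, installed: &Path) -> io::Result<()> {
    let staged = installed.with_extension("new");
    fs::copy(new_binary, &staged)?;
    fs::rename(&staged, installed)?; // replaces the old binary in one step
    Ok(())
}
```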
When all worker slots were occupied (especially by idle resumed workers), new spawn requests were rejected with WorkerLimitReached. The LLM then hallucinated a queue mechanism that didn't exist, telling users tasks would "auto-spawn when a slot opens" — but they were silently dropped.
This commit fixes three issues:
1. Worker queue: spawn requests that hit the limit are now queued in a per-channel VecDeque and auto-spawned when a WorkerComplete event frees a slot.
2. Idle worker eviction: when the limit is reached, the oldest idle worker (e.g. resumed after restart but doing nothing) is cancelled to make room for new active work.
3. Structured response: the LLM now receives `queued: true` with a clear message instead of a tool error, eliminating hallucinated queue behavior.
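The queue-on-saturation behavior above can be modeled minimally as follows; `SpawnRequest` and the slot accounting are stand-ins for the real per-channel state, and the actual implementation reacts to WorkerComplete events rather than direct method calls:

```rust
use std::collections::VecDeque;

struct SpawnRequest {
    task: String,
}

struct WorkerSlots {
    active: usize,
    limit: usize,
    queue: VecDeque<SpawnRequest>,
}

impl WorkerSlots {
    /// Returns true if the worker spawned immediately, false if it was queued
    /// (the LLM would then receive `queued: true` instead of a tool error).
    fn spawn_or_queue(&mut self, req: SpawnRequest) -> bool {
        if self.active < self.limit {
            self.active += 1;
            true
        } else {
            self.queue.push_back(req);
            false
        }
    }

    /// Called when a WorkerComplete event frees a slot; drains one queued
    /// request into the freed slot.
    fn on_complete(&mut self) -> Option<SpawnRequest> {
        self.active = self.active.saturating_sub(1);
        let next = self.queue.pop_front();
        if next.is_some() {
            self.active += 1;
        }
        next
    }
}
```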
Adds a background loop that periodically polls open PRs across all registered repos using `gh pr list`. When a PR with mergeable status CONFLICTING is detected, a message is injected into the webhook channel so the agent can spawn a worker to resolve the conflicts.
Configurable via `pr_conflict_check_interval_secs` in [registry] (default: 600s = 10 minutes; set to 0 to disable). Tracks notified PRs to avoid duplicate messages; clears tracking when conflicts are resolved.
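The duplicate-notification tracking described above can be sketched with a set: a PR is reported once while its mergeable status stays CONFLICTING, and tracking is cleared when the conflict resolves so a recurrence notifies again. The type name is illustrative.

```rust
use std::collections::HashSet;

struct ConflictTracker {
    notified: HashSet<u64>,
}

impl ConflictTracker {
    /// Returns true when a message should be injected into the channel.
    fn observe(&mut self, pr_number: u64, conflicting: bool) -> bool {
        if conflicting {
            // HashSet::insert returns true only on first insertion.
            self.notified.insert(pr_number)
        } else {
            self.notified.remove(&pr_number);
            false
        }
    }
}
```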
Startup catch-up: after boot, scans all enabled repos for open issues created in the last 24h that lack the `prd-generated` label. Injects them as synthetic webhook messages so the agent processes missed issues from downtime or frozen states.
Health watchdog: background task that monitors message processing activity. If no messages are processed for 10 minutes, exits with code 1 so systemd auto-restarts the service. Prevents silent freezes from going undetected.
feat: startup catch-up and health watchdog
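The watchdog check above can be sketched as a shared timestamp that message handlers bump, with a background loop exiting when it goes stale; the 600-second threshold matches the 10-minute window described, and the names are illustrative:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

// Seconds since the Unix epoch of the last processed message.
static LAST_ACTIVITY: AtomicU64 = AtomicU64::new(0);

fn now_secs() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("clock before epoch")
        .as_secs()
}

// Called from each message handler.
fn record_activity() {
    LAST_ACTIVITY.store(now_secs(), Ordering::Relaxed);
}

fn is_stale(threshold_secs: u64) -> bool {
    now_secs().saturating_sub(LAST_ACTIVITY.load(Ordering::Relaxed)) > threshold_secs
}

// The watchdog loop would periodically run:
//   if is_stale(600) { std::process::exit(1); }  // systemd restarts us
```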
- New "Ember" dark theme with dark red/orange accent colors
- 3-tier Pricing page (Free OSS / Teams / Enterprise) inspired by Cline
- Kanban board: task dependency support via metadata (depends_on)
- Kanban board: blocked task indicators and dependency chips
- Kanban board: dependency field in Create Task dialog
- Sidebar: Pricing page navigation link
- New `spacebot local` subcommand: starts in foreground, auto-opens browser
- Optional `--port` flag for custom port
- Welcome page (/welcome) with onboarding checklist for first-time users
- Guides through: provider setup, agent creation, platform connection
Python-based MCP server that exposes Spacebot's HTTP API as MCP tools. Zero dependencies — works with Python 3.8+. Tools: list_agents, list/create/update/execute tasks, list_workers, send_message, list_channels, system status. Enables Claude Code, Codex, VS Code to orchestrate Spacebot agents via standard MCP protocol over stdio.
- Task cards show assigned_agent badge (e.g. "claude-code", "codex")
- Task cards show worktree indicator when metadata.worktree is set
- Worker ID shown with first 8 chars instead of generic "Worker" label
- Create dialog: new "Assign Agent" field alongside dependencies
- Detail dialog: shows agent assignment and worktree metadata
- New TaskDependencyGraph component using graphology + Sigma.js
- Board/Graph toggle in Kanban toolbar
- Force-directed layout for task dependency visualization
- Color-coded by status, sized by priority
- Blocked tasks shown in red
- Click node to open task detail
- Legend with status colors
Backend:
- New GET /agents/tasks/{number}/diff endpoint
- Reads worktree/branch from task metadata, runs git diff
- Returns unified diff with branch/worktree info
Frontend:
- New DiffViewer component with syntax-highlighted unified diffs
- Green/red line coloring for additions/removals
- Sticky file headers, hunk separators
- Auto-fetches diff when task has worktree/branch metadata
- Shown in task detail dialog before metadata section
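The backend steps above (read worktree/branch from metadata, run git diff) suggest an invocation assembled roughly like this; the `worktree`/`branch` field names follow the description, while the base ref and fallback behavior are assumptions:

```rust
// Build the argument list for the `git` child process from task metadata.
fn diff_args(worktree: Option<&str>, branch: Option<&str>, base: &str) -> Vec<String> {
    match (worktree, branch) {
        // With a worktree, diff its working tree against the base branch.
        (Some(wt), _) => vec!["-C".into(), wt.into(), "diff".into(), base.into()],
        // With only a branch, diff base...branch (changes since divergence).
        (None, Some(br)) => vec!["diff".into(), format!("{base}...{br}")],
        // No metadata: nothing recorded, diff against the base itself.
        (None, None) => vec!["diff".into(), base.into()],
    }
}
```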
Backend:
- POST /agents/tasks/{number}/worktree — creates branch task/{N} + worktree
- DELETE /agents/tasks/{number}/worktree — removes worktree, keeps branch ref
- Worktree stored in .spacebot-worktrees/ adjacent to repo
- Task metadata updated with worktree path and branch name
Frontend:
- "Create Worktree" button in task detail dialog (shown for non-done tasks)
- Worktree badge on task cards
MCP Bridge:
- New spacebot_create_worktree tool
- New spacebot_task_diff tool
- Run cargo fmt on channel.rs, channel_dispatch.rs, tasks.rs, main.rs, pr_conflicts.rs, spawn_worker.rs, upgrade.rs
- Fix missing closing > on JSX div element in AgentTasks.tsx (line 252)
- Fix TS2322: cast metadata?.worktree and metadata?.assigned_agent with !! to satisfy ReactNode constraint (Record<string, unknown> values)
task.metadata is `serde_json::Value`, not `Option<Value>`. Remove the incorrect `.as_ref()`/`.unwrap_or_else()` calls and call `.get()` directly on the Value in the task_diff, create_worktree, and delete_worktree handlers.
Welcome Page:
- Detects Claude Max OAuth (anthropic provider check)
- Shows provider type ("Anthropic (Claude Max / OAuth)")
- Agent status with direct navigation
- Kanban Board quick-link when agent exists
Kanban Board:
- Project filter dropdown (all repos from registry)
- GitHub Issues toggle with count badge
- GitHub Issues column showing issues from all registry repos
- Issues show repo name, labels, assignees, timestamp
- Click issue to open on GitHub
Backend:
- New GET /api/registry/issues endpoint
- Fetches issues from all registry repos via gh CLI
- Supports repo filter and state filter (open/closed)
- Sorted by updated_at descending
- Replace non-null assertions with safe variable access
- Use href links instead of TanStack navigate for dynamic agent routes
- Prevents blank page when agents query is still loading
Backend:
- Fix Anthropic OAuth detection in providers API (anthropic_oauth.json)
- /api/providers now returns anthropic: true for Claude Max users
Frontend:
- New useSpacebotConfig hook (merges agentConfig + messaging + providers)
- Kanban is now the default agent landing page (Board tab)
- Overview moved to /agents/$agentId/overview tab
- Config status bar: worker model, concurrency, worktrees, platforms, auth
- GitHub Issue import button (creates Kanban task from issue)
- Already-imported issues show "Imported" badge
- Project filter now filters both tasks AND issues
Workers that call set_status(kind="outcome") now skip the idle follow-up loop entirely. Previously, they would unconditionally enter the loop and block forever on input_rx.recv(), creating a circular dependency:
- WorkerComplete only fires after worker.run() returns
- worker.run() only returns when the follow-up loop exits
- The follow-up loop only exits when the sender is dropped
- The sender is only dropped on WorkerComplete
This caused a permanent crash-loop: the health watchdog would kill the process every ~10 minutes, and on restart the idle worker would be resumed but produce no activity.
Also hardens log_worker_idle() to not overwrite terminal states (done/failed/completed/cancelled), preventing a race condition between fire-and-forget tokio::spawn database updates.
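The circular dependency above can be modeled with a plain channel: the loop blocks on recv() until every sender drops, but the sender only drops after the worker returns. The fix skips the loop once a terminal outcome was reported. Names here are illustrative.

```rust
use std::sync::mpsc;

// Returns the number of follow-up messages handled.
fn follow_up_loop(reported_outcome: bool, input_rx: mpsc::Receiver<String>) -> usize {
    if reported_outcome {
        return 0; // set_status(kind="outcome") → no idle follow-up phase
    }
    let mut handled = 0;
    // Exits only when every sender has been dropped.
    while input_rx.recv().is_ok() {
        handled += 1;
    }
    handled
}
```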
Walkthrough
This pull request introduces a comprehensive GitHub registry integration system with automated repository discovery, syncing, and dependency management. It adds UI improvements including new routes (Welcome, Pricing), theme customization, task dependency visualization, and worker capacity management with queueing. Additional features include CLI upgrade support, health watchdog monitoring, and an MCP bridge for external integrations.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Actionable comments posted: 1
Note: Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution: Some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/api/agents.rs (1)
766-779: ⚠️ Potential issue | 🔴 Critical
Add registry store to ApiState when creating agents.
The code creates `registry_store` at lines 766–779 but never inserts it into `state.registry_stores`, unlike `project_stores` at lines 962–966. Since all `/api/registry/*` handlers (in `src/api/registry.rs`) look up stores via `state.registry_stores.load()`, newly created agents will return `StatusCode::NOT_FOUND` for every registry request until the server restarts.
After creating the registry store and its status tracker, insert them into ApiState following the same pattern as project_stores:
```rust
let mut registry_stores_map = (**state.registry_stores.load()).clone();
registry_stores_map.insert(agent_id.clone(), registry_store);
state.registry_stores.store(std::sync::Arc::new(registry_stores_map));

let mut registry_status_map = (**state.registry_sync_status.load()).clone();
registry_status_map.insert(
    agent_id.clone(),
    Arc::new(ArcSwap::from_pointee(SyncStatus::Idle)),
);
state.registry_sync_status.store(std::sync::Arc::new(registry_status_map));
```
Also remove registry stores from `ApiState` in the `delete_agent()` cleanup (same section where project stores are removed around line 1273).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/api/agents.rs` around lines 766-779: when creating agents, persist the newly constructed registry_store and its status into ApiState so registry handlers can find them: after creating registry_store and its SyncStatus, clone and insert into state.registry_stores (use the state.registry_stores.load()/store pattern) and into state.registry_sync_status (insert an Arc::new(ArcSwap::from_pointee(SyncStatus::Idle)) keyed by agent_id), mirroring how project_stores is handled; likewise, in delete_agent() remove the agent's entries from state.registry_stores and state.registry_sync_status during cleanup to avoid stale entries.
src/agent/channel.rs (1)
2843-2892: ⚠️ Potential issue | 🟠 Major
Cancelled workers still bypass the new queue drain.
This drain only runs after the existing `worker_handles.remove(...)` guard passes. `cancel_worker_with_reason()` removes that handle before it emits its synthetic `WorkerComplete`, so cancelled workers free capacity without ever reaching this block and queued workers stay stuck until some unrelated worker finishes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/agent/channel.rs` around lines 2843-2892: the WorkerComplete handler currently returns early if removing worker_id from self.state.worker_handles returns None, which causes cancelled workers (which remove the handle earlier in cancel_worker_with_reason()) to skip running crate::agent::channel_dispatch::drain_worker_queue and leaves queued workers stuck; modify the ProcessEvent::WorkerComplete arm so that even when worker_handles.write().await.remove(worker_id) is None you still call drain_worker_queue(&self.state).await before returning (or refactor to run the drain unconditionally after the guard), keeping the existing logging/queued-spawn behavior; reference symbols: ProcessEvent::WorkerComplete, cancel_worker_with_reason, self.state.worker_handles, and crate::agent::channel_dispatch::drain_worker_queue.
src/main.rs (1)
2396-2403:⚠️ Potential issue | 🟠 MajorTrack these registry task handles so agent removal can stop them.
registry_sync_loopandpr_conflict_check_loopare detached and forgotten, while Line 2398 only removes the agent from the map and disconnects MCP. A dynamically removed agent can keep syncing the registry and checking PR conflicts in the background after it is supposedly gone.Also applies to: 3436-3468
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main.rs` around lines 2396 - 2403, When removing an agent you need to stop its background tasks (registry_sync_loop and pr_conflict_check_loop) instead of just removing the agent and disconnecting MCP; change the spawn sites for registry_sync_loop and pr_conflict_check_loop to return and store their tokio::task::JoinHandle in the agent struct (e.g., add fields like registry_task_handle and pr_conflict_task_handle to the Agent or AgentDeps), and in the agent removal branch where agent_remove_rx.recv() leads to agents.remove(&key) call ensure you abort or join those handles (call JoinHandle::abort() or await the join) before finishing removal, then disconnect MCP and log; update the creation paths that currently detach/forget these loops to assign the handles into the agent so removal can stop them.
🟠 Major comments (21)
src/update.rs-107-115 (1)
107-115: ⚠️ Potential issue | 🟠 Major
Add routing in the apply endpoint for native deployments, or defer `can_apply` for native until the upgrade path is implemented.
Lines 107–115 and 240–254 enable `can_apply = true` for `Deployment::Native` when the source directory is resolvable. However, the `/update/apply` endpoint (line 461 in `src/api/settings.rs`) unconditionally calls `apply_docker_update()`, which immediately bails with "not running in Docker" for native deployments (line 334). This creates a broken UX: the UI enables the Apply button for native installs, but the endpoint fails.
The `upgrade` module (public in `src/lib.rs`) has all the functions needed for native upgrades (`resolve_source_dir`, `check_for_updates`, `pull`, `build_release`, `install_binary`, `restart_service`), but the API does not route native deployments to them.
Either route native deployments to the upgrade module in the apply endpoint, or set `can_apply = false` for native deployments until the native upgrade path is wired into the API.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/update.rs` around lines 107-115: the API currently enables status.can_apply for Deployment::Native based on resolve_source_dir, but the /update/apply handler unconditionally calls apply_docker_update(), causing failures for native installs; either (A) modify the apply endpoint in src/api/settings.rs to route native deployments to the upgrade module flow (use resolve_source_dir, check_for_updates, pull, build_release, install_binary, restart_service in sequence) when deployment == Deployment::Native, or (B) revert the status logic in update.rs (where Deployment::Native sets status.can_apply and status.cannot_apply_reason) to set can_apply = false and an explanatory cannot_apply_reason until the native upgrade path is implemented; pick one approach and update the corresponding functions (Deployment::Native handling in update.rs or the apply handler in settings.rs) so UI state and API behavior match.
interface/src/routes/Welcome.tsx-111-117 (1)
111-117: ⚠️ Potential issue | 🟠 Major
Use route-aware navigation for the agent links.
These absolute `href`s bypass the app's configured base path, so they break when the UI is hosted under a subpath. The View link also now resolves to `/agents/$agentId`, which was repointed to the task board in `router.tsx`, so the destination no longer matches the label here.
Also applies to: 136-141
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@interface/src/routes/Welcome.tsx` around lines 111-117: replace the hardcoded anchor tags in Welcome.tsx that use href={`/agents/${firstAgentId}`} (and the similar block at 136-141) with the router-aware Link component (or useNavigate/to) so navigation respects the app base path; change the destination to the correct agent-details route (use your named route constant or path helper instead of `/agents/${id}`, since that path now points to the task board) and pass firstAgentId (or the relevant agentId) to that route, preserving the existing className and children.
src/agent/channel.rs-2119-2169 (1)
2119-2169: ⚠️ Potential issue | 🟠 Major
This makes every channel prompt scale with the full registry.
On agents that discover lots of repos, every turn now injects one `ProjectContext` per unlinked registry repo, which can materially increase token usage/latency and eventually hit context limits. The `if let Ok(...)` also hides the failure case, so registry context can disappear without any signal. Please cap or summarize this list and at least log the error path. As per coding guidelines: "Don't silently discard errors. No `let _ =` on Results. Handle them, log them, or propagate them."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/agent/channel.rs` around lines 2119-2169: the current block calling self.deps.registry_store.list_repos(&self.deps.agent_id, true) silently ignores errors and pushes an unbounded ProjectContext per unlinked registry repo (via ProjectContext and ProjectRepoContext), which can blow up prompt size; change it to handle the Result error (log it with your agent logger instead of using if let Ok(...)), and when successful, cap or summarize the registry_repos you expand into contexts (e.g., include only the first N entries, N configurable, and add a single summary ProjectContext stating "X additional registry repos omitted") so you don't inject one context per repo and thereby limit token growth; ensure linked_project_ids filtering remains and preserve repo metadata for the included items.
src/registry/pr_conflicts.rs-149-149 (1)
149-149: ⚠️ Potential issue | 🟠 Major
Pass the configured webhook bind address to `pr_conflict_check_loop` instead of hardcoding `127.0.0.1`.
The webhook server's bind address is configurable via `webhook_config.bind` (defaulting to `127.0.0.1`), but `pr_conflict_check_loop` only receives the `webhook_port` parameter. If the webhook is configured to bind to a different address (e.g., `0.0.0.0` or a custom IP), the hardcoded `127.0.0.1` in line 149 will fail to reach it. Extract and pass `webhook_config.bind` to the function alongside `webhook_port`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/registry/pr_conflicts.rs` at line 149: the code hardcodes "127.0.0.1" when building webhook_url in pr_conflicts.rs; update the call site that invokes pr_conflict_check_loop so it also passes webhook_config.bind (the configured bind address) alongside webhook_port, and then modify pr_conflict_check_loop to accept a bind parameter and use it when formatting webhook_url (e.g., format!("http://{bind}:{webhook_port}/send")). Ensure you update all references to pr_conflict_check_loop and the parameter list to avoid compile errors.
src/registry/sync.rs-179-213 (1)
179-213: ⚠️ Potential issue | 🟠 Major
Use an owner-qualified directory for auto-clones.
`clone_base_dir.join(&gh_repo.name)` collides as soon as two configured owners both have a repo with the same name. The second repo will reuse or skip the first checkout, and `local_path` ends up pointing at the wrong repository.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/registry/sync.rs` around lines 179-213: the auto-clone path currently uses config.clone_base_dir.join(&gh_repo.name), which causes collisions for repos with the same name across different owners; change the directory to be owner-qualified (for example, join the owner and repo name, or join a sanitized full_name) so each repo gets a unique path. Update the clone_dir computation used by auto_clone_repo, the exists() check, and the subsequent store.set_local_path calls to use the new owner-qualified path (use gh_repo.owner.login or full_name sanitized) so full_name, auto_clone_repo, clone_dir, and set_local_path remain consistent and avoid cross-owner collisions.
src/watchdog.rs-203-218 (1)
203-218: ⚠️ Potential issue | 🟠 Major
Page through startup catch-up results instead of stopping at 20 issues.
A single outage window with more than 20 matching issues leaves the remainder unprocessed once they age out of the 24-hour search. Please paginate here, or keep fetching until GitHub returns fewer than a page.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/watchdog.rs` around lines 203-218: the current gh CLI call that sets output = tokio::process::Command::new("gh")... with args including "--limit","20" only fetches a single page; change this to paginate and keep fetching until a page returns fewer than page_size (20). Implement a loop around the Command invocation (use a page counter starting at 1 and add "--page", &page.to_string() to the args) that runs the command for repo and since, parses each page's results and accumulates/processes them, and breaks when the returned list length < 20 (or empty); ensure you still pass the same search (--search &format!("-label:prd-generated created:>{since}")) and collect results from each invocation rather than stopping after the first output.
src/api/registry.rs-143-159 (1)
143-159: ⚠️ Potential issue | 🟠 Major
Record `Failed` status before returning a 500 from manual sync.
`Syncing` is stored before `sync_registry()`, but the `?` on error exits before the shared status is updated. After a failed manual sync, `/api/registry/status` stays stuck on `syncing` until some later successful run overwrites it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/api/registry.rs` around lines 143-159: the code sets SyncStatus::Syncing in state.registry_sync_status (via status_map.get(&query.agent_id)) but uses ? on the await of sync_registry(store, &query.agent_id, &registry_config), so failures never update the shared status; change the error path to record a SyncStatus::Failed before returning the 500: capture the error from sync_registry (remove the direct map_err/?) so you can, in the error branch, look up the same status in status_map (using query.agent_id) and call status.store(Arc::new(SyncStatus::Failed { at: chrono::Utc::now().to_rfc3339(), error: format!("{}", err) })), then return StatusCode::INTERNAL_SERVER_ERROR; keep the existing success path that stores SyncStatus::Completed with result.clone().
src/api/tasks.rs-619-645 (1)
619-645: ⚠️ Potential issue | 🟠 Major
Don't erase `worktree` metadata when removal failed.
`let _ =` discards the result of `git worktree remove`, so a failed removal still clears the DB pointer. That leaves an on-disk checkout orphaned from task metadata and makes later cleanup much harder. As per coding guidelines: "Don't silently discard errors. No `let _ =` on Results. Handle them, log them, or propagate them."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/api/tasks.rs` around lines 619-645: the code currently discards the result of the git worktree removal (let _ =) so it always removes the "worktree" metadata even when removal failed; capture and inspect the result of TokioCommand::new("git")...output().await (or use status) and only call obj.remove("worktree") and persist via store.update when the command succeeded; if the command fails, log the error with context (include worktree_path, command stderr/stdout, and the error) and return or map an appropriate StatusCode instead of silently proceeding. Ensure you modify the branch that runs TokioCommand, the variable worktree_path, the metadata handling that calls obj.remove("worktree"), and the subsequent store.update(...) (UpdateTaskInput) so metadata is only mutated on successful worktree removal.
src/api/tasks.rs-416-439 (1)
416-439: ⚠️ Potential issue | 🟠 Major
Propagate `git diff` failures instead of returning an empty diff.
These branches only handle spawn errors; a non-zero git exit currently becomes a 200 with empty stdout, which is indistinguishable from "no changes". Check `output.status.success()` and return an error when the diff command fails. As per coding guidelines: "Don't silently discard errors. No `let _ =` on Results. Handle them, log them, or propagate them."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/api/tasks.rs` around lines 416-439: the git diff branches (where TokioCommand::new("git") is called for worktree_path and branch) only map spawn errors but ignore non-zero exit statuses; update both places to check output.status.success() after awaiting output, and if not successful log the stderr (using tracing::warn! with String::from_utf8_lossy(&output.stderr)) and return an Err(StatusCode::INTERNAL_SERVER_ERROR) (or propagate a suitable error) instead of treating empty stdout as a valid diff; ensure the diff variable assignment for worktree_path and branch handles failed exit statuses consistently before converting stdout to String.
src/agent/channel_dispatch.rs-646-649 (1)
646-649: ⚠️ Potential issue | 🟠 Major

Handle `WorkerLimitReached` by falling through to queueing. The capacity check here isn't atomic with `do_spawn()`. Another concurrent spawn can consume the slot before `spawn_*_from_state` runs, and this returns a limit error even though the contract of `spawn_or_queue_worker()` is to queue on saturation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/agent/channel_dispatch.rs` around lines 646-649, the capacity check before spawning isn't atomic and currently returns early on any error. Change the logic in `spawn_or_queue_worker` so that when `check_worker_limit(state).await` returns `Ok` you call `do_spawn(state, &request).await.map(Some)`, but when it returns the specific `WorkerLimitReached` error you do not return an error and instead fall through to the queuing path (i.e., treat limit-reached as "queue this request"); propagate other unexpected errors as before. Update the conditional around `check_worker_limit`, referencing `check_worker_limit`, `do_spawn`, and `spawn_or_queue_worker` (and the `spawn_*_from_state` callers) to implement this selective handling.

src/agent/channel_dispatch.rs-757-797 (1)
757-797: ⚠️ Potential issue | 🟠 Major

Don't drop queued work on drain-side spawn races. This removes the entry from `worker_queue`/`queued_workers` before the spawn succeeds. If another request grabs the free slot first, `spawn_*_from_state` can return `WorkerLimitReached` here and the queued task is lost instead of retried.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/agent/channel_dispatch.rs` around lines 757-797, the code currently removes the task from `worker_queue` / `status.queued_workers` before attempting to spawn (see `worker_queue.write().await.pop_front()`, `queued_workers`, `spawn_opencode_worker_from_state`, `spawn_worker_from_state`), which loses the task if spawn fails with `WorkerLimitReached`. Change the flow to keep the request in the queue and status until spawn succeeds: either delay removing the entry until the result is `Ok(worker_id)`, or, if spawn returns an error (specifically `WorkerLimitReached`), push the request back onto `worker_queue` and re-insert it into `status.queued_workers` so the task is retried. Only remove entries from the queued lists on successful spawn or on non-retriable errors.

src/api/tasks.rs-427-436 (1)
427-436: ⚠️ Potential issue | 🟠 Major

Resolve the task's repository before running Git. The branch-diff path and worktree creation both shell out in the API process's current directory, not the repo attached to the task. Imported registry issues can therefore diff or create a worktree in the wrong checkout, or simply fail when the daemon wasn't started inside that repo.

Also applies to: 505-540
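For illustration, a self-contained sketch of the pattern being requested, using std's blocking `Command` for brevity where the handler actually uses the async `TokioCommand`; `diff_in_repo` and its `repo_path` parameter are hypothetical names standing in for whatever accessor the task struct provides:

```rust
use std::path::Path;
use std::process::Command;

// Hypothetical helper: run `git diff` inside the repo attached to the task,
// not in whatever directory the daemon happens to be running from.
fn diff_in_repo(repo_path: &Path, branch: &str) -> Result<String, String> {
    let output = Command::new("git")
        .current_dir(repo_path) // the crucial part: pin the working directory
        .args(["diff", &format!("main...{branch}")])
        .output()
        .map_err(|e| format!("failed to spawn git: {e}"))?;
    // A non-zero exit becomes a real error, not an empty "no changes" diff.
    if !output.status.success() {
        return Err(String::from_utf8_lossy(&output.stderr).into_owned());
    }
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

fn main() {
    // Running against a directory that isn't a repo now fails loudly
    // instead of silently diffing the wrong checkout.
    let result = diff_in_repo(Path::new("/"), "feature");
    assert!(result.is_err());
    println!("ok");
}
```

The same `.current_dir(...)` call works on `tokio::process::Command`, so the async version is a mechanical translation.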
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/api/tasks.rs` around lines 427-436, the git commands run in the API process's CWD instead of the task's repository. Update the code that creates `TokioCommand::new("git")` (the branch-diff block using `args(["diff", &format!("main...{br}")])` and the separate worktree-creation code referenced at 505-540) to first resolve the task's repository path (e.g., via the task struct's repo-path accessor or a helper like `task.repo_dir()` / `resolve_repo_path(task)`) and then call `.current_dir(repo_path)` on the `TokioCommand` before `.args()`/`.output()` so the git subcommands execute inside the correct repository. Apply the same change to every git invocation in this file (including the worktree-creation path) and preserve existing error mapping/logging.

src/agent/channel_dispatch.rs-670-712 (1)
670-712: ⚠️ Potential issue | 🟠 Major

Make duplicate detection and enqueue a single critical section. Two concurrent callers at capacity can both pass the read-only duplicate checks and then `push_back()` the same task, because the queue is only locked for the final write. Reuse the reservation mechanism or perform dedupe while holding the queue write lock.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/agent/channel_dispatch.rs` around lines 670-712, replace the separate read-only duplicate checks plus later push with a single critical section: compute `normalized` from `request.task`, then acquire the locks in a consistent order (e.g. `let mut status = state.status_block.write().await; let mut queue = state.worker_queue.write().await;`) and inside that section call `status.find_duplicate_worker_task(&request.task)` and check `queue.iter().any(...)` using the same normalized logic. If a duplicate is found, return `AgentError::DuplicateWorkerTask`; otherwise `push_back(request)` onto the queue and push the matching `QueuedWorkerInfo` into `status`. This ensures the duplicate check and the `push_back`/update of `queued_workers` happen atomically.

src/api/registry.rs-261-290 (1)
261-290: ⚠️ Potential issue | 🟠 Major

Don't silently skip a repo when issue JSON parsing fails. If `gh issue list` returns malformed or unexpected JSON, this repo is just omitted because the `from_slice` error is discarded. Log the parse error or return a failure so the UI doesn't silently show partial results. As per coding guidelines: "Don't silently discard errors. No `let _ =` on Results. Handle them, log them, or propagate them."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/api/registry.rs` around lines 261-290, the repo's JSON parsing currently ignores errors from `serde_json::from_slice` (used when building `GitHubIssue` items), which silently omits a repository's issues. Update the code around the `from_slice::<Vec<serde_json::Value>>(&out.stdout)` call to handle the `Result`: if parsing fails, log the error (including `repo_name` and stderr/stdout) via the existing logger, or return/propagate an `Err` from the enclosing function instead of silently continuing, so failures are visible. Ensure the logs reference `repo_name` and the `serde_json` error, and preserve existing behavior on success, where you iterate and push `GitHubIssue` entries.

src/watchdog.rs-246-249 (1)
246-249: ⚠️ Potential issue | 🟠 Major

Use a monotonic baseline for watchdog timestamps. The `instant_to_epoch_secs()` function relies on `SystemTime`, which is not monotonic and can jump due to NTP or manual clock adjustments. This makes the elapsed-time calculation at line 267 incorrect, potentially triggering false restarts (if time jumps forward) or missing actual stalls (if time jumps backward). Store seconds since a fixed `Instant` instead, such as the watchdog's initialization time.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/watchdog.rs` around lines 246-249, the code uses `instant_to_epoch_secs`/`SystemTime` for `last_activity`, which is non-monotonic. Switch to a monotonic baseline by capturing a fixed `Instant` at `spawn_watchdog` start (e.g., `let baseline = Instant::now()`) and storing `Instant::now().duration_since(baseline).as_secs()` in the `AtomicU64` (`last_activity`). Update every place that calls `instant_to_epoch_secs` or compares epoch seconds (including the elapsed calculation that compares `last_activity` to now) to instead compute monotonic elapsed seconds relative to the same baseline. Keep the public `spawn_watchdog` signature and `WatchdogHandle` unchanged; only replace the `instant_to_epoch_secs` usage, and ensure the baseline is captured where both the background thread and the handle can access it.

src/upgrade.rs-130-145 (1)
130-145: ⚠️ Potential issue | 🟠 Major

Avoid deadlock: consume piped stdout or inherit it instead. `cargo build` is spawned with both `stdout` and `stderr` piped, but only `stderr` is drained before `wait()`. Cargo and its build scripts write to stdout, which will fill the pipe buffer and block the child process while the parent is stuck waiting, causing a deadlock. Either inherit stdout (`.stdout(Stdio::inherit())`) or read both streams concurrently. Additionally, the `.unwrap()` on line 139 should handle the error rather than panicking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/upgrade.rs` around lines 130-145, the child process created by `Command::new("cargo")` pipes both stdout and stderr but only drains stderr, risking a deadlock. Either make stdout inherited (replace `.stdout(Stdio::piped())` with `.stdout(Stdio::inherit())`) or read stdout concurrently (spawn a thread/task that wraps `child.stdout` in a `BufReader` and drains lines the same way stderr is processed). Also avoid panicking on `child.stderr.take().unwrap()` by handling the `Option` (e.g., return an error if stderr is `None`, or map it to a proper `anyhow` error) so `Command::new` / `.spawn()`, `child.stderr`, `reader.lines()`, and `on_line` are all handled safely.

src/main.rs-372-379 (1)
372-379: ⚠️ Potential issue | 🟠 Major

Pass an explicit local-mode flag into `cmd_start`. Line 467 infers "local mode" from `port_override.is_some()`, so `spacebot local` without `--port` never opens the browser even though the subcommand advertises that behavior. `start --foreground` and `local` are indistinguishable once `port_override` is `None`, so this needs a separate flag instead of overloading the port override.

🩹 Suggested fix

-        Command::Start { foreground } => cmd_start(cli.config, cli.debug, foreground, None),
+        Command::Start { foreground } => {
+            cmd_start(cli.config, cli.debug, foreground, None, false)
+        }
 ...
         Command::Restart { foreground } => {
             cmd_stop_if_running();
-            cmd_start(cli.config, cli.debug, foreground, None)
+            cmd_start(cli.config, cli.debug, foreground, None, false)
         }
 ...
-        Command::Local { port } => cmd_start(cli.config, cli.debug, true, port),
+        Command::Local { port } => cmd_start(cli.config, cli.debug, true, port, true),

 fn cmd_start(
     config_path: Option<std::path::PathBuf>,
     debug: bool,
     foreground: bool,
     port_override: Option<u16>,
+    open_browser: bool,
 ) -> anyhow::Result<()> {
 ...
-    let open_browser = foreground && port_override.is_some();
     let browser_url = format!("http://{}:{}", config.api.bind, config.api.port);

Also applies to: 401-405, 466-468
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main.rs` around lines 372-379, the subcommand handling overloads `port_override` to signal "local mode", which fails when the port is `None`. Update the `cmd_start` signature to accept an explicit local-mode boolean and use it instead of `port_override.is_some()` to decide whether to open the browser: the `Command::Local` arm passes `true`, and the other callers that invoke `cmd_start` (e.g., `Command::Start`, `Command::Restart`) pass `false`. `port_override` then carries only the port value.

src/main.rs-2156-2163 (1)
2156-2163: ⚠️ Potential issue | 🟠 Major

Ping the watchdog from the main loop, not only from message traffic. Line 2163 arms the timeout immediately, but this file only resets it after inbound/injection routing. A healthy quiet daemon — including setup mode before any providers are configured — will therefore restart every 10 minutes even when nothing is stuck.

🩹 Suggested fix

     let watchdog = spacebot::watchdog::spawn_watchdog(
         std::time::Duration::from_secs(600), // 10 minute timeout
         std::time::Duration::from_secs(60),  // check every minute
     );
-    // Ping immediately so the watchdog doesn't trigger during initial quiet period
     watchdog.ping();
+    let mut watchdog_heartbeat = tokio::time::interval(std::time::Duration::from_secs(60));
+    watchdog_heartbeat.set_missed_tick_behavior(tokio::time::MissedTickBehavior::Skip);

     // Main event loop: route inbound messages to agent channels
     loop {
         ...
         tokio::select! {
+            _ = watchdog_heartbeat.tick() => {
+                watchdog.ping();
+            }
             Some(mut message) = inbound_next, if agents_initialized => {

Also applies to: 2174-2174, 2387-2389, 2417-2428
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main.rs` around lines 2156-2163, the watchdog is currently only pinged during inbound/injection routing (calls to `watchdog.ping()`), so healthy-but-quiet phases trigger the 10-minute restart. Update the main loop to ping the watchdog periodically (e.g., on each main-loop tick or via a short interval timer) so that `watchdog.ping()` is invoked independent of message traffic or provider setup; apply the same change to the other referenced `watchdog.ping()` sites so setup mode and idle daemon states keep the watchdog reset.

src/main.rs-593-618 (1)
593-618: ⚠️ Potential issue | 🟠 Major

Pass `--remote` and `--branch` to `upgrade::pull()` to ensure consistency. `check_for_updates()` fetches from and compares against the specified remote/branch (line 77 fetches from `remote`, line 100 uses `HEAD..{remote}/{branch}`), but `pull()` only accepts `source_dir` and runs a bare `git pull` (line 125). This means the command can report updates available from one target (e.g., `origin/main`) but merge from a different tracking branch, creating a mismatch between what the check reports and what actually gets applied.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main.rs` around lines 593-618, the pull step uses `upgrade::pull(&source_dir)`, which performs a bare `git pull` and can pull a different ref than the earlier `upgrade::check_for_updates(&source_dir, &remote, &branch)` used. Update the API and call site so `pull` accepts the same remote and branch (e.g., change the signature to `upgrade::pull(source_dir: &Path, remote: &str, branch: &str)` and invoke `upgrade::pull(&source_dir, &remote, &branch)`) so the actual merge target matches the check; adjust any error handling/return types to the modified signature.

src/registry/store.rs-320-329 (1)
320-329: ⚠️ Potential issue | 🟠 Major

Propagate decode failures for nullable fields. `unwrap_or(None)` turns missing-column and type-mismatch errors into `None`, so bad migrations or query regressions look like absent data instead of failing loudly. Keep the fields nullable, but let `try_get::<Option<_>, _>` errors surface.

🩹 Suggested fix

-            language: row.try_get("language").unwrap_or(None),
-            local_path: row.try_get("local_path").unwrap_or(None),
+            language: row
+                .try_get::<Option<String>, _>("language")
+                .context("missing language")?,
+            local_path: row
+                .try_get::<Option<String>, _>("local_path")
+                .context("missing local_path")?,
 ...
-            worker_model: row.try_get("worker_model").unwrap_or(None),
+            worker_model: row
+                .try_get::<Option<String>, _>("worker_model")
+                .context("missing worker_model")?,
 ...
-            project_id: row.try_get("project_id").unwrap_or(None),
-            last_synced_at: row.try_get("last_synced_at").unwrap_or(None),
+            project_id: row
+                .try_get::<Option<String>, _>("project_id")
+                .context("missing project_id")?,
+            last_synced_at: row
+                .try_get::<Option<String>, _>("last_synced_at")
+                .context("missing last_synced_at")?,

As per coding guidelines: "Don't silently discard errors. No `let _ =` on Results. Handle them, log them, or propagate them."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/registry/store.rs` around lines 320-329, the current mapping silently turns decode errors into `None` by using `unwrap_or(None)` for fields like `language`, `local_path`, `worker_model`, `project_id`, and `last_synced_at`. Change these to `row.try_get::<Option<_>, _>("field_name")?` (with the appropriate concrete `Option<Type>`) so type-mismatch and missing-column errors propagate instead of being swallowed. Keep the fields as `Option<T>` but propagate the `Result` error (using `?` and the existing `.context(...)` where needed) so failures surface while decoding rows in the functions that build the record from DB rows.

src/registry/store.rs-245-247 (1)
245-247: ⚠️ Potential issue | 🟠 Major

Archive all rows when the latest sync is empty. Returning `0` here means a successful discovery that finds no repos leaves every stale row non-archived forever. If callers want to suppress archive-on-error, that decision belongs in the sync layer, not inside this store method.

🩹 Suggested fix

     if seen_full_names.is_empty() {
-        return Ok(0);
+        let result = sqlx::query(
+            "UPDATE registry_repos \
+             SET is_archived = 1, updated_at = CURRENT_TIMESTAMP \
+             WHERE agent_id = ? AND is_archived = 0",
+        )
+        .bind(agent_id)
+        .execute(&self.pool)
+        .await
+        .context("failed to archive repos for empty registry sync")?;
+
+        return Ok(result.rows_affected());
     }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/registry/store.rs` around lines 245-247, the early return when `seen_full_names.is_empty()` prevents archiving stale rows. Remove the `return Ok(0)` and let the rest of the method proceed to mark rows not present in `seen_full_names` as archived (i.e., the same update logic that runs when there are discovered names). Handle the empty set safely (avoid `IN ()` SQL errors) via the existing code path or a guard that archives everything when `seen_full_names` is empty. Do not move suppression logic here — callers that want to skip archiving should decide in the sync layer.
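To make the "avoid `IN ()` SQL errors" point concrete, here is a hedged sketch of the query-building half of the fix; `archive_sql` is a hypothetical helper that only constructs the SQL string (the table and column names follow the suggested fix above, but the real store binds and executes via sqlx):

```rust
// Build the archive UPDATE for a sync result. SQLite rejects a literal
// `IN ()`, so an empty `seen` set gets a clause-free statement that
// archives every active row — matching "an empty sync archives all".
fn archive_sql(seen: &[String]) -> String {
    if seen.is_empty() {
        "UPDATE registry_repos SET is_archived = 1 \
         WHERE agent_id = ? AND is_archived = 0"
            .to_string()
    } else {
        // One `?` placeholder per seen repo, bound by the caller.
        let placeholders = vec!["?"; seen.len()].join(", ");
        format!(
            "UPDATE registry_repos SET is_archived = 1 \
             WHERE agent_id = ? AND is_archived = 0 \
             AND full_name NOT IN ({placeholders})"
        )
    }
}

fn main() {
    // Empty sync: no NOT IN clause, everything stale gets archived.
    assert!(!archive_sql(&[]).contains("NOT IN"));
    // Non-empty sync: one placeholder per surviving repo.
    let sql = archive_sql(&["a/b".to_string(), "c/d".to_string()]);
    assert!(sql.contains("NOT IN (?, ?)"));
    println!("ok");
}
```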
🟡 Minor comments (7)
interface/src/ui/style/colors.scss-204-204 (1)
204-204: ⚠️ Potential issue | 🟡 Minor

Fix stylelint errors: add blank lines before inline section comments. Line 204, Line 208, Line 212, Line 223, and Line 243 violate `scss/double-slash-comment-empty-line-before`.

Suggested lint-only patch

 .ember-theme {
     --dark-hue: 8;
     --color-black: 0, 0%, 0%;
     --color-white: 0, 0%, 100%;
+
     // dark red primary, warm orange secondary
     --color-accent: 4, 72%, 40%;
     --color-accent-faint: 14, 75%, 50%;
     --color-accent-deep: 0, 78%, 30%;
+
     // warm-tinted text
     --color-ink: 15, 12%, 92%;
     --color-ink-dull: 10, 8%, 62%;
     --color-ink-faint: 8, 5%, 44%;
+
     // sidebar — deep warm black
     --color-sidebar: 8, 15%, 3%;
     --color-sidebar-box: 8, 12%, 7%;
@@
     --color-sidebar-button: 8, 12%, 11%;
     --color-sidebar-selected: 4, 18%, 16%;
     --color-sidebar-shade: 8, 15%, 10%;
+
     // main — warm dark surfaces
     --color-app: 8, 12%, 5%;
@@
     --color-app-frame: 8, 10%, 16%;
     --color-app-slider: 8, 10%, 11%;
     --color-app-explorer-scrollbar: 4, 14%, 20%;
+
     // menu
     --color-menu: 8, 14%, 4%;

Also applies to: 208-208, 212-212, 223-223, 243-243
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@interface/src/ui/style/colors.scss` at line 204, several inline section comments (e.g., "// dark red primary, warm orange secondary" and the other inline comments at the listed positions) violate `scss/double-slash-comment-empty-line-before`. Fix by inserting a single blank line immediately above each of those `//` comments so there is an empty line separating them from the preceding code, ensuring the lint rule passes for those occurrences.

interface/src/routes/Pricing.tsx-26-26 (1)
26-26: ⚠️ Potential issue | 🟡 Minor

Placeholder CTA link for Teams tier. The Teams tier CTA has `href: "#"`, which doesn't navigate anywhere meaningful. Users clicking "Get Started" will see no action, which is confusing. Consider either:
- Linking to a sign-up flow or waitlist
- Disabling the button with a "Coming Soon" label
- Removing the CTA until the Teams tier is available
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@interface/src/routes/Pricing.tsx` at line 26, the Teams tier CTA currently uses a placeholder href (`cta: { label: "Get Started", href: "#" }`). Update it to a meaningful action: point `href` to the sign-up or waitlist route (e.g., "/signup" or "/waitlist"), replace the CTA with a disabled/secondary button labeled "Coming Soon" (removing the href), or drop the `cta` field altogether until the tier is available, so the button no longer navigates to a dead anchor.

interface/src/routes/Pricing.tsx-42-42 (1)
42-42: ⚠️ Potential issue | 🟡 Minor

Personal email address in Enterprise CTA. The Enterprise tier uses `mailto:mail@marcmantei.com`, which appears to be a personal email. For a customer-facing pricing page, consider using a business-domain address (e.g., `sales@spacedrive.com` or `enterprise@spacedrive.com`).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@interface/src/routes/Pricing.tsx` at line 42, the Enterprise CTA currently uses a personal mailto address in the `cta` object (`cta: { label: "Contact Sales", href: "mailto:mail@marcmantei.com" }`). Update that href to a company-facing address (for example `mailto:sales@spacedrive.com` or `mailto:enterprise@spacedrive.com`) so the Pricing component uses a business email for customer contact.

src/config/types.rs-1026-1039 (1)
1026-1039: ⚠️ Potential issue | 🟡 Minor

Default clone directory uses an ephemeral `/tmp` path. The default `clone_base_dir` of `/tmp/spacebot-repos` may be problematic:
- `/tmp` is cleared on reboot on most Linux systems
- Auto-cloned repos would be lost, triggering re-clones on restart
- Could cause unexpected disk usage in `/tmp`, which often has size limits

Consider defaulting to a path under the instance directory or agent data directory for persistence.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/config/types.rs` around lines 1026-1039, the `Default` impl for `RegistryConfig` sets `clone_base_dir` to an ephemeral "/tmp/spacebot-repos". Change it to a persistent data directory (e.g., under the instance or agent data directory) by replacing the hardcoded `PathBuf::from("/tmp/spacebot-repos")` in `RegistryConfig::default()` with a `PathBuf` resolved from the instance/agent data location (use an existing helper like an `instance_dir`/`agent_data_dir` function or the `dirs` crate to derive a persistent path), updating the `clone_base_dir` field in the `Default` impl accordingly.

mcp-bridge/spacebot_mcp_server.py-275-275 (1)
275-275: ⚠️ Potential issue | 🟡 Minor

Remove unnecessary f-string prefix. Static analysis correctly flags that this f-string has no placeholders.

🐛 Fix

-    params = f"?limit=100"
+    params = "?limit=100"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mcp-bridge/spacebot_mcp_server.py` at line 275, the assignment to `params` uses an unnecessary f-string with no placeholders; change it from `f"?limit=100"` to the plain string `"?limit=100"` to remove the redundant prefix and satisfy static analysis.

src/registry/pr_conflicts.rs-128-132 (1)
128-132: ⚠️ Potential issue | 🟡 Minor

HTTP response is not checked for success status. The `send().await` succeeds if the request was sent, but doesn't verify the response status code. A 4xx/5xx response would silently pass.

🐛 Proposed fix to check response status

     client
         .post(webhook_url)
         .json(&payload)
         .timeout(std::time::Duration::from_secs(10))
         .send()
         .await
-        .map_err(|e| format!("failed to forward conflict message: {e}"))?;
+        .map_err(|e| format!("failed to forward conflict message: {e}"))?
+        .error_for_status()
+        .map_err(|e| format!("webhook returned error: {e}"))?;

     Ok(())

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/registry/pr_conflicts.rs` around lines 128-132, the code only awaits `.send().await` (which ensures the request was sent) but doesn't verify HTTP success. Capture the `Response` returned by `.send().await` (e.g., `let resp = ... .send().await.map_err(...)?`) and then call `resp.error_for_status().map_err(|e| format!("webhook returned error: {e}"))?` to surface 4xx/5xx errors before returning `Ok(())`; update the existing `map_err` and return flow accordingly so non-success statuses are treated as errors.

interface/src/routes/AgentTasks.tsx-277-284 (1)
277-284: ⚠️ Potential issue | 🟡 Minor

Apply the repo filter consistently outside the board. With a `projectFilter` selected, the blocked badge still counts from `tasks` and graph mode still renders `tasks` instead of `filteredTasks`, so the selected repo and the visible totals/nodes disagree.

Also applies to: 391-397

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@interface/src/routes/AgentTasks.tsx` around lines 277-284, the blocked count and graph rendering use the full `tasks` array instead of the repo-filtered list, causing a mismatch with the selected `projectFilter`. Compute a single source variable (e.g., `const visibleTasks = projectFilter ? filteredTasks : tasks`) and replace usages of `tasks` with `visibleTasks` for the blocked badge (where `isBlocked` is called) and for the graph-mode rendering, so both the badge and the graph operate on `filteredTasks` when a `projectFilter` is active.
🧹 Nitpick comments (10)
src/agent/status.rs (1)
137-150: Keep queued-worker output bounded. These queued tasks are injected into the channel status block on every turn, but unlike `completed_items` they are neither capped nor truncated. Once the worker queue backs up, a few long tasks can dominate prompt space and crowd out actual conversation history. Consider limiting the rendered entries and summarizing each task before writing it into the status block.

Also applies to: 409-421
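For illustration, a minimal sketch of the cap-and-truncate shape being suggested; `render_queued`, `MAX_ENTRIES`, `MAX_TASK_CHARS`, and the plain-string input are hypothetical stand-ins for the crate's actual `QueuedWorkerInfo` rendering:

```rust
// Bound the queued-worker section of the status block: cap the entry
// count, truncate long task text, and note how many entries were dropped.
const MAX_ENTRIES: usize = 5;
const MAX_TASK_CHARS: usize = 80;

fn render_queued(tasks: &[String]) -> Vec<String> {
    let mut lines: Vec<String> = tasks
        .iter()
        .take(MAX_ENTRIES)
        .map(|t| {
            if t.chars().count() > MAX_TASK_CHARS {
                // Truncate on char boundaries, not bytes, to stay UTF-8 safe.
                let truncated: String = t.chars().take(MAX_TASK_CHARS).collect();
                format!("{truncated}…")
            } else {
                t.clone()
            }
        })
        .collect();
    if tasks.len() > MAX_ENTRIES {
        // Tell the model entries were elided instead of silently dropping them.
        lines.push(format!(
            "{} more queued workers omitted",
            tasks.len() - MAX_ENTRIES
        ));
    }
    lines
}

fn main() {
    let tasks: Vec<String> = (0..7).map(|i| format!("task {i}")).collect();
    let rendered = render_queued(&tasks);
    assert_eq!(rendered.len(), MAX_ENTRIES + 1);
    assert_eq!(rendered.last().unwrap().as_str(), "2 more queued workers omitted");
    println!("ok");
}
```

The same helper can serve both affected blocks (137-150 and 409-421), keeping the bounding logic in one place.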
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/agent/status.rs` around lines 137-150, the `queued_workers` list (`Vec<QueuedWorkerInfo>`) is unbounded when serialized into the status block and can dominate prompts. Update the code that renders `queued_workers` (following the pattern used for `completed_items`) to (1) cap the number of entries included (e.g., keep the most recent N), (2) truncate or summarize each `QueuedWorkerInfo.task` to a safe maximum length, and (3) add a short summary line like "X more queued workers omitted" when entries are dropped, so callers know items were elided. Implement this in the status-rendering helper used for `queued_workers` and reuse the same bounding/truncation logic for the other affected block (lines ~409-421).

mcp-bridge/spacebot_mcp_server.py (2)
349-367: Add bounds check on Content-Length to prevent memory exhaustion. The `read_message` function reads `content_length` bytes without validating the size. A malformed or malicious message with an extremely large Content-Length could cause memory exhaustion.

🛡️ Proposed bounds check

+MAX_MESSAGE_SIZE = 10 * 1024 * 1024  # 10 MB

 def read_message() -> dict | None:
     """Read a JSON-RPC message from stdin."""
     # Read headers
     content_length = 0
     while True:
         line = sys.stdin.readline()
         if not line:
             return None
         line = line.strip()
         if not line:
             break
         if line.startswith("Content-Length:"):
             content_length = int(line.split(":")[1].strip())
     if content_length == 0:
         return None
+    if content_length > MAX_MESSAGE_SIZE:
+        return None  # Or raise an error
     content = sys.stdin.read(content_length)
     return json.loads(content)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mcp-bridge/spacebot_mcp_server.py` around lines 349-367, `read_message` currently trusts the Content-Length header and may allocate huge buffers. Add a bounds check on the parsed `content_length`: validate that it is a positive integer within a safe maximum (e.g., a `MAX_MESSAGE_SIZE` constant defined near `read_message`), handle non-integer or negative values gracefully, and return `None` or log an error when the value is out of bounds before calling `sys.stdin.read(content_length)`.
43-51: Consider validating URL scheme to prevent file:// access. The `SPACEBOT_URL` environment variable is used directly in `urlopen()` without scheme validation. While environment variables are generally trusted, a misconfiguration could expose local file access via `file://` URLs.

🛡️ Proposed validation

 def api_request(method: str, path: str, body: dict | None = None) -> dict:
     """Make an HTTP request to the Spacebot API."""
     url = f"{SPACEBOT_URL}/api{path}"
+    if not url.startswith(("http://", "https://")):
+        return {"error": "Invalid URL scheme. SPACEBOT_URL must use http:// or https://"}
     data = json.dumps(body).encode() if body else None

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mcp-bridge/spacebot_mcp_server.py` around lines 43-51, validate the URL scheme before building/using `urllib.request.Request`: parse the URL with `urllib.parse.urlparse` and ensure the scheme is "http" or "https" (reject "file", empty, or other schemes); if invalid, return a clear error (same structure as the existing error returns) instead of calling `urllib.request.urlopen`. Add the validation early in the path around `SPACEBOT_URL`, the local `url` variable, and the `urllib.request.Request` / `urlopen` calls.

src/config/types.rs (1)
1349-1349: Registry config is not overridable per-agent. Unlike other config sections (e.g., `projects`, `browser`), the registry config is always cloned from defaults with no per-agent override. If this is intentional (registry is instance-level), this is fine. If per-agent registry settings might be needed later, consider adding an `Option<RegistryConfig>` to `AgentConfig`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/config/types.rs` at line 1349, the registry field is always cloned from defaults (`registry: defaults.registry.clone()`), so agents cannot override it. Update `AgentConfig` to carry a per-agent field (e.g., add `registry: Option<RegistryConfig>`) and change the assignment to `agent.registry.clone().unwrap_or_else(|| defaults.registry.clone())` (or equivalent), so a per-agent `RegistryConfig` is respected when present but falls back to `defaults.registry` when `None`.

interface/src/components/TaskDependencyGraph.tsx (2)
94-128: Force layout may be slow for large graphs. The inline force-directed layout runs 100 iterations with O(n²) repulsion calculations synchronously during render. For task boards with many items (50+), this could cause noticeable UI jank.
Consider:
- Reducing iterations for larger graphs
- Running layout in a Web Worker
- Using graphology-layout's built-in algorithms which are optimized
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@interface/src/components/TaskDependencyGraph.tsx` around lines 94-128, the synchronous O(n²) force loop in the component (the nested repulsion loops over `nodes` and `positions[]`, plus the `graph.forEachEdge` attraction loop) should be made non-blocking for large graphs: detect large sizes (e.g., `nodes.length > 50`) and reduce the iteration count from 100 to a smaller dynamic value, move the layout computation into a Web Worker that performs the repulsion/attraction loops and returns final positions, or replace the inline loops with an optimized layout routine (e.g., graphology-layout / ForceAtlas2) and apply its result asynchronously so rendering stays off the main thread.
40-48: O(n²) lookup for blocked status calculation.

The nested `.find()` inside the loop creates O(n²) complexity when determining blocked status. For large task boards, consider pre-building a Map for O(1) lookups:

♻️ Proposed optimization
```diff
 const buildGraph = useCallback(() => {
   const graph = new Graph({ type: "directed" });
+  const taskMap = new Map(tasks.map(t => [t.task_number, t]));

   // Add nodes
   for (const task of tasks) {
     const key = String(task.task_number);
     const deps = getDeps(task);
     const isBlocked =
       deps.length > 0 &&
       deps.some((d) => {
-        const dt = tasks.find((t) => t.task_number === d);
+        const dt = taskMap.get(d);
         return dt && dt.status !== "done";
       });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@interface/src/components/TaskDependencyGraph.tsx` around lines 40 - 48, the blocked-status calculation in the loop over tasks (the block using getDeps, isBlocked and task.task_number) is O(n²) due to the inner tasks.find; fix it by pre-building a Map from task_number to task (e.g., create taskMap from tasks before the loop) and replace the tasks.find lookup with taskMap.get(d) to achieve O(1) lookups; update any references in the isBlocked predicate to use taskMap.get and keep the rest of the logic unchanged.

migrations/20260312000001_registry.sql (1)
26-27: No trigger for `updated_at` auto-update.

The `updated_at` column defaults to `CURRENT_TIMESTAMP` on insert but won't auto-update on modifications. SQLite doesn't have built-in `ON UPDATE` for timestamps. If auto-updating is needed, add a trigger:

♻️ Proposed trigger addition
```sql
CREATE TRIGGER IF NOT EXISTS registry_repos_updated_at
AFTER UPDATE ON registry_repos
FOR EACH ROW
BEGIN
  UPDATE registry_repos SET updated_at = CURRENT_TIMESTAMP WHERE id = NEW.id;
END;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@migrations/20260312000001_registry.sql` around lines 26 - 27, the updated_at column in registry_repos is only set to CURRENT_TIMESTAMP on insert and won't auto-update on modifications in SQLite; add a trigger (e.g., registry_repos_updated_at) on the registry_repos table that runs AFTER UPDATE FOR EACH ROW and sets updated_at = CURRENT_TIMESTAMP for the updated row (match by id using NEW.id) so updated_at is refreshed automatically on every update.

src/registry/pr_conflicts.rs (1)
123-133: Reuse `reqwest::Client` instead of creating per-request.

Creating a new `reqwest::Client` for each webhook call is inefficient. The client should be created once and reused to benefit from connection pooling.

♻️ Proposed fix
```diff
 pub async fn pr_conflict_check_loop(
     store: RegistryStore,
     agent_id: String,
     runtime_config: Arc<crate::config::RuntimeConfig>,
     webhook_port: u16,
 ) {
     // Initial delay to let the agent fully start up and registry sync to run first.
     tokio::time::sleep(Duration::from_secs(90)).await;

     let webhook_url = format!("http://127.0.0.1:{webhook_port}/send");
+    let client = reqwest::Client::new();

     // Track which PRs we've already sent a conflict message for...
     let mut notified: std::collections::HashSet<(String, u64)> =
         std::collections::HashSet::new();
```

Then pass `&client` to `forward_to_webhook`:

```diff
-async fn forward_to_webhook(webhook_url: &str, repo: &str, message: &str) -> Result<(), String> {
+async fn forward_to_webhook(client: &reqwest::Client, webhook_url: &str, repo: &str, message: &str) -> Result<(), String> {
     let conversation_id = format!("github:{repo}");
     let payload = serde_json::json!({
         "conversation_id": conversation_id,
         "sender_id": "pr-conflict-checker",
         "content": message,
     });
-    let client = reqwest::Client::new();
     client
         .post(webhook_url)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/registry/pr_conflicts.rs` around lines 123 - 133, the function forwarding webhook requests (forward_to_webhook) currently creates a new reqwest::Client per call; change it to accept and use a reused reqwest::Client instance instead (e.g., pass &reqwest::Client into forward_to_webhook or store one on the surrounding struct) so connection pooling is preserved; update call sites to construct a single Client once (e.g., at registry/handler initialization) and pass that client into forward_to_webhook and any other places that post to webhook_url.

interface/src/hooks/useSpacebotConfig.ts (1)
48-55: Consider deduplicating enabled platforms.

If multiple messaging instances share the same platform (e.g., two Discord bots), `enabledPlatforms` will contain duplicates. This may cause UI issues if the array is used for rendering unique badges.

♻️ Proposed fix using Set for deduplication
```diff
 const enabledPlatforms: string[] = [];
 if (messagingData) {
+  const seen = new Set<string>();
   for (const inst of messagingData.instances ?? []) {
-    if (inst.enabled && inst.platform) {
+    if (inst.enabled && inst.platform && !seen.has(inst.platform)) {
+      seen.add(inst.platform);
       enabledPlatforms.push(inst.platform);
     }
   }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@interface/src/hooks/useSpacebotConfig.ts` around lines 48 - 55, the enabledPlatforms array in useSpacebotConfig may contain duplicates because you push inst.platform for each enabled instance; update the logic to deduplicate platforms (e.g., use a Set or filter/Array.from) so each platform appears once. Locate the block referencing messagingData.instances and enabledPlatforms in useSpacebotConfig, collect only inst.platform values for enabled instances, then convert to a deduplicated array before returning/using it to ensure unique badges in the UI. Ensure null/undefined platform values are skipped and the resulting type remains string[].

src/registry/store.rs (1)
258-260: Rename `q` to a full word.

This query builder is already dynamic, so the abbreviated local makes the binding flow harder to follow than it needs to be.

As per coding guidelines: "Don't abbreviate variable names. Use `queue` not `q`, `message` not `msg`, `channel` not `ch`. Common abbreviations like `config` are fine."
Verify each finding against the current code and only fix it if needed. In `@src/registry/store.rs` around lines 258 - 260, Rename the abbreviated local variable q to a full, descriptive name (e.g., query_builder) in the dynamic SQL builder flow: replace let mut q = sqlx::query(&query).bind(agent_id); and all subsequent q = q.bind(name); references with let mut query_builder = sqlx::query(&query).bind(agent_id); and query_builder = query_builder.bind(name); (ensure you update any further uses of q in this scope, including eventual .fetch_/execute calls, to the new name) so the binding flow using seen_full_names remains clear.
```rust
// For interactive workers, enter a follow-up loop -- but only if the
// worker has not already signaled a terminal outcome. Workers that called
// set_status(kind="outcome") are done; entering the idle loop would block
// forever and prevent WorkerComplete from firing (circular dependency).
if !resuming && self.hook.outcome_signaled() {
    tracing::info!(
        worker_id = %self.id,
        "worker signaled outcome, skipping idle follow-up loop"
    );
    // Drop the input receiver so the channel cleans up and
    // WorkerComplete fires normally.
    self.input_rx.take();
}
```
Don't re-enter idle after a follow-up already signaled outcome.
This guard only covers the first transition into the follow-up loop. A resumed worker, or any worker that emits set_status(kind="outcome") during a later follow-up, still falls through to Lines 675-678 and goes back to waiting for input, recreating the same circular wait. Please short-circuit the post-follow-up path on outcome_signaled() as well.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/agent/worker.rs` around lines 525 - 537, The post-follow-up path must
short-circuit if an outcome was signaled: after the follow-up handling (the
branch that may recreate the idle/wait-for-input state), check
self.hook.outcome_signaled() again and, if true, log and drop the input receiver
(self.input_rx.take()) and skip re-entering the idle/wait loop (return/continue
so WorkerComplete can fire). Update the code paths that run when resuming or
after a later follow-up to perform this guard before recreating the
waiting-for-input state to avoid the circular wait.
Problem

Interactive workers that call `set_status(kind="outcome")` get permanently stuck in `idle` status, causing a crash-loop:

1. The worker calls `set_status(kind="outcome")`, then enters the `idle` follow-up loop.
2. `input_rx.recv().await` blocks forever — circular dependency:
   - `WorkerComplete` only fires after `worker.run()` returns
   - `worker.run()` only returns when the follow-up loop exits
   - the follow-up loop never exits without `WorkerComplete`

This results in a permanent crash-loop with restarts every ~10 minutes. Restart counter reached 1194+ on our instance.
Fix

- `src/agent/worker.rs`: Before entering the follow-up loop, check `outcome_signaled()`. If the worker already signaled a terminal outcome, drop `input_rx` immediately so `WorkerComplete` can fire normally.
- `src/conversation/history.rs`: Harden `log_worker_idle()` with a conditional UPDATE that skips terminal states (done/failed/completed/cancelled). This prevents a race condition where two fire-and-forget `tokio::spawn` DB updates execute out of order.

Test plan

- `cargo check` passes

🤖 Generated with Claude Code
via Happy