
feat(local-llm): prompt queue, shell safety, and main-thread performance#2156

Open
FrostyPhoenix2 wants to merge 1 commit into stackblitz-labs:main from FrostyPhoenix2:feat/local-llm-queue-optimizations

Conversation

@FrostyPhoenix2

Adds a complete workflow for running local small models (Qwen, Mistral, etc.) through multi-step project builds without locking up the browser or overwhelming WebContainer. Validated end-to-end with Qwen 2.5-coder:7b via Ollama against a 21-prompt React Native / Expo build queue.

Local LLM Settings Panel (new: LocalLLMPanel)

  • Per-session toggles: extended stream timeout (3 min vs 45 s), shell command blocking, slim system prompt, dedup file writes, strip old prose, token budget
  • Persisted to localStorage via nanostores map (localLLMSettingsStore); see the sketch below
  • Flyout panel anchored above the chat input bar with dynamic max-height
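
A minimal sketch of what that persistence can look like (field names, defaults, and the storage key below are illustrative, not the exact code in this diff):

```ts
import { map } from 'nanostores';

export interface LocalLLMSettings {
  extendedStreamTimeout: boolean; // 3 min instead of 45 s
  blockHangingCommands: boolean;
  slimSystemPrompt: boolean;
  dedupFileWrites: boolean;
  stripOldProse: boolean;
  tokenBudget: number;
}

const STORAGE_KEY = 'bolt_local_llm_settings'; // hypothetical key

const defaults: LocalLLMSettings = {
  extendedStreamTimeout: true,
  blockHangingCommands: true,
  slimSystemPrompt: true,
  dedupFileWrites: true,
  stripOldProse: false,
  tokenBudget: 4096,
};

function load(): LocalLLMSettings {
  try {
    const raw = localStorage.getItem(STORAGE_KEY);
    return raw ? { ...defaults, ...JSON.parse(raw) } : defaults;
  } catch {
    return defaults; // SSR or corrupted JSON: fall back to defaults
  }
}

export const localLLMSettingsStore = map<LocalLLMSettings>(load());

// Write-through: mirror every change back to localStorage.
localLLMSettingsStore.subscribe((value) => {
  try {
    localStorage.setItem(STORAGE_KEY, JSON.stringify(value));
  } catch {
    // storage unavailable (private mode / SSR); settings stay session-only
  }
});
```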

Prompt Queue (PromptQueuePanel)

  • Run N queued prompts end-to-end unattended; live progress bar
  • Adaptive delay between prompts: polls action-runner until all file writes / shell commands reach a terminal state before firing the next prompt, preventing the WebContainer from being overwhelmed by rapid-fire writes (see the polling sketch below)
  • Paste or load prompts from a newline-delimited text file
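
The adaptive delay reduces to a quiescence poll. This sketch assumes an accessor over the action-runner's statuses; the status names and timeouts are illustrative:

```ts
// Illustrative quiescence poll; status names and the accessor are assumptions.
type ActionStatus = 'pending' | 'running' | 'complete' | 'failed' | 'aborted';

const TERMINAL = new Set<ActionStatus>(['complete', 'failed', 'aborted']);

async function waitForQuiescence(
  getActionStatuses: () => ActionStatus[], // e.g. read from the action-runner's store
  pollMs = 500,
  maxWaitMs = 120_000,
): Promise<void> {
  const start = Date.now();
  while (Date.now() - start < maxWaitMs) {
    if (getActionStatuses().every((status) => TERMINAL.has(status))) {
      return; // all file writes / shell commands settled; safe to fire the next prompt
    }
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
  // Timed out: let the queue proceed rather than stall forever.
}
```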

Shell Safety (action-runner)

  • Blocklist for hanging commands: expo start, npm install, yarn, pnpm, node, react-native run — throws before execution with a clear message
  • 90-second hard timeout on any shell action via Promise.race() (both guards are sketched below)
  • Toggle via blockHangingCommands setting
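
Both guards are small. A hedged sketch of their shape, with patterns, helper names, and error text as assumptions:

```ts
// Illustrative guards; patterns, names, and the exact error text are assumptions.
const BLOCKED_PATTERNS = [
  /\bexpo start\b/,
  /\bnpm install\b/,
  /\byarn\b/,
  /\bpnpm\b/,
  /\bnode\b/,
  /\breact-native run\b/,
];

function assertCommandAllowed(command: string, blockHangingCommands: boolean): void {
  if (!blockHangingCommands) {
    return;
  }
  if (BLOCKED_PATTERNS.some((pattern) => pattern.test(command))) {
    throw new Error(
      `Command blocked for local-model runs: "${command}". Install/server commands ` +
        'hang WebContainer; disable "Block hanging commands" to override.',
    );
  }
}

const SHELL_TIMEOUT_MS = 90_000;

// Hard cap on any shell action via Promise.race().
async function runShellWithTimeout<T>(action: Promise<T>): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error('Shell action timed out after 90 s')),
      SHELL_TIMEOUT_MS,
    );
  });
  try {
    return await Promise.race([action, timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```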

Stream + Context Optimisations (api.chat)

  • Extended stream timeout: 180 000 ms for local models (isLocalModel flag)
  • Context optimisation re-enabled for local models with graceful fallback: LLM pre-pass wrapped in withTimeout(); keyword-matching fallback on timeout (sketched below)
  • Token limit clamped to model's reported max (fixes Qwen 8 000 token warning)
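
The timeout-plus-fallback pattern, roughly (the pre-pass and fallback signatures are assumptions, not the actual diff):

```ts
// Timeout wrapper plus graceful fallback; signatures here are assumptions.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    promise,
    new Promise<never>((_, reject) =>
      setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms),
    ),
  ]);
}

async function selectContextFiles(
  llmPrePass: () => Promise<string[]>, // asks the model which files matter
  keywordFallback: () => string[], // cheap lexical match over paths/contents
): Promise<string[]> {
  try {
    // Small local models can stall on the pre-pass, so cap it hard.
    return await withTimeout(llmPrePass(), 30_000);
  } catch {
    // Degrade gracefully instead of failing the whole request.
    return keywordFallback();
  }
}
```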

Main-Thread Performance (Chat.client)

  • 4 sequential setMessages() calls in onFinish collapsed into one, wrapped in startTransition() — eliminates "Page Unresponsive" dialogs during queues
  • storeMessageHistory deferred to requestIdleCallback (setTimeout fallback) so IndexedDB writes never block the render thread; both changes are sketched below
  • Resource-exhaustion errors (ERR_INSUFFICIENT_RESOURCES, Failed to fetch) demoted from toast flood to console.warn
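
A sketch of the reshaped onFinish; the Message type, setMessages, and storeMessageHistory below stand in for the real hooks:

```ts
import { startTransition } from 'react';

type Message = { id: string; role: 'user' | 'assistant'; content: string };

// Assumed to be provided by the surrounding component/hooks:
declare function setMessages(messages: Message[]): void;
declare function storeMessageHistory(messages: Message[]): Promise<void>;

function onFinish(finalMessages: Message[]) {
  // One batched, low-priority update instead of four sequential setMessages() calls.
  startTransition(() => {
    setMessages(finalMessages);
  });

  // Defer the IndexedDB write off the render path; fall back to setTimeout
  // where requestIdleCallback is unavailable.
  const defer = (cb: () => void) =>
    typeof requestIdleCallback === 'function' ? requestIdleCallback(cb) : setTimeout(cb, 0);

  defer(() => {
    storeMessageHistory(finalMessages).catch((error) => {
      // Resource-exhaustion errors are expected under heavy queues; warn, don't toast.
      console.warn('Failed to persist message history', error);
    });
  });
}
```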

Slim Prompt (prompts/slim)

  • Minimal system prompt for 7B–13B models; hard_rules block first with DO NOT run install/server commands, DO NOT truncate files, DO NOT emit code outside boltArtifact — then concise format and behaviour rules (skeleton below)
  • Registered in PromptLibrary as "Slim Prompt (local models)"
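
Structurally, the slim prompt is a short template with the hard rules up front. An illustrative skeleton only; the real wording ships in prompts/slim:

```ts
// Illustrative skeleton; the actual rule text lives in prompts/slim.
export function getSlimPrompt(cwd: string): string {
  return `
<hard_rules>
- DO NOT run install or server commands (npm install, expo start, ...).
- DO NOT truncate files: always write complete file contents.
- DO NOT emit code outside a boltArtifact block.
</hard_rules>

<format_rules>
- Working directory: ${cwd}
- Wrap every change in a single boltArtifact with file actions.
- Keep prose brief; prefer code over explanation.
</format_rules>`.trim();
}
```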

Artifact Render-Loop Fix

  • computed() store moved from inline JSX into useRef — fixes "Maximum update depth exceeded" on every workbenchStore update
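
For reference, the anti-pattern and the fix look roughly like this (component, store, and selector names are stand-ins):

```tsx
import { useRef } from 'react';
import { computed, map } from 'nanostores';
import { useStore } from '@nanostores/react';

// Stand-ins for the real workbench store and selector:
const workbenchStore = { artifacts: map<Record<string, { title: string }>>({}) };
const selectTitles = (artifacts: Record<string, { title: string }>) =>
  Object.values(artifacts).map((a) => a.title);

function ArtifactTitles() {
  // Broken: calling useStore(computed(...)) inline in JSX creates a fresh
  // derived store on every render; each workbenchStore update re-renders,
  // resubscribes to a brand-new store, and loops until React throws
  // "Maximum update depth exceeded".

  // Fixed: useRef keeps the first computed() instance stable across renders,
  // so the subscription never churns.
  const titlesStore = useRef(computed(workbenchStore.artifacts, selectTitles)).current;
  const titles = useStore(titlesStore);

  return (
    <ul>
      {titles.map((title) => (
        <li key={title}>{title}</li>
      ))}
    </ul>
  );
}
```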

ZIP Import (.bolt/prompt)

  • Detects .bolt/prompt inside imported ZIPs; surfaces content as the auto-fill textarea value so projects can ship a first-message template
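
Detection can be as simple as scanning entry names. This sketch assumes jszip and may not match the project's actual ZIP handling:

```ts
// Hedged sketch using jszip; the project's real ZIP import path may differ.
import JSZip from 'jszip';

export async function extractBoltPrompt(zipData: ArrayBuffer): Promise<string | null> {
  const zip = await JSZip.loadAsync(zipData);
  // Imported ZIPs often nest everything under a top-level folder, so match
  // any entry whose path ends with .bolt/prompt rather than an exact name.
  const entry = Object.values(zip.files).find(
    (file) => !file.dir && file.name.endsWith('.bolt/prompt'),
  );
  return entry ? (await entry.async('string')).trim() : null;
}
```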

Misc

  • useChatHistory snapshot save failure: toast → console.warn
  • Token limit clamped in stream-text.ts (model.maxTokenAllowed guard)
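
The clamp is a one-line guard; maxTokenAllowed comes from the bullet above, while the 8 000 default is an assumption:

```ts
// Clamp the requested completion budget to what the model reports it can handle.
function clampMaxTokens(requested: number, model: { maxTokenAllowed?: number }): number {
  return Math.min(requested, model.maxTokenAllowed ?? 8000); // e.g. Qwen reports 8 000
}
```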

Summary

Adds a complete, validated workflow for running local small models (Qwen, Mistral, etc.) through multi-step project builds without locking up the browser or overwhelming WebContainer. Tested end-to-end with Qwen 2.5-coder:7b via Ollama against a 21-prompt React Native / Expo build queue — clean completion, responsive UI throughout.

What's included

Local LLM Settings Panel (LocalLLMPanel) — new flyout above the chat input with per-session toggles: extended stream timeout, shell command blocking, slim system prompt, dedup file writes, strip old prose, token budget. All persisted to localStorage.

Prompt Queue (PromptQueuePanel) — paste or load N prompts, run them unattended with a live progress bar. Uses adaptive delay between prompts: polls the action-runner until all file writes/shell commands reach a terminal state before firing the next one, preventing the WebContainer from being overwhelmed.

Shell safety — blocklist in action-runner.ts for commands that hang WebContainer (expo start, npm install, yarn, pnpm, node, react-native run). 90-second hard timeout on all shell actions. Toggle in the settings panel.

Stream + context optimisations — 180 s stream timeout for local models; context optimisation re-enabled for local models with timeout wrapper + keyword-matching fallback; token limit clamped to model's reported max (fixes Qwen 8 000 token warning).

Main-thread performance — 4 sequential setMessages() calls in onFinish collapsed into one startTransition()-wrapped call, eliminating "Page Unresponsive" dialogs during long queues. storeMessageHistory deferred to requestIdleCallback so IndexedDB writes never block rendering.

Slim prompt — minimal system prompt for 7B–13B models with hard_rules block first. Registered in PromptLibrary as "Slim Prompt (local models)".

Artifact render-loop fix — computed() store moved from inline JSX to useRef, fixing "Maximum update depth exceeded" on every workbenchStore update.

ZIP import .bolt/prompt — detects a .bolt/prompt file inside imported ZIPs and surfaces its content as the auto-fill textarea value, letting projects ship their own first-message template.

Test plan

  • Enable Ollama in provider settings, load a local model
  • Open Local LLM panel — verify all toggles persist across page reload
  • Import a ZIP with a .bolt/prompt file — textarea should pre-fill
  • Queue 5+ prompts via PromptQueuePanel, run to completion — UI stays responsive
  • Verify shell commands like npm install are blocked with a clear error message
  • Send a chat with a non-local provider — verify nothing regressed
