
feat(local-llm): prompt queue, shell safety, and main-thread performance#2156

Open
FrostyPhoenix2 wants to merge 1 commit into stackblitz-labs:main from FrostyPhoenix2:feat/local-llm-queue-optimizations

Conversation

@FrostyPhoenix2

Adds a complete workflow for running local small models (Qwen, Mistral, etc.) through multi-step project builds without locking up the browser or overwhelming WebContainer. Validated end-to-end with Qwen 2.5-coder:7b via Ollama against a 21-prompt React Native / Expo build queue.

Local LLM Settings Panel (new: LocalLLMPanel)

  • Per-session toggles: extended stream timeout (3 min vs 45 s), shell command blocking, slim system prompt, dedup file writes, strip old prose, token budget
  • Persisted to localStorage via nanostores map (localLLMSettingsStore); see the sketch below
  • Flyout panel anchored above the chat input bar with dynamic max-height
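
A minimal sketch of what that persistence can look like (field names, defaults, and the storage key below are illustrative, not the exact code in this diff):

```ts
import { map } from 'nanostores';

export interface LocalLLMSettings {
  extendedStreamTimeout: boolean; // 3 min instead of 45 s
  blockHangingCommands: boolean;
  slimSystemPrompt: boolean;
  dedupFileWrites: boolean;
  stripOldProse: boolean;
  tokenBudget: number;
}

const STORAGE_KEY = 'bolt_local_llm_settings'; // hypothetical key

const defaults: LocalLLMSettings = {
  extendedStreamTimeout: true,
  blockHangingCommands: true,
  slimSystemPrompt: true,
  dedupFileWrites: true,
  stripOldProse: false,
  tokenBudget: 4096,
};

function load(): LocalLLMSettings {
  try {
    const raw = localStorage.getItem(STORAGE_KEY);
    return raw ? { ...defaults, ...JSON.parse(raw) } : defaults;
  } catch {
    return defaults; // SSR or corrupted JSON: fall back to defaults
  }
}

export const localLLMSettingsStore = map<LocalLLMSettings>(load());

// Write-through: mirror every change back to localStorage.
localLLMSettingsStore.subscribe((value) => {
  try {
    localStorage.setItem(STORAGE_KEY, JSON.stringify(value));
  } catch {
    // storage unavailable (private mode / SSR); settings stay session-only
  }
});
```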

Prompt Queue (PromptQueuePanel)

  • Run N queued prompts end-to-end unattended; live progress bar
  • Adaptive delay between prompts: polls action-runner until all file writes / shell commands reach a terminal state before firing the next prompt, preventing the WebContainer from being overwhelmed by rapid-fire writes (see the polling sketch below)
  • Paste or load prompts from a newline-delimited text file
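
The adaptive delay reduces to a quiescence poll. This sketch assumes an accessor over the action-runner's statuses; the status names and timeouts are illustrative:

```ts
// Illustrative quiescence poll; status names and the accessor are assumptions.
type ActionStatus = 'pending' | 'running' | 'complete' | 'failed' | 'aborted';

const TERMINAL = new Set<ActionStatus>(['complete', 'failed', 'aborted']);

async function waitForQuiescence(
  getActionStatuses: () => ActionStatus[], // e.g. read from the action-runner's store
  pollMs = 500,
  maxWaitMs = 120_000,
): Promise<void> {
  const start = Date.now();
  while (Date.now() - start < maxWaitMs) {
    if (getActionStatuses().every((status) => TERMINAL.has(status))) {
      return; // all file writes / shell commands settled; safe to fire the next prompt
    }
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
  // Timed out: let the queue proceed rather than stall forever.
}
```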

Shell Safety (action-runner)

  • Blocklist for hanging commands: expo start, npm install, yarn, pnpm, node, react-native run — throws before execution with a clear message
  • 90-second hard timeout on any shell action via Promise.race() (both guards are sketched below)
  • Toggle via blockHangingCommands setting
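
Both guards are small. A hedged sketch of their shape, with patterns, helper names, and error text as assumptions:

```ts
// Illustrative guards; patterns, names, and the exact error text are assumptions.
const BLOCKED_PATTERNS = [
  /\bexpo start\b/,
  /\bnpm install\b/,
  /\byarn\b/,
  /\bpnpm\b/,
  /\bnode\b/,
  /\breact-native run\b/,
];

function assertCommandAllowed(command: string, blockHangingCommands: boolean): void {
  if (!blockHangingCommands) {
    return;
  }
  if (BLOCKED_PATTERNS.some((pattern) => pattern.test(command))) {
    throw new Error(
      `Command blocked for local-model runs: "${command}". Install/server commands ` +
        'hang WebContainer; disable "Block hanging commands" to override.',
    );
  }
}

const SHELL_TIMEOUT_MS = 90_000;

// Hard cap on any shell action via Promise.race().
async function runShellWithTimeout<T>(action: Promise<T>): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error('Shell action timed out after 90 s')),
      SHELL_TIMEOUT_MS,
    );
  });
  try {
    return await Promise.race([action, timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```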

Stream + Context Optimisations (api.chat)

  • Extended stream timeout: 180 000 ms for local models (isLocalModel flag)
  • Context optimisation re-enabled for local models with graceful fallback: LLM pre-pass wrapped in withTimeout(); keyword-matching fallback on timeout (sketched below)
  • Token limit clamped to model's reported max (fixes Qwen 8 000 token warning)
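
The timeout-plus-fallback pattern, roughly (the pre-pass and fallback signatures are assumptions, not the actual diff):

```ts
// Timeout wrapper plus graceful fallback; signatures here are assumptions.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    promise,
    new Promise<never>((_, reject) =>
      setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms),
    ),
  ]);
}

async function selectContextFiles(
  llmPrePass: () => Promise<string[]>, // asks the model which files matter
  keywordFallback: () => string[], // cheap lexical match over paths/contents
): Promise<string[]> {
  try {
    // Small local models can stall on the pre-pass, so cap it hard.
    return await withTimeout(llmPrePass(), 30_000);
  } catch {
    // Degrade gracefully instead of failing the whole request.
    return keywordFallback();
  }
}
```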

Main-Thread Performance (Chat.client)

  • 4 sequential setMessages() calls in onFinish collapsed into one, wrapped in startTransition() — eliminates "Page Unresponsive" dialogs during queues
  • storeMessageHistory deferred to requestIdleCallback (setTimeout fallback) so IndexedDB writes never block the render thread; both changes are sketched below
  • Resource-exhaustion errors (ERR_INSUFFICIENT_RESOURCES, Failed to fetch) demoted from toast flood to console.warn
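
A sketch of the reshaped onFinish; the Message type, setMessages, and storeMessageHistory below stand in for the real hooks:

```ts
import { startTransition } from 'react';

type Message = { id: string; role: 'user' | 'assistant'; content: string };

// Assumed to be provided by the surrounding component/hooks:
declare function setMessages(messages: Message[]): void;
declare function storeMessageHistory(messages: Message[]): Promise<void>;

function onFinish(finalMessages: Message[]) {
  // One batched, low-priority update instead of four sequential setMessages() calls.
  startTransition(() => {
    setMessages(finalMessages);
  });

  // Defer the IndexedDB write off the render path; fall back to setTimeout
  // where requestIdleCallback is unavailable.
  const defer = (cb: () => void) =>
    typeof requestIdleCallback === 'function' ? requestIdleCallback(cb) : setTimeout(cb, 0);

  defer(() => {
    storeMessageHistory(finalMessages).catch((error) => {
      // Resource-exhaustion errors are expected under heavy queues; warn, don't toast.
      console.warn('Failed to persist message history', error);
    });
  });
}
```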

Slim Prompt (prompts/slim)

  • Minimal system prompt for 7B–13B models; hard_rules block first with DO NOT run install/server commands, DO NOT truncate files, DO NOT emit code outside boltArtifact — then concise format and behaviour rules (skeleton below)
  • Registered in PromptLibrary as "Slim Prompt (local models)"
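
Structurally, the slim prompt is a short template with the hard rules up front. An illustrative skeleton only; the real wording ships in prompts/slim:

```ts
// Illustrative skeleton; the actual rule text lives in prompts/slim.
export function getSlimPrompt(cwd: string): string {
  return `
<hard_rules>
- DO NOT run install or server commands (npm install, expo start, ...).
- DO NOT truncate files: always write complete file contents.
- DO NOT emit code outside a boltArtifact block.
</hard_rules>

<format_rules>
- Working directory: ${cwd}
- Wrap every change in a single boltArtifact with file actions.
- Keep prose brief; prefer code over explanation.
</format_rules>`.trim();
}
```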

Artifact Render-Loop Fix

  • computed() store moved from inline JSX into useRef — fixes "Maximum update depth exceeded" on every workbenchStore update
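
For reference, the anti-pattern and the fix look roughly like this (component, store, and selector names are stand-ins):

```tsx
import { useRef } from 'react';
import { computed, map } from 'nanostores';
import { useStore } from '@nanostores/react';

// Stand-ins for the real workbench store and selector:
const workbenchStore = { artifacts: map<Record<string, { title: string }>>({}) };
const selectTitles = (artifacts: Record<string, { title: string }>) =>
  Object.values(artifacts).map((a) => a.title);

function ArtifactTitles() {
  // Broken: calling useStore(computed(...)) inline in JSX creates a fresh
  // derived store on every render; each workbenchStore update re-renders,
  // resubscribes to a brand-new store, and loops until React throws
  // "Maximum update depth exceeded".

  // Fixed: useRef keeps the first computed() instance stable across renders,
  // so the subscription never churns.
  const titlesStore = useRef(computed(workbenchStore.artifacts, selectTitles)).current;
  const titles = useStore(titlesStore);

  return (
    <ul>
      {titles.map((title) => (
        <li key={title}>{title}</li>
      ))}
    </ul>
  );
}
```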

ZIP Import (.bolt/prompt)

  • Detects .bolt/prompt inside imported ZIPs; surfaces content as the auto-fill textarea value so projects can ship a first-message template
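
Detection can be as simple as scanning entry names. This sketch assumes jszip and may not match the project's actual ZIP handling:

```ts
// Hedged sketch using jszip; the project's real ZIP import path may differ.
import JSZip from 'jszip';

export async function extractBoltPrompt(zipData: ArrayBuffer): Promise<string | null> {
  const zip = await JSZip.loadAsync(zipData);
  // Imported ZIPs often nest everything under a top-level folder, so match
  // any entry whose path ends with .bolt/prompt rather than an exact name.
  const entry = Object.values(zip.files).find(
    (file) => !file.dir && file.name.endsWith('.bolt/prompt'),
  );
  return entry ? (await entry.async('string')).trim() : null;
}
```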

Misc

  • useChatHistory snapshot save failure: toast → console.warn
  • Token limit clamped in stream-text.ts (model.maxTokenAllowed guard)
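
The clamp is a one-line guard; maxTokenAllowed comes from the bullet above, while the 8 000 default is an assumption:

```ts
// Clamp the requested completion budget to what the model reports it can handle.
function clampMaxTokens(requested: number, model: { maxTokenAllowed?: number }): number {
  return Math.min(requested, model.maxTokenAllowed ?? 8000); // e.g. Qwen reports 8 000
}
```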

Summary

Adds a complete, validated workflow for running local small models (Qwen, Mistral, etc.) through multi-step project builds without locking up the browser or overwhelming WebContainer. Tested end-to-end with Qwen 2.5-coder:7b via Ollama against a 21-prompt React Native / Expo build queue — clean completion, responsive UI throughout.

What's included

Local LLM Settings Panel (LocalLLMPanel) — new flyout above the chat input with per-session toggles: extended stream timeout, shell command blocking, slim system prompt, dedup file writes, strip old prose, token budget. All persisted to localStorage.

Prompt Queue (PromptQueuePanel) — paste or load N prompts, run them unattended with a live progress bar. Uses adaptive delay between prompts: polls the action-runner until all file writes/shell commands reach a terminal state before firing the next one, preventing the WebContainer from being overwhelmed.

Shell safety — blocklist in action-runner.ts for commands that hang WebContainer (expo start, npm install, yarn, pnpm, node, react-native run). 90-second hard timeout on all shell actions. Toggle in the settings panel.

Stream + context optimisations — 180 s stream timeout for local models; context optimisation re-enabled for local models with timeout wrapper + keyword-matching fallback; token limit clamped to model's reported max (fixes Qwen 8 000 token warning).

Main-thread performance — 4 sequential setMessages() calls in onFinish collapsed into one startTransition()-wrapped call, eliminating "Page Unresponsive" dialogs during long queues. storeMessageHistory deferred to requestIdleCallback so IndexedDB writes never block rendering.

Slim prompt — minimal system prompt for 7B–13B models with hard_rules block first. Registered in PromptLibrary as "Slim Prompt (local models)".

Artifact render-loop fix — computed() store moved from inline JSX to useRef, fixing "Maximum update depth exceeded" on every workbenchStore update.

ZIP import .bolt/prompt — detects a .bolt/prompt file inside imported ZIPs and surfaces its content as the auto-fill textarea value, letting projects ship their own first-message template.

Test plan

  • Enable Ollama in provider settings, load a local model
  • Open Local LLM panel — verify all toggles persist across page reload
  • Import a ZIP with a .bolt/prompt file — textarea should pre-fill
  • Queue 5+ prompts via PromptQueuePanel, run to completion — UI stays responsive
  • Verify shell commands like npm install are blocked with a clear error message
  • Send a chat with a non-local provider — verify nothing regressed
