feat(local-llm): prompt queue, shell safety, and main-thread performance #2156
Open
FrostyPhoenix2 wants to merge 1 commit into stackblitz-labs:main from
Conversation
Adds a complete workflow for running local small models (Qwen, Mistral, etc.) through multi-step project builds without browser lock-up or WebContainer overwhelm. Validated end-to-end with Qwen 2.5-coder:7b via Ollama against a 21-prompt React Native / Expo build queue.

## Local LLM Settings Panel (new: `LocalLLMPanel`)

- Per-session toggles: extended stream timeout (3 min vs 45 s), shell command blocking, slim system prompt, dedup file writes, strip old prose, token budget
- Persisted to localStorage via a nanostores map (`localLLMSettingsStore`)
- Flyout panel anchored above the chat input bar with dynamic max-height

## Prompt Queue (`PromptQueuePanel`)

- Run N queued prompts end-to-end unattended, with a live progress bar
- Adaptive delay between prompts: polls the action-runner until all file writes / shell commands reach a terminal state before firing the next prompt — prevents WebContainer from being overwhelmed by rapid-fire writes
- Paste prompts or load them from a newline-delimited text file

## Shell Safety (action-runner)

- Blocklist for hanging commands (`expo start`, `npm install`, `yarn`, `pnpm`, `node`, `react-native run`) — throws before execution with a clear message
- 90-second hard timeout on any shell action via `Promise.race()`
- Toggled via the `blockHangingCommands` setting

## Stream + Context Optimisations (api.chat)

- Extended stream timeout: 180 000 ms for local models (`isLocalModel` flag)
- Context optimisation re-enabled for local models with graceful fallback: the LLM pre-pass is wrapped in `withTimeout()`, with keyword matching as the fallback on timeout
- Token limit clamped to the model's reported max (fixes the Qwen 8 000-token warning)

## Main-Thread Performance (Chat.client)

- 4 sequential `setMessages()` calls in `onFinish` collapsed into one, wrapped in `startTransition()` — eliminates "Page Unresponsive" dialogs during queues
- `storeMessageHistory` deferred to `requestIdleCallback` (with a `setTimeout` fallback) so IndexedDB writes never block the render thread
- Resource-exhaustion errors (`ERR_INSUFFICIENT_RESOURCES`, `Failed to fetch`) demoted from a toast flood to `console.warn`

## Slim Prompt (prompts/slim)

- Minimal system prompt for 7B–13B models; a `hard_rules` block comes first (DO NOT run install/server commands, DO NOT truncate files, DO NOT emit code outside `boltArtifact`), followed by concise format and behaviour rules
- Registered in the `PromptLibrary` as "Slim Prompt (local models)"

## Artifact Render-Loop Fix

- `computed()` store moved from inline JSX into a `useRef` — fixes "Maximum update depth exceeded" on every `workbenchStore` update

## ZIP Import (.bolt/prompt)

- Detects `.bolt/prompt` inside imported ZIPs and surfaces its content as the auto-fill textarea value so projects can ship a first-message template

## Misc

- `useChatHistory` snapshot save failure: toast → `console.warn`
- Token limit clamped in `stream-text.ts` (`model.maxTokenAllowed` guard)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary

Adds a complete, validated workflow for running local small models (Qwen, Mistral, etc.) through multi-step project builds without browser lock-up or WebContainer overwhelm. Tested end-to-end with Qwen 2.5-coder:7b via Ollama against a 21-prompt React Native / Expo build queue — clean completion, responsive UI throughout.

## What's included
- **Local LLM Settings Panel** (`LocalLLMPanel`) — new flyout above the chat input with per-session toggles: extended stream timeout, shell command blocking, slim system prompt, dedup file writes, strip old prose, token budget. All persisted to localStorage (sketch below).
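
  A minimal sketch of the persisted store, assuming plain `nanostores` with a manual `localStorage` round-trip; the storage key and default values here are illustrative, not the exact shipped ones:

  ```ts
  import { map } from 'nanostores';

  export interface LocalLLMSettings {
    extendedStreamTimeout: boolean;
    blockHangingCommands: boolean;
    slimSystemPrompt: boolean;
    dedupFileWrites: boolean;
    stripOldProse: boolean;
    tokenBudget: number;
  }

  const STORAGE_KEY = 'bolt_local_llm_settings'; // illustrative key

  const defaults: LocalLLMSettings = {
    extendedStreamTimeout: true,
    blockHangingCommands: true,
    slimSystemPrompt: true,
    dedupFileWrites: true,
    stripOldProse: false,
    tokenBudget: 4096,
  };

  // Rehydrate once at module load; fall back to defaults on first run or bad JSON.
  function load(): LocalLLMSettings {
    try {
      return { ...defaults, ...JSON.parse(localStorage.getItem(STORAGE_KEY) ?? '{}') };
    } catch {
      return defaults;
    }
  }

  export const localLLMSettingsStore = map<LocalLLMSettings>(load());

  // subscribe() fires once immediately and then on every change; the initial
  // write-back of the just-loaded value is harmless.
  localLLMSettingsStore.subscribe((value) => {
    localStorage.setItem(STORAGE_KEY, JSON.stringify(value));
  });
  ```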
- **Prompt Queue** (`PromptQueuePanel`) — paste or load N prompts, run them unattended with a live progress bar. Uses an adaptive delay between prompts: polls the action-runner until all file writes/shell commands reach a terminal state before firing the next one, preventing WebContainer overwhelm (sketch below).
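
  The adaptive delay, sketched assuming a polling accessor over the runner's action statuses (the accessor and status names are illustrative):

  ```ts
  type ActionStatus = 'pending' | 'running' | 'complete' | 'failed' | 'aborted';

  const TERMINAL = new Set<ActionStatus>(['complete', 'failed', 'aborted']);

  // Resolve only once every file write / shell command has settled, with a
  // cap so one stuck action cannot stall the whole queue.
  async function waitForActionsToSettle(
    getStatuses: () => ActionStatus[],
    pollMs = 500,
    maxWaitMs = 120_000,
  ): Promise<void> {
    const started = Date.now();
    while (Date.now() - started < maxWaitMs) {
      if (getStatuses().every((s) => TERMINAL.has(s))) {
        return;
      }
      await new Promise((resolve) => setTimeout(resolve, pollMs));
    }
  }

  // Queue loop: fire a prompt, wait for the workspace to settle, repeat.
  async function runQueue(
    prompts: string[],
    send: (prompt: string) => Promise<void>,
    getStatuses: () => ActionStatus[],
  ): Promise<void> {
    for (const prompt of prompts) {
      await send(prompt);
      await waitForActionsToSettle(getStatuses);
    }
  }
  ```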
- **Shell safety** — blocklist in `action-runner.ts` for commands that hang WebContainer (`expo start`, `npm install`, `yarn`, `pnpm`, `node`, `react-native run`). 90-second hard timeout on all shell actions. Toggle in the settings panel.
- **Stream + context optimisations** — 180 s stream timeout for local models; context optimisation re-enabled for local models with a timeout wrapper plus keyword-matching fallback; token limit clamped to the model's reported max (fixes the Qwen 8 000-token warning). A combined sketch of the guard, timeout, and clamp follows below.
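
  A combined sketch of the shell guard, the `Promise.race()` timeout (reused with a longer budget for the context-optimisation pre-pass), and the token clamp; helper names and wiring are illustrative:

  ```ts
  const BLOCKED_PREFIXES = ['expo start', 'npm install', 'yarn', 'pnpm', 'node', 'react-native run'];

  function assertCommandAllowed(command: string): void {
    const hit = BLOCKED_PREFIXES.find((prefix) => command.trim().startsWith(prefix));
    if (hit) {
      // Throw *before* execution so WebContainer never spawns a hanging process.
      throw new Error(`Blocked "${hit}": install/dev-server commands hang WebContainer.`);
    }
  }

  const SHELL_TIMEOUT_MS = 90_000;

  // Promise.race: whichever settles first wins, the action or the timer.
  function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
    return Promise.race([
      work,
      new Promise<T>((_, reject) =>
        setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms),
      ),
    ]);
  }

  // Clamp the completion budget to whatever the model reports it can handle.
  function clampMaxTokens(requested: number, maxTokenAllowed?: number): number {
    return maxTokenAllowed ? Math.min(requested, maxTokenAllowed) : requested;
  }

  // Usage (illustrative):
  //   assertCommandAllowed(action.command);
  //   await withTimeout(runShellAction(action), SHELL_TIMEOUT_MS, action.command);
  ```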
- **Main-thread performance** — 4 sequential `setMessages()` calls in `onFinish` collapsed into one `startTransition()`-wrapped call, eliminating "Page Unresponsive" dialogs during long queues. `storeMessageHistory` deferred to `requestIdleCallback` so IndexedDB writes never block rendering (sketch below).
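
  A sketch of the batched finish path, with stand-ins for the real chat-client bindings:

  ```ts
  import { startTransition } from 'react';

  // Stand-ins for the real Chat.client bindings.
  declare function setMessages(messages: unknown[]): void;
  declare function storeMessageHistory(messages: unknown[]): Promise<void>;

  // Run work when the main thread is idle; setTimeout fallback for browsers
  // without requestIdleCallback (e.g. Safari).
  function whenIdle(cb: () => void): void {
    if (typeof requestIdleCallback === 'function') {
      requestIdleCallback(cb);
    } else {
      setTimeout(cb, 0);
    }
  }

  function onFinish(finalMessages: unknown[]): void {
    // One low-priority update instead of four sequential ones, so React can
    // keep the UI responsive while the message list re-renders.
    startTransition(() => setMessages(finalMessages));

    // IndexedDB persistence never blocks the render thread, and a failed
    // snapshot save degrades to console.warn instead of a toast.
    whenIdle(() => {
      void storeMessageHistory(finalMessages).catch((err) =>
        console.warn('snapshot save failed', err),
      );
    });
  }
  ```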
- **Slim prompt** — minimal system prompt for 7B–13B models with the `hard_rules` block first. Registered in `PromptLibrary` as "Slim Prompt (local models)" (sketch below).
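
  The shape of the slim prompt, as a sketch (the exact wording in `prompts/slim` may differ):

  ```ts
  // Hard rules come first so small models see them before anything else.
  export const slimPrompt = `
  <hard_rules>
    DO NOT run install or dev-server commands (npm install, expo start, ...).
    DO NOT truncate files; always emit complete file contents.
    DO NOT emit code outside a <boltArtifact> block.
  </hard_rules>

  You build web projects inside WebContainer. Respond with a single
  <boltArtifact> containing complete files, then stop.
  `.trim();
  ```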
- **Artifact render-loop fix** — `computed()` store moved from inline JSX into a `useRef`, fixing "Maximum update depth exceeded" on every `workbenchStore` update (sketch below).
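
  Why the fix works, as a sketch (the store shape is illustrative): an inline `computed()` in JSX creates a brand-new store on every render, so `useStore` re-subscribes and schedules another render; `useRef` creates the derived store once.

  ```tsx
  import { useRef } from 'react';
  import { computed, map } from 'nanostores';
  import { useStore } from '@nanostores/react';

  // Illustrative stand-in for the real workbenchStore.
  const workbenchStore = {
    artifacts: map<Record<string, { id: string; title: string }>>({}),
  };

  export function ArtifactList() {
    // The derived store is created once and reused across renders (the useRef
    // initializer still runs each render, but only the first store is kept).
    const derivedRef = useRef(
      computed(workbenchStore.artifacts, (all) => Object.values(all)),
    );
    const artifacts = useStore(derivedRef.current);

    return (
      <ul>
        {artifacts.map((a) => (
          <li key={a.id}>{a.title}</li>
        ))}
      </ul>
    );
  }
  ```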
- **ZIP import `.bolt/prompt`** — detects a `.bolt/prompt` file inside imported ZIPs and surfaces its content as the auto-fill textarea value, letting projects ship their own first-message template (sketch below).
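
  A sketch of the detection step, assuming `jszip` (bolt's importer may read the archive differently):

  ```ts
  import JSZip from 'jszip';

  // Find a .bolt/prompt entry anywhere in the archive (projects are often
  // nested one folder deep) and return its text for the textarea auto-fill.
  export async function extractPromptTemplate(zipData: ArrayBuffer): Promise<string | null> {
    const zip = await JSZip.loadAsync(zipData);
    const entry = zip.file(/(^|\/)\.bolt\/prompt$/)[0];
    if (!entry) {
      return null;
    }
    return (await entry.async('string')).trim();
  }

  // Usage (illustrative): pre-fill the chat textarea with the template.
  //   const template = await extractPromptTemplate(buffer);
  //   if (template) setInput(template);
  ```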
## Test plan

- Import a ZIP containing a `.bolt/prompt` file — the textarea should pre-fill
- Confirm that blocked commands such as `npm install` fail with a clear error message