fix(streaming): defer prompt/blend/timbre conditioning changes to before_tick #237
Draft
gioelecerati wants to merge 1 commit into
Live set_prompt / set_prompt_blend / set_timbre_strength ran the encode +
_refresh_conditioning immediately on the command thread, so a change landing
mid generation step swapped stream.conditioning (and hogged the GPU with an
encode) under the running step. That corrupts the in-flight audio chunk, which
then sticks in the looping latent buffer as a silent section until the client
reconnects and re-encodes a clean buffer. A single tag edit or a fast
blend-slider drag reproduces it (surfaced via the rtmg-vst plugin).
Route these through the same pending-drain mechanism that swaps / lora / depth
already use: the set_* handlers now stage state.pending_{prompt,prompt_blend,
timbre_strength}, and apply_pending() drains them in before_tick so the encode
and conditioning recompose serialize with the generation step. Latest staged
value wins per tick; one _refresh_conditioning covers all three.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
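The ordering problem the commit message describes can be reproduced deterministically with a toy model. This is a sketch under stated assumptions, not the project's code: `ToySession`, `run_tick`, and the string "conditioning" values are hypothetical stand-ins (the real stream holds tensors and runs a multi-step generation), and the `mid_step` callback stands in for a command landing on another thread while a step is in flight. An immediate write lets one step see two different conditionings; a staged write is only picked up at the next `before_tick`, so every sub-step of a tick sees the same value.

```python
import threading
from typing import Callable, List, Optional


class ToySession:
    """Hypothetical stand-in for the streaming session (not the real class)."""

    def __init__(self, conditioning: str) -> None:
        self.conditioning = conditioning      # what the running step reads
        self.pending: Optional[str] = None    # staged value, drained in before_tick
        self._lock = threading.Lock()

    def set_immediate(self, value: str) -> None:
        # Old behaviour: the command thread mutates live conditioning directly.
        self.conditioning = value

    def set_staged(self, value: str) -> None:
        # New behaviour: only record the request; a later call overwrites it.
        with self._lock:
            self.pending = value

    def before_tick(self) -> None:
        # Safe boundary between steps: take the staged value (if any) and apply it.
        with self._lock:
            value, self.pending = self.pending, None
        if value is not None:
            self.conditioning = value


def run_tick(session: ToySession, n_substeps: int, mid_step: Callable[[], None]) -> List[str]:
    """Run one generation step and record the conditioning each sub-step saw.

    `mid_step` fires after the first sub-step, mimicking a command that
    arrives on another thread while the step is in flight.
    """
    session.before_tick()
    seen: List[str] = []
    for i in range(n_substeps):
        seen.append(session.conditioning)
        if i == 0:
            mid_step()
    return seen


old = ToySession("A")
print(run_tick(old, 3, lambda: old.set_immediate("B")))  # ['A', 'B', 'B']: torn step

new = ToySession("A")
print(run_tick(new, 3, lambda: new.set_staged("B")))     # ['A', 'A', 'A']
print(run_tick(new, 3, lambda: None))                     # ['B', 'B', 'B']: applied at the boundary
```

In the real session a torn step is worse than a label mismatch: the corrupted chunk is written back into the looping latent buffer, so the damage persists until reconnect.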
Symptom
While streaming, changing a conditioning input live (editing the prompt/tags, dragging the blend slider, or moving timbre strength) drops the generated output to silence in chunks. The silence persists, even at denoise/STRENGTH 0 (which then reconstructs the corrupted latents instead of the source), until the client disconnects and reconnects, which re-encodes a clean buffer. A single tag edit can trigger it; a fast blend drag does it reliably. (Surfaced while testing the `rtmg-vst` plugin, but the bug is pod-side and not VST-specific.)
Root cause
The session already has a safe-boundary mechanism: `apply_pending()` is called by the runner from `before_tick`, and LoRA / source-swap / depth changes are staged into `state.pending_*` and applied there, serialized with the generation step. But conditioning changes bypassed it.
`set_prompt`, `set_prompt_blend`, and `set_timbre_strength` ran `_refresh_conditioning()` (and `set_prompt` even re-ran `encode_cond_pair`, a GPU text encode) immediately on the command thread, in the middle of a generation step. Swapping `stream.conditioning` (or hogging the GPU with an encode) under the running step corrupts the in-flight audio chunk, which then sticks in the looping latent buffer as a silent section. Each further change adds another silent section; reconnecting re-encodes a clean buffer, which is why the stream recovers.
Fix
Route the three live conditioning setters through the same pending drain as swaps/lora/depth:
- `set_prompt` / `set_prompt_blend` / `set_timbre_strength` now just stage `state.pending_{prompt,prompt_blend,timbre_strength}` under `state._lock` (no encode, no recompose on the command thread). An `EXTERNAL` `set_prompt_blend` still echoes immediately.
- `_apply_conditioning_pending()` (called from `apply_pending` → `before_tick`) drains them, does the encode (for prompt), sets blend/timbre, and calls `_refresh_conditioning()` once, between steps. Latest staged value wins per tick, so a fast blend drag coalesces to one recompose per step.

Only `ws_adapter.py` (live client messages) calls these three, so nothing on the init/connect path is affected (that conditioning is set up via the swap path, which already runs in `before_tick`).
Open questions / review notes
- …:dev-pod run: edit tags / drag blend / move timbre while streaming and confirm no silence.
- The prompt encode now runs inside `before_tick`, briefly blocking the next step (intended: that's the serialization). Blend/timbre recompose is a cheap slerp/lerp, negligible.
- `PromptApplied` now publishes at apply time (one tick later); `PromptBlendEcho` (EXTERNAL) is unchanged/immediate.
- …`snapshot`) reflects blend/timbre one tick late; harmless.

🤖 Generated with Claude Code
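The staging-and-drain flow from the Fix section, together with the event timing in the review notes (`PromptApplied` published when the drain applies the prompt, `PromptBlendEcho` sent immediately for `EXTERNAL` blend changes), can be sketched as follows. This is a minimal, self-contained sketch, not the PR's code: `ToyConditioningSession`, the `encode` and `publish` callbacks, the `source` argument of `set_prompt_blend`, and the `refresh_count` counter are hypothetical. The setter, state-field, and method names mirror those mentioned in the PR, but their bodies here are assumptions. In this sketch the encode runs outside the lock, so the command thread only ever waits for a few attribute writes.

```python
import threading
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class State:
    # Staged by the command thread, drained by the runner in before_tick.
    pending_prompt: Optional[str] = None
    pending_prompt_blend: Optional[float] = None
    pending_timbre_strength: Optional[float] = None
    _lock: threading.Lock = field(default_factory=threading.Lock)


class ToyConditioningSession:
    """Hypothetical sketch; method names mirror the PR, bodies are assumptions."""

    def __init__(self, encode: Callable[[str], Any], publish: Callable[[str, Any], None]) -> None:
        self.state = State()
        self.encode = encode      # stand-in for encode_cond_pair (a GPU text encode)
        self.publish = publish    # stand-in for event publishing
        self.cond_pair: Any = None
        self.prompt_blend = 0.0
        self.timbre_strength = 1.0
        self.refresh_count = 0

    # --- command thread: stage only, no encode, no recompose ---
    def set_prompt(self, prompt: str) -> None:
        with self.state._lock:
            self.state.pending_prompt = prompt

    def set_prompt_blend(self, blend: float, source: str = "CLIENT") -> None:
        with self.state._lock:
            self.state.pending_prompt_blend = blend
        if source == "EXTERNAL":
            self.publish("PromptBlendEcho", blend)  # echo stays immediate

    def set_timbre_strength(self, strength: float) -> None:
        with self.state._lock:
            self.state.pending_timbre_strength = strength

    # --- runner thread: called from before_tick, between generation steps ---
    def apply_pending(self) -> None:
        # Swap / LoRA / depth draining would also happen here (omitted).
        self._apply_conditioning_pending()

    def _apply_conditioning_pending(self) -> None:
        with self.state._lock:  # take the latest staged values and clear them
            prompt, self.state.pending_prompt = self.state.pending_prompt, None
            blend, self.state.pending_prompt_blend = self.state.pending_prompt_blend, None
            timbre, self.state.pending_timbre_strength = self.state.pending_timbre_strength, None
        if prompt is None and blend is None and timbre is None:
            return  # nothing staged: no recompose this tick
        if prompt is not None:
            self.cond_pair = self.encode(prompt)   # encode outside the lock
            self.publish("PromptApplied", prompt)  # published at apply time
        if blend is not None:
            self.prompt_blend = blend
        if timbre is not None:
            self.timbre_strength = timbre
        self._refresh_conditioning()  # one recompose covers all three

    def _refresh_conditioning(self) -> None:
        self.refresh_count += 1  # placeholder for the real slerp/lerp recompose


encoded: List[str] = []
events: List[Tuple[str, Any]] = []


def fake_encode(prompt: str) -> Tuple[str, str]:
    encoded.append(prompt)
    return (prompt, "")


s = ToyConditioningSession(fake_encode, lambda name, value: events.append((name, value)))
for i in range(50):
    s.set_prompt_blend(i / 49)  # fast slider drag between two ticks
s.set_prompt("lofi, piano")
s.apply_pending()               # one before_tick
print(s.refresh_count, s.prompt_blend, encoded, events)
# 1 1.0 ['lofi, piano'] [('PromptApplied', 'lofi, piano')]
```

Fifty slider updates between two ticks collapse into a single `_refresh_conditioning` call with the last value, and a second `apply_pending()` with nothing staged does no work.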