fix(streaming): defer prompt/blend/timbre conditioning changes to before_tick #237

Draft
gioelecerati wants to merge 1 commit into main from gio/fix/defer-conditioning-to-before-tick

@gioelecerati

Collaborator

Draft — untested on GPU. Proposes a fix for a reproducible live-edit corruption; wants review + a :dev-pod validation before any :warm bake.

Symptom

While streaming, changing a conditioning input live — editing the prompt/tags, dragging the blend slider, or moving timbre strength — drops the generated output to silence in chunks. The silence persists (even at denoise/STRENGTH 0, which then reconstructs the corrupted latents instead of the source) until the client disconnects + reconnects, which re-encodes a clean buffer. A single tag edit can trigger it; a fast blend drag does it reliably. (Surfaced while testing the rtmg-vst plugin, but it's pod-side and not VST-specific.)

Root cause

The session already has a safe-boundary mechanism: apply_pending() is called by the runner from before_tick, and LoRA / source-swap / depth changes are staged into state.pending_* and applied there, serialized with the generation step.

Before this change, conditioning changes bypassed it. set_prompt, set_prompt_blend, and set_timbre_strength ran _refresh_conditioning() immediately, on the command thread, mid generation step, and set_prompt additionally re-ran encode_cond_pair (a GPU text encode) there. Swapping stream.conditioning (or hogging the GPU with an encode) under the running step corrupts the in-flight audio chunk, which then sticks in the looping latent buffer as a silent section. Each change adds another silent section; reconnecting re-encodes a clean buffer, which is why it recovers.

Fix

Route the three live conditioning setters through the same pending drain as swaps/lora/depth:

  • set_prompt / set_prompt_blend / set_timbre_strength now just stage state.pending_{prompt,prompt_blend,timbre_strength} under state._lock (no encode, no recompose on the command thread). EXTERNAL set_prompt_blend still echoes immediately.
  • New _apply_conditioning_pending() (called from apply_pending, which the runner calls in before_tick) drains them, does the encode (for prompt) + sets blend/timbre, and calls _refresh_conditioning() once, between steps. Latest staged value wins per tick, so a fast blend drag coalesces to one recompose per step.

Only ws_adapter.py (live client messages) calls these three, so nothing on the init/connect path is affected (that conditioning is set up via the swap path, which already runs in before_tick).

Open questions / review notes

  • Untested on GPU. Needs a :dev-pod run: edit tags / drag blend / move timbre while streaming and confirm no silence.
  • The prompt re-encode now runs in before_tick, briefly blocking the next step (intended — that's the serialization). Blend/timbre recompose is a cheap slerp/lerp, negligible.
  • PromptApplied now publishes at apply time (one tick later); PromptBlendEcho (EXTERNAL) is unchanged/immediate.
  • Telemetry (snapshot) reflects blend/timbre one tick late — harmless.
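Until the :dev-pod run happens, the serialization invariant itself (conditioning never changes while a step is in flight) could be checked on CPU with a fake stream. The sketch below is only an assumption about how such a check might look: `FakeStream`, its `step(mid_step)` hook, and `edit_from_command_thread` are hypothetical helpers, and it says nothing about GPU contention or audio output.

```python
import threading


class FakeStream:
    """Hypothetical CPU-only stand-in; the conditioning is just a string."""

    def __init__(self):
        self.conditioning = "initial"
        self._lock = threading.Lock()
        self._pending = None

    def set_prompt(self, text):
        # Command thread: stage only.
        with self._lock:
            self._pending = text

    def before_tick(self):
        # Runner thread: drain the staged value between steps.
        with self._lock:
            staged, self._pending = self._pending, None
        if staged is not None:
            self.conditioning = staged

    def step(self, mid_step):
        # Record conditioning at step start, let an edit arrive mid-step,
        # then report whether the step saw a stable conditioning.
        seen_at_start = self.conditioning
        mid_step()
        return self.conditioning == seen_at_start


def edit_from_command_thread(stream, text):
    # Run the setter on a separate thread and wait for it, so the edit
    # deterministically lands while the step is "in flight".
    t = threading.Thread(target=stream.set_prompt, args=(text,))
    t.start()
    t.join()


stream = FakeStream()
stable = []
for i in range(3):
    stream.before_tick()
    stable.append(stream.step(lambda: edit_from_command_thread(stream, f"prompt-{i}")))
stream.before_tick()  # the last edit is applied one tick later
```

Every entry in `stable` should be `True`, and the last staged prompt only becomes visible after the following `before_tick`, matching the one-tick delay noted for PromptApplied and telemetry.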

🤖 Generated with Claude Code

…ore_tick

Live set_prompt / set_prompt_blend / set_timbre_strength ran the encode +
_refresh_conditioning immediately on the command thread, so a change landing
mid generation step swapped stream.conditioning (and hogged the GPU with an
encode) under the running step. That corrupts the in-flight audio chunk, which
then sticks in the looping latent buffer as a silent section until the client
reconnects and re-encodes a clean buffer. A single tag edit or a fast
blend-slider drag reproduces it (surfaced via the rtmg-vst plugin).

Route these through the same pending-drain mechanism that swaps / lora / depth
already use: the set_* handlers now stage state.pending_{prompt,prompt_blend,
timbre_strength}, and apply_pending() drains them in before_tick so the encode
and conditioning recompose serialize with the generation step. Latest staged
value wins per tick; one _refresh_conditioning covers all three.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>