Cross fade streaming #187

Closed
mrdrprofuroboros wants to merge 6 commits into Zyphra:main from mrdrprofuroboros:cross-fade-streaming

@mrdrprofuroboros

I think I've solved the clicking issue in streaming by adding some overlap and a cross-fade between chunks.
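
For readers unfamiliar with the technique, here is a minimal sketch of an overlap cross-fade in plain Python. It is not the code from this PR: the function name `crossfade_join`, the linear ramp, and the use of plain lists instead of tensors are all assumptions made for illustration. The idea is that adjacent chunks share `overlap` samples of the same audio region, and those samples are blended instead of cut, which avoids the discontinuity heard as a click.

```python
def crossfade_join(prev_chunk, next_chunk, overlap):
    """Join two lists of samples whose last/first `overlap` samples
    cover the same stretch of audio (hypothetical helper, not the PR's API)."""
    if overlap < 0 or overlap > min(len(prev_chunk), len(next_chunk)):
        raise ValueError("overlap must fit inside both chunks")

    split = len(prev_chunk) - overlap
    head = prev_chunk[:split]        # part of prev_chunk that is kept as-is
    tail = next_chunk[overlap:]      # part of next_chunk that is kept as-is

    blended = []
    for i in range(overlap):
        # Linear ramp: the weight of next_chunk rises, prev_chunk falls.
        # The two weights always sum to 1, so a steady signal stays steady.
        w = (i + 1) / (overlap + 1)
        blended.append((1.0 - w) * prev_chunk[split + i] + w * next_chunk[i])

    return head + blended + tail


# Usage: two 8-sample chunks sharing a 4-sample overlap give 12 samples.
joined = crossfade_join([1.0] * 8, [1.0] * 8, 4)
print(len(joined))
```

A linear ramp is the simplest choice; other fade curves (for example equal-power) are also common, and which one the PR uses is not stated in this thread.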

Thanks @uetuluk for the initial implementation!
I've taken #49 as a base

Sorry, I didn't get Gradio working with the format changes I introduced; I hope you'll be able to update it easily.

I've been testing it on an RTX 4090 and it looks pretty real-time to me now, with ~260ms TTFB (~165 it/s):

Starting streaming generation...
Received chunk 1: shape torch.Size([1, 5120]) [259ms]
Received chunk 2: shape torch.Size([1, 10240]) [501ms]
Received chunk 3: shape torch.Size([1, 20480]) [864ms]
Received chunk 4: shape torch.Size([1, 30720]) [1347ms]
Received chunk 5: shape torch.Size([1, 40960]) [1830ms]
Received chunk 6: shape torch.Size([1, 40960]) [2317ms]
Received chunk 7: shape torch.Size([1, 40960]) [2804ms]
Received chunk 8: shape torch.Size([1, 40960]) [3293ms]
Received chunk 9: shape torch.Size([1, 40960]) [3783ms]
Received chunk 10: shape torch.Size([1, 40960]) [4024ms]
Received chunk 11: shape torch.Size([1, 19456]) [4024ms]
Saved streaming audio to 'stream_improved_sample.wav' (sampling rate: 44100 Hz).
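
As a rough check of the "real-time" claim, the chunk shapes in the log above can be converted to audio durations. This is a sketch under assumptions: it treats each logged shape as the number of new samples delivered (if the shapes include overlap regions, the true audio length is shorter), and it takes the bracketed time of the last chunk as the total generation time. The variable names are made up for illustration.

```python
SAMPLE_RATE_HZ = 44100  # sampling rate reported in the log

# Sample counts taken from the torch.Size([1, N]) shapes in the log.
chunk_samples = [5120, 10240, 20480, 30720,
                 40960, 40960, 40960, 40960, 40960, 40960,
                 19456]
LAST_CHUNK_ARRIVAL_MS = 4024  # bracketed timestamp of the final chunk


def samples_to_ms(num_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration in milliseconds of `num_samples` at the given rate."""
    return 1000.0 * num_samples / sample_rate_hz


total_audio_ms = sum(samples_to_ms(n) for n in chunk_samples)

# Real-time factor: generation time divided by audio duration.
# A value below 1 means audio is produced faster than it plays back.
real_time_factor = LAST_CHUNK_ARRIVAL_MS / total_audio_ms

print(f"audio: {total_audio_ms:.0f} ms, RTF: {real_time_factor:.2f}")
```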

Here is an audio sample generated with it:

streaming.mp4

@chris-calo

This is pretty rad, nice work! Will give it a test with some cloning on my end – sounds good from your sample, though.

@dillonroach

Works very well. The first chunk locally (3090) with the hybrid model (torch 2.4.1, need to upgrade) was ~330ms at size 15 in the schedule. Total generation time was 5 sec for an 8 sec clip, so definitely real time.

@mrdrprofuroboros
Author

I’ve added some prints to help calibrate chunk_schedule for your system, so that after you get the first chunk you won’t have to wait for the second one. Make sure it generates a continuous stream!
There’s definitely room for improvement there; I think I can make one more iteration on the PR in a while.

@mrdrprofuroboros
Author

Actually, I just tested on an RTX 3090 with CUDA 12.6 (560.35.03) and torch 2.5.1 and got even lower latency:

Starting streaming generation...
Received chunk 1: time 220ms | generated up to 301ms
Received chunk 2: time 299ms | generated up to 403ms
Received chunk 3: time 393ms | generated up to 509ms
Received chunk 4: time 501ms | generated up to 640ms
Received chunk 5: time 638ms | generated up to 801ms
...

and it occupied 4686MiB VRAM

@AWAS666

AWAS666 commented Mar 19, 2025

Tested it on an RTX 3090 with torch 2.6.0 and CUDA 12.4.

On the transformer version it's also around ~250ms for the first chunk, but the hybrid one is a lot slower, at around 600-700ms.
Both were run with the default schedule of the streaming sample.

@jhaArnav

Starting streaming generation...
Received chunk 1: time 220ms | generated up to 301ms
Received chunk 2: time 299ms | generated up to 403ms
Received chunk 3: time 393ms | generated up to 509ms
Received chunk 4: time 501ms | generated up to 640ms
Received chunk 5: time 638ms | generated up to 801ms
...

I'm running this on an H100 and am getting:
Sending chunk 1: time 166ms | generated 23ms of audio
Sending chunk 2: time 276ms | generated 174ms of audio
Sending chunk 3: time 412ms | generated 232ms of audio
Sending chunk 4: time 579ms | generated 290ms of audio
Sending chunk 5: time 768ms | generated 348ms of audio
Sending chunk 6: time 985ms | generated 406ms of audio
Sending chunk 7: time 1203ms | generated 464ms of audio
Sending chunk 8: time 1420ms | generated 464ms of audio
Sending chunk 9: time 1641ms | generated 464ms of audio

@constan1

This is great!

Can I test this on a 1080 Ti, or are only the 3000 series and above supported?

@mrdrprofuroboros
Author

@jhaArnav note that 166 + 23 < 276 ms, so you're effectively waiting for the 2nd chunk now rather than the 1st. I believe that if you bump up the 1st chunk size to get a continuous stream, you'd be somewhere around the same 220-240 ms latency.
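
The arithmetic above can be generalized into a small check over a whole log. This is a sketch, not code from the PR: `find_underruns` is a hypothetical helper, and it assumes the "generated N ms of audio" figure is the duration of that chunk alone and that playback starts the moment the first chunk arrives. A chunk causes an underrun if it arrives after all previously received audio has finished playing.

```python
def find_underruns(chunks):
    """Return (chunk_number, gap_ms) for every chunk that arrives late.

    `chunks` is a list of (arrival_ms, audio_ms) pairs in arrival order.
    """
    gaps = []
    playback_end = None  # wall-clock time at which buffered audio runs out
    for number, (arrival_ms, audio_ms) in enumerate(chunks, start=1):
        if playback_end is None:
            start = arrival_ms                   # first chunk starts playback
        elif arrival_ms > playback_end:
            gaps.append((number, arrival_ms - playback_end))
            start = arrival_ms                   # playback resumes after the gap
        else:
            start = playback_end                 # chunk queues behind the buffer
        playback_end = start + audio_ms
    return gaps


# Numbers from the H100 log earlier in this thread.
h100_log = [(166, 23), (276, 174), (412, 232), (579, 290), (768, 348),
            (985, 406), (1203, 464), (1420, 464), (1641, 464)]
print(find_underruns(h100_log))
```

Under these assumptions the only late chunk in that log is the second one, which matches the observation that the listener ends up waiting for chunk 2 rather than chunk 1. Making the first chunk larger raises time to first audio but gives the second chunk more time to arrive.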

@jhaArnav

@mrdrprofuroboros Thanks! I was severely underutilizing my GPU. I kept messing around with chunk size and buffers and realized that 128 is a good size.

@mrdrprofuroboros
Author

mrdrprofuroboros commented Apr 3, 2025

closing in favor of #208

6 participants