Cross fade streaming by mrdrprofuroboros · Pull Request #187 · Zyphra/Zonos

mrdrprofuroboros · 2025-03-10T22:43:43Z

I think I've solved the clicking issue during streaming by adding some overlap and a cross-fade between chunks.
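For readers unfamiliar with the idea, here is a minimal sketch of overlap plus cross-fade stitching. It is not the code from this PR: the `CrossFader` class, its `overlap` parameter, and the linear fade curves are all hypothetical, and it assumes that the first `overlap` samples of each chunk cover the same stretch of audio as the last `overlap` samples of the previous chunk.

```python
import numpy as np


class CrossFader:
    """Hypothetical helper: stitches overlapping chunks with a linear cross-fade."""

    def __init__(self, overlap: int):
        self.overlap = overlap
        # Pre-compute the fade curves once; they sum to 1 at every sample.
        self.fade_in = np.linspace(0.0, 1.0, overlap, dtype=np.float32)
        self.fade_out = 1.0 - self.fade_in
        self.tail = None  # last `overlap` samples of the previous chunk

    def push(self, chunk: np.ndarray) -> np.ndarray:
        """Take a new chunk and return the samples that are safe to play now."""
        chunk = chunk.astype(np.float32, copy=True)
        if self.tail is not None:
            # Blend the held-back tail into the head of the new chunk.
            head = chunk[: self.overlap]
            chunk[: self.overlap] = self.tail * self.fade_out + head * self.fade_in
        # Hold back the new tail until the next chunk arrives.
        self.tail = chunk[-self.overlap:]
        return chunk[: -self.overlap]

    def flush(self) -> np.ndarray:
        """Return the final held-back tail at the end of the stream."""
        tail, self.tail = self.tail, None
        return tail if tail is not None else np.zeros(0, dtype=np.float32)
```

Because the two fade curves sum to one, regions where both chunks agree pass through unchanged, while a small mismatch at a chunk boundary is smoothed over `overlap` samples instead of producing a step, which is what is heard as a click. The cost is that `overlap` samples are always held back until the next chunk arrives.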

Thanks @uetuluk for the initial implementation!
I've taken #49 as a base

Sorry, I didn't get Gradio working with the format changes I introduced; I hope you'll be able to update it easily.

I've been testing it on an RTX 4090 and it looks pretty real-time to me now, with ~260 ms TTFB (~165 it/s):

Starting streaming generation...
Received chunk 1: shape torch.Size([1, 5120]) [259ms]
Received chunk 2: shape torch.Size([1, 10240]) [501ms]
Received chunk 3: shape torch.Size([1, 20480]) [864ms]
Received chunk 4: shape torch.Size([1, 30720]) [1347ms]
Received chunk 5: shape torch.Size([1, 40960]) [1830ms]
Received chunk 6: shape torch.Size([1, 40960]) [2317ms]
Received chunk 7: shape torch.Size([1, 40960]) [2804ms]
Received chunk 8: shape torch.Size([1, 40960]) [3293ms]
Received chunk 9: shape torch.Size([1, 40960]) [3783ms]
Received chunk 10: shape torch.Size([1, 40960]) [4024ms]
Received chunk 11: shape torch.Size([1, 19456]) [4024ms]
Saved streaming audio to 'stream_improved_sample.wav' (sampling rate: 44100 Hz).
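
As a rough cross-check of the "pretty real-time" claim, the chunk shapes and timestamps in the log above can be turned into audio durations. This is only a sketch: it assumes the bracketed times are measured from the start of generation and that the received chunks are played back to back with no remaining overlap. The names `chunks` and `chunk_ms` are made up for illustration.

```python
SAMPLE_RATE = 44_100  # Hz, from the "Saved streaming audio" log line

# (samples in chunk, wall-clock ms when it was received), copied from the log.
chunks = [
    (5120, 259), (10240, 501), (20480, 864), (30720, 1347),
    (40960, 1830), (40960, 2317), (40960, 2804), (40960, 3293),
    (40960, 3783), (40960, 4024), (19456, 4024),
]


def chunk_ms(samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Duration of `samples` audio samples in milliseconds."""
    return 1000.0 * samples / sample_rate


total_audio_ms = sum(chunk_ms(n) for n, _ in chunks)
total_wall_ms = chunks[-1][1]
real_time_factor = total_audio_ms / total_wall_ms

first_samples, first_time = chunks[0]
print(f"first chunk: {chunk_ms(first_samples):.0f} ms of audio after {first_time} ms")
print(f"total: {total_audio_ms / 1000:.2f} s of audio in {total_wall_ms / 1000:.2f} s")
print(f"real-time factor: {real_time_factor:.2f}x")
```

Under those assumptions the run works out to about 7.52 s of audio produced in 4.02 s, roughly 1.87x faster than playback.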

Here is an audio sample generated with it:

streaming.mp4

chris-calo · 2025-03-11T16:35:07Z

This is pretty rad, nice work! Will give it a test with some cloning on my end – sounds good from your sample, though.

dillonroach · 2025-03-15T06:42:51Z

Works very well. The first chunk locally (3090) with the hybrid model (torch 2.4.1, need to upgrade) took ~330 ms at size 15 in the schedule. Total generation time was 5 s for an 8 s clip, so definitely real time.

mrdrprofuroboros · 2025-03-15T20:06:51Z

I’ve added some prints to help calibrate chunk_schedule for your system, so that after you get the first chunk you won’t have to wait for the second one. Make sure it generates a continuous stream!
There’s definitely room for improvement there; I think I can do one more iteration on the PR in a while.

mrdrprofuroboros · 2025-03-17T02:47:37Z

Actually, I just tested on an RTX 3090 with CUDA 12.6 (560.35.03) and torch 2.5.1 and got even lower latency:

Starting streaming generation...
Received chunk 1: time 220ms | generated up to 301ms
Received chunk 2: time 299ms | generated up to 403ms
Received chunk 3: time 393ms | generated up to 509ms
Received chunk 4: time 501ms | generated up to 640ms
Received chunk 5: time 638ms | generated up to 801ms
...

and it occupied 4686 MiB of VRAM.

AWAS666 · 2025-03-19T17:04:10Z

Tested it on an RTX 3090 with torch 2.6.0 and CUDA 12.4.

On the transformer version it's also ~250 ms for the first chunk, but the hybrid one is a lot slower, at around 600-700 ms.
Both with the default schedule of the streaming sample.

jhaArnav · 2025-03-20T10:09:59Z

Starting streaming generation...
Received chunk 1: time 220ms | generated up to 301ms
Received chunk 2: time 299ms | generated up to 403ms
Received chunk 3: time 393ms | generated up to 509ms
Received chunk 4: time 501ms | generated up to 640ms
Received chunk 5: time 638ms | generated up to 801ms
...

I'm running this on an H100 and am getting:
Sending chunk 1: time 166ms | generated 23ms of audio
Sending chunk 2: time 276ms | generated 174ms of audio
Sending chunk 3: time 412ms | generated 232ms of audio
Sending chunk 4: time 579ms | generated 290ms of audio
Sending chunk 5: time 768ms | generated 348ms of audio
Sending chunk 6: time 985ms | generated 406ms of audio
Sending chunk 7: time 1203ms | generated 464ms of audio
Sending chunk 8: time 1420ms | generated 464ms of audio
Sending chunk 9: time 1641ms | generated 464ms of audio

constan1 · 2025-03-21T17:35:03Z

This is great!

Can I test this on a 1080 Ti, or are only 3000 series and above supported?

mrdrprofuroboros · 2025-03-24T16:53:27Z

@jhaArnav note that 166 + 23 < 276 ms, so you're effectively waiting for the 2nd chunk now rather than the 1st. I believe that if you bump up the 1st chunk size to get a continuous stream, you'd be somewhere around the same 220-240 ms latency.
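
The check behind "166 + 23 < 276" can be applied to a whole log. The sketch below is not part of the PR: `find_underruns` is a hypothetical helper, and it assumes that each "generated N ms of audio" value is the duration of that chunk alone and that playback starts as soon as chunk 1 arrives.

```python
def find_underruns(chunks):
    """Simulate playback of (arrival_ms, audio_ms) chunks.

    Returns a list of (chunk_number, gap_ms) for every chunk that arrives
    after the audio received so far has already finished playing.
    """
    gaps = []
    playback_end = None
    for number, (arrival_ms, audio_ms) in enumerate(chunks, start=1):
        if playback_end is None:
            playback_end = arrival_ms  # playback starts with the first chunk
        elif arrival_ms > playback_end:
            # The player ran dry before this chunk arrived.
            gaps.append((number, arrival_ms - playback_end))
            playback_end = arrival_ms
        playback_end += audio_ms
    return gaps


# (time, generated audio) pairs copied from the H100 log above.
h100_log = [
    (166, 23), (276, 174), (412, 232), (579, 290), (768, 348),
    (985, 406), (1203, 464), (1420, 464), (1641, 464),
]
print(find_underruns(h100_log))  # [(2, 87)]
```

With these assumptions the only stall is the 87 ms wait before chunk 2, so the effective latency is set by the second chunk rather than the first. A larger first chunk arrives later but carries enough audio to cover the wait for the next one.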

jhaArnav · 2025-03-25T00:24:08Z

@mrdrprofuroboros Thanks! I was severely underutilizing my GPU. I kept messing around with chunk size and buffers and realized that 128 is a good size.

mrdrprofuroboros · 2025-04-03T05:36:25Z

Closing in favor of #208.

mrdrprofuroboros added 2 commits March 10, 2025 17:31

cross-fade streaming

00f312d

fade in first

18bee34

mrdrprofuroboros force-pushed the cross-fade-streaming branch from c73ea01 to 18bee34 Compare March 13, 2025 04:40

mrdrprofuroboros added 2 commits March 13, 2025 01:07

pre-compute fade in/out

0ebf9bb

optimal schedule

7758d41

mrdrprofuroboros added 2 commits March 24, 2025 21:00

loading speaker embedding model from local path

47e567b

loading speaker embedding model from local path

df81f57

mrdrprofuroboros mentioned this pull request Apr 3, 2025

Infinite streaming #208

Open

mrdrprofuroboros closed this Apr 3, 2025

mrdrprofuroboros wants to merge 6 commits into Zyphra:main from mrdrprofuroboros:cross-fade-streaming


6 participants
