Cross fade streaming #187
|
This is pretty rad, nice work! Will give it a test with some cloning on my end – sounds good from your sample, though. |
c73ea01 to 18bee34
|
Works very well: first chunk locally (3090) with the hybrid model (torch 2.4.1, need to upgrade) was ~330ms at size 15 in the schedule. 5 sec total gen time for an 8 sec clip, so definitely real time. |
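For reference, the "real time" claim above can be checked with a one-line ratio. This is only a sketch of the arithmetic using the two figures quoted in the comment (5 s of generation for an 8 s clip); the function name `real_time_factor` is made up for illustration and is not part of the PR.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Generation time divided by audio duration.

    A value below 1.0 means audio is produced faster than it plays back.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


# Figures quoted in the comment: 5 s total generation for an 8 s clip.
rtf = real_time_factor(5.0, 8.0)
print(f"RTF = {rtf:.3f}")  # below 1.0, so faster than real time
```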
|
I’ve added some prints to help calibrate chunk_schedule for your system, so that after you get the first chunk you won’t have to wait for the second one. Make sure it generates a continuous stream! |
|
Actually, just tested on an RTX 3090 with CUDA 12.6 (560.35.03) and torch 2.5.1 and got even lower latency, and it occupied 4686MiB VRAM |
|
Tested it on an RTX 3090 with torch 2.6.0 and CUDA 12.4. On the transformer version it's also around 250ms for the first chunk, but the hybrid one is a lot slower, at around 600-700ms |
I'm running this on an H100 and am getting: |
|
This is great! Can I test this on a 1080 Ti, or are only 3000 series and above supported? |
|
@jhaArnav pay attention to the fact that 166 + 23 < 276ms, so you're effectively waiting for the 2nd chunk now rather than the 1st. I believe if you bump up the 1st chunk size to get a continuous stream, you'd be somewhere around the same 220-240ish ms latency
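A rough way to run the same check on your own timings is sketched below. The log the numbers came from is not in this thread, so the reading is an assumption: 166 ms is taken as the arrival time of the first chunk, 23 ms as its audio duration, and 276 ms as the arrival time of the second chunk. The helper `find_gaps` and the second chunk's 100 ms duration are hypothetical, made up for illustration.

```python
def find_gaps(arrival_ms, duration_ms):
    """Return (chunk_index, gap_ms) for every chunk that arrives after
    the audio already received has finished playing.

    Playback is assumed to start the moment the first chunk arrives.
    """
    gaps = []
    playhead = arrival_ms[0]  # time at which buffered audio runs out
    for i, (arrive, dur) in enumerate(zip(arrival_ms, duration_ms)):
        if arrive > playhead:
            gaps.append((i, arrive - playhead))
            playhead = arrive  # playback stalls until the chunk arrives
        playhead += dur
    return gaps


# Assumed reading of the numbers above: the first chunk lands at 166 ms
# and holds 23 ms of audio, the second lands at 276 ms.
# The 100 ms duration of the second chunk is a placeholder.
gaps = find_gaps([166, 276], [23, 100])
print(gaps)  # a non-empty list means the stream is not continuous

# Hypothetical schedule with a larger first chunk: later start, no stall.
no_gaps = find_gaps([230, 300], [120, 100])
print(no_gaps)
```

Under that reading the first chunk runs out at 189 ms, well before the second one arrives, which is why a larger first chunk can give a continuous stream at a similar effective latency.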
|
@mrdrprofuroboros Thanks! I was severely underutilizing my GPU. Kept messing around with chunk size and buffers and realized that 128 is a good size |
|
Closing in favor of #208 |
I think I've solved the clicking issue while streaming with some overlap and cross-fade
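For readers unfamiliar with the idea, here is a minimal sketch of overlap plus cross-fade between consecutive audio chunks. It is not the code from this PR: the function name `crossfade_chunks`, the linear fade shape, and the assumption that each chunk starts with `overlap` samples covering the same stretch of audio as the tail of the previous chunk are all illustrative choices.

```python
def crossfade_chunks(chunks, overlap):
    """Join audio chunks (lists of float samples), blending `overlap`
    samples at each boundary with a linear fade.

    Assumes the first `overlap` samples of each chunk cover the same
    stretch of audio as the last `overlap` samples of the previous one.
    """
    if not chunks:
        return []
    out = list(chunks[0])
    for chunk in chunks[1:]:
        n = min(overlap, len(out), len(chunk))
        for i in range(n):
            w = (i + 1) / (n + 1)  # fade-in weight of the incoming chunk
            j = len(out) - n + i   # matching sample in the outgoing tail
            out[j] = out[j] * (1.0 - w) + chunk[i] * w
        out.extend(chunk[n:])      # append the non-overlapping remainder
    return out


# A step from 1.0 to 0.0 is smoothed across the two overlapping samples
# instead of jumping at the chunk boundary.
faded = crossfade_chunks([[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]], overlap=2)
print(faded)
```

The fade weights of the two chunks always sum to 1, so a signal that is identical in both chunks passes through unchanged, while a discontinuity at the boundary is spread over the overlap instead of producing a click.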
Thanks @uetuluk for the initial implementation!
I've taken #49 as a base
Sorry, I didn't get gradio working with the format changes I introduced; I hope you'll be able to update it easily
I've been testing it on an RTX 4090 and it looks pretty real-time to me now, with ~260ms TTFB (~165 it/s)
Here is an audio sample generated with it:
streaming.mp4