Description
Hi there, I'm really enjoying testing your work, it's very good indeed!
I'm running it on Windows 11 with FlashAttention 2 and Triton installed.
Python 3.12.12
Build cuda_12.6.r12.6/compiler.34431801_0
I'm trying to use the clis/moss_tts_app.py Gradio UI.
But I'm finding it very slow to use on a 13900K with an RTX 3090. I'm getting around 1 to 5 it/s, and it slows down further on longer generations. Even a single sentence takes around 1 minute to generate, and a paragraph of text takes several minutes (2-4 depending on paragraph size).
The results are very good, with emotion that sounds realistic rather than monotonous. But the latency is surprisingly high. Is this normal?
The real-time test is much quicker, but the quality is highly variable. Using moss_tts_realtime/app.py I can generate in a few seconds, but the quality is not usable in my case.
Just for reference, here is a generation using moss_tts_app.py with a single sentence.
Generating bs1 ...: 17%|████████████████████▋ | 177/1024 [00:30<02:24, 5.86it/s]
^^ (progress-bar line shortened so it fits in the issue text window)
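For reference, the numbers in that progress line are self-consistent. A minimal sketch (variable names are illustrative, not from the project) showing how the elapsed time and ETA follow from the step counts and the reported iteration rate:

```python
# Sanity-check the tqdm progress line: 177/1024 [00:30<02:24, 5.86it/s]
# All numbers come from the reported run; the function name is made up.

def eta_seconds(total_steps: int, done_steps: int, it_per_s: float) -> float:
    """Seconds remaining at the current iteration rate."""
    return (total_steps - done_steps) / it_per_s

done, total, rate = 177, 1024, 5.86  # steps done, step budget, it/s

elapsed = done / rate                        # ~30 s, matches the 00:30 shown
remaining = eta_seconds(total, done, rate)   # ~144 s, matches the 02:24 ETA
print(f"elapsed ~{elapsed:.0f}s, remaining ~{remaining:.0f}s")
```

So at ~5.86 it/s a full 1024-step generation would take roughly three minutes of wall-clock time, which lines up with the 2-4 minutes I see for paragraphs.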
This slows down further as I add more sentences. I have a 13700K with an RTX 3090 and 64 GB of RAM. VRAM gets close to full while inference is running.
Thanks!