Description
Hi there, I'm really enjoying testing your work, it's very good indeed!
I'm running it on Windows 11 with FlashAttention 2 and Triton installed.
Python 3.12.12
Build cuda_12.6.r12.6/compiler.34431801_0
I'm trying to use the clis/moss_tts_app.py Gradio UI.
But I'm finding it very slow to use on a 13900K with an RTX 3090. I'm getting around 1 to 5 it/s, and it slows down further on longer generations. Even a single sentence takes around 1 minute to generate, and a paragraph of text takes several minutes (2-4 depending on paragraph size).
The results are very good, with emotion that sounds realistic rather than monotonous. But the latency is surprisingly high. Is this normal?
The real-time test is much quicker, but the quality is highly variable. Using moss_tts_realtime/app.py I can generate in a few seconds, but the quality is not usable in my case.
Just for reference, here is a generation using moss_tts_app.py with a single sentence.
Generating bs1 ...: 17%|████████████████████▋ | 177/1024 [00:30<02:24, 5.86it/s]
^^ (progress-bar line shortened so it fits in the issue text window)
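For reference, the numbers in that progress line are self-consistent. A minimal sketch (variable names are illustrative, not from the project) showing how the elapsed time and ETA follow from the step counts and the reported iteration rate:

```python
# Sanity-check the tqdm progress line: 177/1024 [00:30<02:24, 5.86it/s]
# All numbers come from the reported run; the function name is made up.

def eta_seconds(total_steps: int, done_steps: int, it_per_s: float) -> float:
    """Seconds remaining at the current iteration rate."""
    return (total_steps - done_steps) / it_per_s

done, total, rate = 177, 1024, 5.86  # steps done, step budget, it/s

elapsed = done / rate                        # ~30 s, matches the 00:30 shown
remaining = eta_seconds(total, done, rate)   # ~144 s, matches the 02:24 ETA
print(f"elapsed ~{elapsed:.0f}s, remaining ~{remaining:.0f}s")
```

So at ~5.86 it/s a full 1024-step generation would take roughly three minutes of wall-clock time, which lines up with the 2-4 minutes I see for paragraphs.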
This slows down further as I add more sentences. I have a 13700K with an RTX 3090 and 64 GB of RAM. VRAM gets close to full while inference is running.
Thanks!