Is it normal for it to be this slow? #41

@cochese9000

Hi there, I'm really enjoying testing your work, it's very good indeed!

I'm running it on Windows 11 with FlashAttention 2 and Triton installed.

Python 3.12.12
Build cuda_12.6.r12.6/compiler.34431801_0
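In case it helps with reproducing this, here is a minimal sketch for dumping the relevant package versions alongside the Python and OS details. It uses only the standard library. The distribution names (`torch`, `flash-attn`, `triton`, `triton-windows`) are assumptions about how these packages are named on a given install, so adjust them as needed.

```python
import platform
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def environment_report(dist_names=("torch", "flash-attn", "triton", "triton-windows")):
    """Collect Python, OS, and package versions into a plain dict.

    The default distribution names are assumptions, not taken from the project.
    """
    report = {
        "python": platform.python_version(),
        "os": platform.platform(),
    }
    for name in dist_names:
        report[name] = installed_version(name) or "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Anything reported as "not installed" under one name may still be present under a different distribution name.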

I'm trying to use the clis/moss_tts_app.py Gradio UI.

However, I'm finding it very slow on a 13900K with an RTX 3090. I'm getting around 1 to 5 it/s, and it slows down on longer generations. Even a single sentence takes around 1 minute to generate, and a paragraph of text takes 2 to 4 minutes depending on its length.

The results are very good, with emotion that sounds realistic and not monotonous. But the latency is surprisingly high. Is this normal?

The real-time test is much quicker, but the quality is highly variable. Using moss_tts_realtime/app.py I can generate in a few seconds, but the quality is not usable in my case.

Just for reference, here is a generation using moss_tts_app.py with a single sentence.

Generating bs1 ...:  17%|████████████████████▋                   | 177/1024 [00:30<02:24,  5.86it/s]

^^ (shortened the line so it fits in git text window)
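To put the progress line above in numbers, here is a rough sketch of the arithmetic. It takes the values straight from that line (177 of 1024 steps, 30 s elapsed, 5.86 it/s) and assumes the rate stays constant, which it does not on longer inputs. The function names are just for illustration. It also assumes all 1024 steps actually run; if 1024 is only an upper limit and generation stops early, the real time would be shorter.

```python
def average_rate(done, elapsed_s):
    """Average iterations per second since the start of the run."""
    return done / elapsed_s


def projected_seconds(done, total, elapsed_s, rate_it_s):
    """Elapsed time plus the time left for the remaining steps at a fixed rate."""
    remaining_s = (total - done) / rate_it_s
    return elapsed_s + remaining_s


# Values read from the progress line: 177/1024 [00:30<02:24, 5.86it/s]
total_s = projected_seconds(done=177, total=1024, elapsed_s=30, rate_it_s=5.86)

print(f"average rate so far: {average_rate(177, 30):.2f} it/s")   # 5.90 it/s
print(f"projected time for all 1024 steps: {total_s:.0f} s")      # 175 s
```

So at this rate a full 1024-step run would take just under 3 minutes.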

Generation slows down as I add more sentences. I have a 13700K with an RTX 3090 and 64 GB of RAM. VRAM usage gets close to full while inference is running.

Thanks!
