Dear Zonos creators,
I just tested Zonos today, and it seems to be roughly 10 to 100x slower than tts_models/en/ljspeech/vits on the same hardware (RTX 4060 Ti).
The good thing is that it can pick up a voice from a sample WAV file, but the slowness makes this advantage completely irrelevant.
Am I doing something wrong that makes it this slow, or can it be sped up?