Moss TTS Persian is really bad, and I know the reason #47

Persian is one of the hardest languages to build a TTS (Text-to-Speech) system for. Arabic text can mark short vowels with diacritics like َ ُ ِ, but Persian text typically omits them. As a result, a model reading a word often picks the wrong short vowel (A, E, or O) and mispronounces it.

Advanced models like Gemini can usually infer the correct diacritics by analyzing the sentence context, but even they occasionally make mistakes. When that happens, I manually add diacritics to the "broken" words and regenerate the audio.
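The manual fix described above can be sketched roughly as follows. This is only an illustration, not part of any TTS library: the `apply_overrides` helper and the `overrides` dictionary are hypothetical names, and the example word is my own (گل can be read as "gol", flower, or "gel", mud). It replaces whole words with a diacritized form before the text is sent to the TTS model again.

```python
import re

DAMMA = "\u064f"  # Arabic damma, marks the short "o" sound in Persian
KASRA = "\u0650"  # Arabic kasra, marks the short "e" sound in Persian

# Hypothetical per-text overrides: bare word -> diacritized word.
# "gol" (flower) is written as gaf + damma + lam.
overrides = {"گل": "گ" + DAMMA + "ل"}

# A token is a run of letters/digits, plus the zero-width non-joiner and
# the common harakat, so whole words are matched rather than substrings.
TOKEN = re.compile(r"[\w\u200c\u064b-\u0652]+")

def apply_overrides(text, overrides):
    """Replace each whole word found in `overrides` with its diacritized form."""
    def fix(match):
        word = match.group(0)
        return overrides.get(word, word)
    return TOKEN.sub(fix, text)

fixed = apply_overrides("این گل است", overrides)
print(fixed)  # the middle word now carries a damma
```

Because matching is done on whole tokens, a longer word that merely contains گل (for example گلدان) is left untouched. The overrides still have to be chosen by hand for each sentence, since the same spelling can need a different vowel in a different context.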

To avoid this manual work, I came up with the idea of using transliteration (writing Persian with English letters). This approach actually worked well with F5-TTS, but transliterating the text by hand every time is tedious. To automate it, I trained a Gemma model to do the transliteration for me, because Gemma and Gemini currently seem to have the most extensive knowledge of Persian. There are still mistakes, though, since Gemma is not as good as Gemini Pro.

Unfortunately, I don't have enough resources to develop this further, as the current living conditions in Iran are very difficult, and I am personally affected by these challenges.
