Moss TTS Persian is really bad and I know the reason #47

@lumos675

Persian is one of the languages for which it is very difficult to build a TTS (Text-to-Speech) system. While Arabic uses diacritics like َ ُ ِ to mark short vowels, Persian text typically does not. As a result, a model reading Persian often picks the wrong short vowel (A, E, or O) and mispronounces the word.

Advanced models like Gemini can usually infer the correct diacritics by analyzing the sentence context, but even they occasionally make mistakes. When that happens, I manually add diacritics to the "broken" words and regenerate the audio.
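The manual fix described above can be made repeatable with a small lookup step that runs before the text reaches the TTS model. The following is only a minimal sketch under assumptions: the `OVERRIDES` table, its single entry, and the function names are hypothetical and are not part of Moss TTS or any other library.

```python
# Arabic-script marks U+064B..U+0652:
# fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}

# Hypothetical table of words the model tends to misread, mapped to a
# spelling with explicit diacritics. Escapes are used to avoid mixed
# right-to-left text inside the code.
OVERRIDES = {
    # "krm" without vowels -> same word with a kasra on the first letter
    # (read "kerm")
    "\u06a9\u0631\u0645": "\u06a9\u0650\u0631\u0645",
}


def strip_diacritics(text: str) -> str:
    """Remove the short-vowel marks listed in DIACRITICS."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def apply_overrides(text: str, overrides: dict[str, str]) -> str:
    """Replace each whitespace-separated word that has an override.

    Words that are not in the table are passed through unchanged.
    """
    return " ".join(overrides.get(word, word) for word in text.split())
```

This only matches whole words split on whitespace, so a word with attached punctuation would be left alone. It also cannot choose between several valid readings of the same spelling; that still needs context, whether from a person or from a model.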

To solve this, I came up with the idea of using transliteration (writing Persian with English letters). This approach worked well with F5-TTS, but transliterating the text by hand every time is tedious. To automate it, I trained a Gemma model to do the transliteration for me, since Gemma and Gemini currently have the most extensive knowledge of the Persian language. There are still mistakes, though, because Gemma is not as good as Gemini Pro.
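One way to cut down on repeated manual work is to cache transliterations that have already been checked by hand, and to send only the unknown words to a person or a model. This is a rough sketch, not the Gemma-based approach itself: the `LEXICON` contents and the `transliterate` function are hypothetical names used for illustration.

```python
# Hypothetical lexicon of hand-checked transliterations.
# Keys are written as escapes to avoid mixed right-to-left text in code.
LEXICON = {
    "\u0633\u0644\u0627\u0645": "salam",  # Persian word for "hello"
    "\u062f\u0646\u06cc\u0627": "donya",  # Persian word for "world"
}


def transliterate(text: str, lexicon: dict[str, str]) -> tuple[str, list[str]]:
    """Transliterate known words and collect the unknown ones.

    Returns the partly transliterated text plus the list of words that
    were not found, so they can be reviewed by hand or passed to a model.
    """
    output = []
    unknown = []
    for word in text.split():
        latin = lexicon.get(word)
        if latin is None:
            unknown.append(word)
            output.append(word)  # left in Persian script for later review
        else:
            output.append(latin)
    return " ".join(output), unknown
```

A word-level lookup like this cannot resolve words whose pronunciation depends on the sentence, so it would sit in front of a context-aware model, not replace it.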

Unfortunately, I don't have enough resources to develop this further, as the current living conditions in Iran are very difficult, and I am personally affected by these challenges.
