Description
Persian is one of the harder languages to build a TTS (Text-to-Speech) system for. Arabic text can mark short vowels with diacritics like َ ُ ِ, but Persian text typically omits them. As a result, a model reading Persian often mispronounces words by picking the wrong short vowel (A, E, or O).
Advanced models like Gemini can usually infer the correct diacritics from the sentence context, but even they occasionally make mistakes. When that happens, I manually add diacritics to the mispronounced words and regenerate the audio.
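The manual fix described above could be scripted roughly as follows. This is only a minimal sketch, not the workflow actually used here: the `OVERRIDES` table, `apply_overrides`, and `strip_diacritics` are hypothetical names, the two entries are illustrative, and the matching is a naive whitespace split that will miss words with attached punctuation. A fixed table also cannot choose between readings by context, so it only suits words already known to be misread in a given text.

```python
import re

# Arabic-script short-vowel marks (Unicode combining characters).
FATHA, DAMMA, KASRA = "\u064e", "\u064f", "\u0650"

# Hypothetical override table: undiacritized word -> hand-diacritized form.
# The entries are illustrative assumptions, not data from this project.
OVERRIDES = {
    "گل": "گ" + DAMMA + "ل",                  # gol "flower" rather than gel "mud"
    "کرم": "ک" + KASRA + "ر" + KASRA + "م",   # kerem "cream"
}


def apply_overrides(text, overrides=OVERRIDES):
    """Replace whole whitespace-delimited words that appear in the table."""
    # The capturing group keeps the whitespace runs so the text is rebuilt exactly.
    tokens = re.split(r"(\s+)", text)
    return "".join(overrides.get(tok, tok) for tok in tokens)


def strip_diacritics(text):
    """Remove the marks in U+064B..U+0652 (tanwin, short vowels, shadda, sukun)."""
    return re.sub("[\u064b-\u0652]", "", text)


if __name__ == "__main__":
    sentence = "این گل زیبا است"
    # The output is the same sentence with the vowel mark added to the second word.
    print(apply_overrides(sentence))
```

The corrected text would then be passed to whatever TTS call is in use; `strip_diacritics` is included only to check that an override changes nothing except the added marks.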
To avoid these manual fixes, I came up with the idea of using transliteration (writing Persian with English letters), which spells out the vowels explicitly. This approach actually worked well with F5-TTS, but manually transliterating the text every time is tedious. To automate it, I trained a Gemma model to do the transliteration for me, since in my experience Gemma and Gemini currently have the most extensive knowledge of the Persian language. There are still mistakes, though, because Gemma is not as good as Gemini Pro.
Unfortunately, I don't have enough resources to develop this further, as the current living conditions in Iran are very difficult, and I am personally affected by these challenges.