Hello! First, thank you for this great project!
I would like to confirm whether you trained the vosk-model-pt-fb-v0.1.1-20220516_2113 model recently published on the VOSK site: https://alphacephei.com/vosk/models. I would also like to ask whether you trained the vosk-model-small-pt-0.3 small model available there.
We recently did some very informal, subjective testing with real-world audio (protected by privacy laws, unfortunately): we listened to the recordings manually and compared them against the transcriptions produced by both models using the vosk-0.3.32 Java library. It seems to us that the large model may be giving worse results than the small model on some audio, returning more words that were never spoken. Perhaps the large model has a stronger bias than the small one toward its training data set and generalizes worse to new audio; this is just a hypothesis.
If you also trained the older small model, was it trained on the same data set as the large model? If not, I assume the newer large model used a larger data set? If so, are you planning to train a new small model on the same data set used for the large model?
I'm asking just to avoid duplicating effort: if the data sets were different and the large model used a larger one, I may try to train a new small model on that newer data set (it is publicly available, right?).
Thank you very much for your attention!