Hello! First, thank you for this great project!
I would like to confirm whether you trained the vosk-model-pt-fb-v0.1.1-20220516_2113 model recently published on the VOSK site: https://alphacephei.com/vosk/models. I would also like to ask whether you trained the vosk-model-small-pt-0.3 small model available there.
We recently did some very informal, subjective testing with real-world audio (protected by privacy laws, unfortunately): we listened to the recordings manually and compared them against the transcriptions produced by both models using the vosk-0.3.32 Java library. It seems to us that the large model may be giving worse results than the small model on some audio, returning more words that were never spoken. Perhaps the large model has a stronger bias than the small one toward its training data set and generalizes worse to new audio; this is just a hypothesis.
If you also trained the older small model, was it trained on the same data set as the large model? If not, I assume the newer large model used a larger data set? If so, are you planning to train a new small model on the same data set used for the large model?
I'm asking just to avoid duplicating effort: if the data sets were different and the large model used a larger one, I may try to train a new small model on that newer data set (it is publicly available, right?).
Thank you very much for your attention!