Whisper quality validation in CI

I don't care how fast your bit-crunched model can spit out ****** tokens if they are the wrong ones.


> Predictable. We practiced accuracy-driven development where our internal testing infrastructure validates code and model commits on Whisper accuracy evaluation benchmarks comprising [librispeech](https://huggingface.co/datasets/librispeech_asr/viewer/all/test.clean) (~2.6k short audio clips, ~5 hours total) and [earnings22](https://huggingface.co/datasets/distil-whisper/earnings22) (~120 long audio clips, ~120 hours total) datasets. Results of periodic testing are published [here](https://huggingface.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit). This approach enables us to detect and mitigate quality-of-inference (more on this below) regressions due to code changes in WhisperKit as well as performance and functional regressions from lower levels of the software stack. This helps us improve time-to-detect and time-to-fix most issues with best-effort. Taking it a step further, we offer [customer-level SLA](https://aws.amazon.com/what-is/service-level-agreement/)s to detect and fix all issues within a maximum time period for specific model and device versions to developers or enterprises.
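Accuracy on LibriSpeech is conventionally reported as word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) between reference and hypothesis, divided by the number of reference words. The following is a minimal sketch, assuming transcripts have already been text-normalized (lowercased, punctuation stripped); the function names are hypothetical and not part of any existing tooling in this repo. Production evaluations typically use an established library plus Whisper's own text normalizer instead.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp (Levenshtein over words)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs) -> float:
    """Corpus-level WER: total word errors over total reference words (not a mean of per-clip WERs)."""
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        ref, hyp = reference.split(), hypothesis.split()
        errors += edit_distance(ref, hyp)
        words += len(ref)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER for a single (reference, hypothesis) pair."""
    return corpus_wer([(reference, hypothesis)])
```

Pooling errors across the whole split, rather than averaging per-clip WER, keeps short clips from dominating the score.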


The above quote from [Argmax](https://www.takeargmax.com/blog/whisperkit) is spot on; we need to do exactly this.

I want a CI job added that evaluates every Whisper variant in every quantization format on LibriSpeech, so that accuracy regressions are caught before they land.
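One possible shape for the gating step is sketched below: enumerate every (variant, quantization format) pair, then compare each measured WER against a stored baseline and exit non-zero on a regression. Everything here is an assumption for illustration: the variant and format lists are placeholders (the issue does not enumerate them), the 0.005 absolute tolerance is arbitrary, and `build_matrix`, `find_regressions` and `main` are hypothetical helpers. Running the models and computing WER is left to whatever harness the job uses.

```python
from itertools import product

# Placeholder lists: the issue asks for "all whisper variants and quantization formats"
# without naming them, so these are assumptions, not the project's actual set.
VARIANTS = ["tiny", "base", "small", "medium", "large-v3"]
QUANT_FORMATS = ["f16", "q8_0", "q4_0"]


def build_matrix(variants, formats):
    """Every (variant, format) pair the CI job should evaluate."""
    return list(product(variants, formats))


def find_regressions(results, baselines, tolerance=0.005):
    """Return (variant, fmt, wer, baseline) for configs whose WER exceeds baseline + tolerance.

    A config with no recorded baseline is also reported, so new entries must be added explicitly.
    """
    failures = []
    for key, wer in sorted(results.items()):
        baseline = baselines.get(key)
        if baseline is None or wer > baseline + tolerance:
            failures.append((key[0], key[1], wer, baseline))
    return failures


def main(results, baselines) -> int:
    """Print each regression and return a process exit code (1 fails the CI job)."""
    failures = find_regressions(results, baselines)
    for variant, fmt, wer, baseline in failures:
        print(f"REGRESSION {variant}/{fmt}: WER {wer:.4f} (baseline {baseline})")
    return 1 if failures else 0
```

Keying baselines by (variant, format) means a quantization format that degrades one model but not another is flagged individually, rather than being hidden in an aggregate.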

Whisper quality validation in CI #216