fix: use default whisper temperature values for temperature 0.0 like openai api #615

Open
dotmobo wants to merge 1 commit into speaches-ai:master from dotmobo:feature/fix-temperature-default-value

dotmobo commented Feb 19, 2026

Hi,

This is an alternative version of #553, made for the latest v0.9.0 RC.

As in the OpenAI API, the default temperature remains 0.0. However, OpenAI applies special handling to this value behind the scenes; see: https://developers.openai.com/api/reference/resources/audio/subresources/transcriptions/methods/create

If set to 0, the model will use log probability (https://en.wikipedia.org/wiki/Log_probability) to automatically increase the temperature until certain thresholds are hit.

The default Whisper temperature setting is itself a list of temperatures; see: https://whisper-api.com/docs/transcription-options/#sampling-temperature

The default temperature values are [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]

So I think OpenAI expands a temperature of 0.0 to [0.0, 0.2, 0.4, 0.6, 0.8, 1.0] in order to activate the "automatically increase the temperature until certain thresholds are hit" behavior.

This PR does the same, which should fix the bug reported in #553.
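
The mapping described above can be sketched roughly as follows. This is a minimal illustration and not the actual diff of this PR: `resolve_temperature` and `DEFAULT_TEMPERATURES` are hypothetical names, and the choice to wrap any non-zero value in a one-element list is an assumption made for the example.

```python
from typing import List

# Fallback schedule quoted in this PR as the default Whisper temperature values.
DEFAULT_TEMPERATURES: List[float] = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]


def resolve_temperature(temperature: float) -> List[float]:
    """Hypothetical helper: expand a request temperature into a schedule.

    A temperature of 0.0 is treated as "use the default fallback list";
    any other value is passed through as a single-step schedule.
    """
    if temperature == 0.0:
        # Return a copy so callers cannot mutate the module-level default.
        return list(DEFAULT_TEMPERATURES)
    return [temperature]
```

With this sketch, an explicit non-zero temperature still disables the fallback, while the API default of 0.0 enables it.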

Regards,

Morgan

dotmobo force-pushed the feature/fix-temperature-default-value branch from ea0d2d1 to 14bd565 Feb 19, 2026 11:11
dotmobo changed the title from "use default whisper temperature values for temperature 0.0 like openai api" to "fix: use default whisper temperature values for temperature 0.0 like openai api" Feb 19, 2026

Benju1 commented Mar 3, 2026

+1 for this fix. We're running into the exact same issue with a German CT2 model (TheTobyB/whisper-large-v3-turbo-german-ct2).

Symptoms: A 1:21 minute audio recording is only transcribed up to ~30 seconds. The remaining segments are silently dropped.

Debug logs show:

Processing segment at 00:00.000           → OK
Processing segment at 00:30.000           → Log probability threshold is not met with temperature 0.0 (-1.614954 < -1.000000)
Processing segment at 01:00.000           → Log probability threshold is not met with temperature 0.0 (-1.994506 < -1.000000)

Because temperature is passed as a single float 0.0 (no fallback list), faster-whisper cannot retry with higher temperatures. Combined with no_speech_prob > no_speech_threshold, segments 2 and 3 are classified as silence and their text is completely dropped.
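
To illustrate why a single float leaves no room for a retry, here is a simplified sketch of a temperature fallback loop. It is not faster-whisper's actual code: `Decoded`, `decode_with_fallback`, and `fake_decode` are hypothetical names, the no-speech check is left out, and the score for temperature 0.2 is made up for the example (only the -1.614954 value comes from the logs above). The thresholds -1.0 and 2.4 are the usual Whisper defaults.

```python
from typing import Callable, NamedTuple, Sequence, Tuple


class Decoded(NamedTuple):
    text: str
    avg_logprob: float
    compression_ratio: float


def decode_with_fallback(
    decode: Callable[[float], Decoded],
    temperatures: Sequence[float],
    log_prob_threshold: float = -1.0,
    compression_ratio_threshold: float = 2.4,
) -> Tuple[float, Decoded]:
    """Try each temperature in order until both thresholds are satisfied.

    If no temperature passes, the result of the last attempt is returned.
    `temperatures` must contain at least one value.
    """
    for temperature in temperatures:
        result = decode(temperature)
        too_repetitive = result.compression_ratio > compression_ratio_threshold
        too_uncertain = result.avg_logprob < log_prob_threshold
        if not (too_repetitive or too_uncertain):
            break
    return temperature, result


def fake_decode(temperature: float) -> Decoded:
    # Stub decoder with invented scores, except for the 0.0 entry.
    scores = {0.0: -1.614954, 0.2: -0.8}
    return Decoded(
        text=f"decoded at {temperature}",
        avg_logprob=scores.get(temperature, -0.5),
        compression_ratio=1.5,
    )
```

Calling `decode_with_fallback(fake_decode, [0.0])` ends with the below-threshold result because there is nothing left to try, whereas passing `[0.0, 0.2, 0.4]` stops at 0.2 in this stub.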

Workaround: We're currently using a sed-based entrypoint patch to set no_speech_threshold=None, log_prob_threshold=None, compression_ratio_threshold=None in the transcribe() calls, but the proper fix is this PR: converting temperature=0.0 to the default fallback list [0, 0.2, 0.4, 0.6, 0.8, 1.0] as OpenAI does.

This is a significant issue for non-English models where confidence scores are naturally lower. Would love to see this merged!

longregen added a commit to longregen/speaches that referenced this pull request Mar 5, 2026
- aarch64-darwin works
- debloat docs...
- merge dependencies that are not important in one file
- make it multilingual, faster_whisper handles it
- default to Python 3.12, but 3.13 and 3.14 also work
- Includes PR speaches-ai#609, speaches-ai#610, and speaches-ai#615
Yichen-fqyd (Contributor) commented

Thanks for submitting this PR.
Would this replicate faster_whisper's behavior when the compression ratio exceeds compression_ratio_threshold, where it retries with different temperatures to help with hallucination?
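
For context on the compression-ratio check mentioned here, the sketch below shows the kind of measure upstream Whisper uses: the UTF-8 byte length of the text divided by its zlib-compressed length. The function is written out for illustration rather than imported from any library, and the sample strings are invented; 2.4 is the usual default for compression_ratio_threshold.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Ratio of raw to zlib-compressed size; repetitive text scores high."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


# A looping, hallucination-like output compresses very well ...
repetitive = "thank you " * 50
# ... while an ordinary short sentence barely compresses at all.
ordinary = "The quick brown fox jumps over the lazy dog."

print(compression_ratio(repetitive) > 2.4)
print(compression_ratio(ordinary) > 2.4)
```

A segment whose text exceeds the threshold is a candidate for a retry at a higher temperature, which is only possible when more than one temperature is supplied.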

3 participants