feat: expose native Whisper decoding options#3717
Conversation
|
✅ DCO Check Passed Thanks @sriharan0804, all your commits are properly signed off. 🎉 |
Merge Protections🟢 Merge protection satisfied — ready to merge. Show 1 satisfied protection🟢 Enforce conventional commitMake sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
|
Signed-off-by: sriharan2005@Tamil-- <sriharan0804@users.noreply.github.com>
ced7da1 to
16898c5
Compare
Signed-off-by: sriharan2005@Tamil-- <sriharan0804@users.noreply.github.com>
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
| ) | ||
| ), | ||
| ] = None | ||
| condition_on_previous_text: Annotated[ |
There was a problem hiding this comment.
condition_on_previous_text: Annotated[bool, …] = None — a type/default mismatch (annotated bool, defaulted None). Because it's always forwarded, whisper receives None, which is falsy, so conditioning is effectively OFF by default. The field doc says "When unset, Whisper uses its default" — that's incorrect: passing None is not the same as omitting the argument; whisper's real default is True.
| self.language = asr_options.language | ||
| self.beam_size = asr_options.beam_size | ||
| self.condition_on_previous_text = asr_options.condition_on_previous_text | ||
| self.temperature = asr_options.temperature |
There was a problem hiding this comment.
Love forwarding more than just the two that I did...right choice IMHO.
2d614d9 to
24e478f
Compare
|
@BBC-Esq I'm glad you liked forwarding language and temperature as well. I also agree with your point about condition_on_previous_text—using None while always forwarding the argument doesn't preserve Whisper's default behavior. I've updated the default to True to match Whisper's default while still allowing users to explicitly set it to False. |
|
If this PR is meant to resolve #3703, the beam_size default should be 1, not None. beam_size=None selects whisper's GreedyDecoder (decoding.py:551) — the greedy decode path prone to the long-form repetition loop the issue describes — while any non-None value routes through BeamSearchDecoder instead (decoding.py:546-547). As written, the default ships the same greedy behavior the issue reports... |
IMHO, it should set it to "false" by default, not the openai-whisper library's "True" due to the spiraling issue, but then allow a user to set it to "True" if they really wanted to. |
|
@BBC-Esq Thanks for the additional feedback. That makes sense. My initial goal was to expose the native Whisper decoding options while keeping the defaults backward-compatible with the current behavior. I agree that using I'm happy to update the defaults if the maintainers would prefer the PR to change the default behavior rather than simply expose the options. |
Signed-off-by: sriharan2005@Tamil-- <sriharan0804@users.noreply.github.com>
24e478f to
6a1d6b0
Compare
|
@BBC-Esq I've addressed the requested changes:
Please let me know if you'd like any further changes. |
LGTM! |
Summary
This PR exposes additional decoding options for the native Whisper ASR backend and forwards them to
whisper.transcribe().Changes
beam_sizetoInlineAsrNativeWhisperOptions.condition_on_previous_texttoInlineAsrNativeWhisperOptions.transcribe()call in_NativeWhisperModel.This allows users to configure Whisper's decoding behavior through Docling instead of relying on Whisper's internal defaults.
Closes #3703.