feat: expose native Whisper decoding options by sriharan0804 · Pull Request #3717 · docling-project/docling

sriharan0804 · 2026-06-27T05:53:13Z

Summary

This PR exposes additional decoding options for the native Whisper ASR backend and forwards them to whisper.transcribe().

Changes

Added beam_size to InlineAsrNativeWhisperOptions.
Added condition_on_previous_text to InlineAsrNativeWhisperOptions.
Forwarded both options to the native Whisper transcribe() call in _NativeWhisperModel.

This allows users to configure Whisper's decoding behavior through Docling instead of relying on Whisper's internal defaults.

Closes #3703.

github-actions · 2026-06-27T05:53:22Z

✅ DCO Check Passed

Thanks @sriharan0804, all your commits are properly signed off. 🎉

mergify · 2026-06-27T05:53:48Z

Merge Protections

🟢 Merge protection satisfied — ready to merge.

🟢 Enforce conventional commit

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
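To see which titles the rule above accepts, the pattern can be tried locally with Python's `re` module. This is a minimal sketch: the helper name `is_conventional_title` is hypothetical, and Mergify evaluates the condition itself, so this only mirrors the regular expression.

```python
import re

# Pattern copied verbatim from the merge protection condition above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional_title(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix.

    Hypothetical helper for local checks; not part of Mergify.
    """
    # The pattern carries its own ^ anchor, so search() only matches at the start.
    return CONVENTIONAL_TITLE.search(title) is not None


if __name__ == "__main__":
    for title in (
        "feat: expose native Whisper decoding options",
        "fix(asr)!: handle empty audio",
        "Expose native Whisper decoding options",
    ):
        print(f"{title!r}: {is_conventional_title(title)}")
```

The optional `(scope)` and `!` parts are accepted, but a type outside the listed set, or a title without the trailing colon, is rejected.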

Signed-off-by: sriharan2005@Tamil-- <sriharan0804@users.noreply.github.com>

PeterStaar-IBM

lgtm!

codecov · 2026-06-27T11:08:15Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

BBC-Esq · 2026-06-27T13:09:16Z

+            )
+        ),
+    ] = None
+    condition_on_previous_text: Annotated[


condition_on_previous_text: Annotated[bool, …] = None — a type/default mismatch (annotated bool, defaulted None). Because it's always forwarded, whisper receives None, which is falsy, so conditioning is effectively OFF by default. The field doc says "When unset, Whisper uses its default" — that's incorrect: passing None is not the same as omitting the argument; whisper's real default is True.
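The difference between passing None and omitting the argument can be reproduced without Whisper. The sketch below uses a stand-in function; `fake_transcribe` and `build_kwargs` are hypothetical names for illustration, not Whisper or Docling code. It also shows one way to forward an option only when it was set, which is an alternative to changing the field default.

```python
from typing import Any, Dict, Optional


def fake_transcribe(audio: str, condition_on_previous_text: bool = True) -> bool:
    """Stand-in for a transcribe function; reports whether conditioning is on."""
    # An explicit None overrides the default and is falsy in a truthiness check.
    return bool(condition_on_previous_text)


def build_kwargs(condition_on_previous_text: Optional[bool]) -> Dict[str, Any]:
    """Forward the option only when the caller actually set it."""
    kwargs: Dict[str, Any] = {}
    if condition_on_previous_text is not None:
        kwargs["condition_on_previous_text"] = condition_on_previous_text
    return kwargs


if __name__ == "__main__":
    # Always forwarding None: the callee's default of True is lost.
    print(fake_transcribe("a.wav", condition_on_previous_text=None))
    # Omitting the key when unset: the callee's default applies.
    print(fake_transcribe("a.wav", **build_kwargs(None)))
    # An explicit False is still forwarded.
    print(fake_transcribe("a.wav", **build_kwargs(False)))
```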

BBC-Esq · 2026-06-27T13:10:34Z

+            self.language = asr_options.language
+            self.beam_size = asr_options.beam_size
+            self.condition_on_previous_text = asr_options.condition_on_previous_text
+            self.temperature = asr_options.temperature


Love forwarding more than just the two that I did...right choice IMHO.

sriharan0804 · 2026-06-27T13:39:31Z

@BBC-Esq
Thanks for the review and the feedback!

I'm glad you liked forwarding language and temperature as well. I also agree with your point about condition_on_previous_text—using None while always forwarding the argument doesn't preserve Whisper's default behavior.

I've updated the default to True to match Whisper's default while still allowing users to explicitly set it to False.

BBC-Esq · 2026-06-27T13:53:06Z

If this PR is meant to resolve #3703, the beam_size default should be 1, not None. beam_size=None selects whisper's GreedyDecoder (decoding.py:551) — the greedy decode path prone to the long-form repetition loop the issue describes — while any non-None value routes through BeamSearchDecoder instead (decoding.py:546-547). As written, the default ships the same greedy behavior the issue reports...
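The branch described in this comment can be modeled in a few lines. This is a toy sketch of the dispatch as the comment states it, not a copy of Whisper's decoding.py; `select_decoder` is a hypothetical name and it returns the decoder's name as a string rather than constructing anything.

```python
from typing import Optional


def select_decoder(beam_size: Optional[int]) -> str:
    """Toy model of the decoder choice described in the review comment."""
    # Per the comment: any non-None beam_size takes the beam search path,
    # and only None falls through to greedy decoding.
    if beam_size is not None:
        return "BeamSearchDecoder"
    return "GreedyDecoder"


if __name__ == "__main__":
    for value in (None, 1, 5):
        print(f"beam_size={value!r} -> {select_decoder(value)}")
```

Under this reading, beam_size=1 and beam_size=None are not interchangeable: the first takes the beam search path with a single beam, the second takes the greedy path.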

BBC-Esq · 2026-06-27T13:55:37Z

Replying to the change of the condition_on_previous_text default to True:

IMHO, condition_on_previous_text should default to False, not the openai-whisper library's True, because of the spiraling issue. Users who really want conditioning could still set it to True.

sriharan0804 · 2026-06-27T14:37:18Z

@BBC-Esq Thanks for the additional feedback. That makes sense.

My initial goal was to expose the native Whisper decoding options while keeping the defaults backward-compatible with the current behavior. I agree that using beam_size=1 and condition_on_previous_text=False could provide a better out-of-the-box experience for the long-form repetition issue described in #3703.

I'm happy to update the defaults if the maintainers would prefer the PR to change the default behavior rather than simply expose the options.

Signed-off-by: sriharan2005@Tamil-- <sriharan0804@users.noreply.github.com>

sriharan0804 · 2026-06-27T14:57:16Z

@BBC-Esq
Thanks for the detailed review!

I've addressed the requested changes:

Updated condition_on_previous_text to avoid the None default.
Changed the defaults to beam_size=1 and condition_on_previous_text=False to better address the long-form decoding issue discussed in #3703 ("[feat] Native Whisper: greedy decode default hangs on long-form audio - expose beam_size / condition_on_previous_text / temperature").
Updated the option descriptions to reflect the new defaults while continuing to forward language, beam_size, condition_on_previous_text, and temperature to the native Whisper backend.

Please let me know if you'd like any further changes.
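For readers following the thread, the defaults described in this comment can be sketched as a small options object. This is an illustrative stand-in using a dataclass, not the actual InlineAsrNativeWhisperOptions definition: the class name `NativeWhisperDecodeOptions`, the method `to_transcribe_kwargs`, and the placeholder defaults for `language` and `temperature` are assumptions, since the thread only states the defaults for beam_size and condition_on_previous_text.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class NativeWhisperDecodeOptions:
    """Hypothetical stand-in for the four forwarded decoding options."""

    language: str = "en"  # placeholder default, not stated in the thread
    beam_size: int = 1  # default stated in the comment above
    condition_on_previous_text: bool = False  # default stated in the comment above
    temperature: float = 0.0  # placeholder default, not stated in the thread

    def to_transcribe_kwargs(self) -> Dict[str, Any]:
        """Collect all four options as keyword arguments for a transcribe call."""
        return asdict(self)


if __name__ == "__main__":
    # New defaults: single-beam search, no conditioning on previous text.
    print(NativeWhisperDecodeOptions().to_transcribe_kwargs())
    # A user who wants conditioning back sets it explicitly.
    print(NativeWhisperDecodeOptions(condition_on_previous_text=True).to_transcribe_kwargs())
```

Because neither field defaults to None here, every forwarded value is an explicit choice, which avoids the None-versus-omitted ambiguity raised earlier in the review.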

BBC-Esq · 2026-06-28T15:22:40Z

LGTM!

feat: expose native whisper decoding options

16898c5

Signed-off-by: sriharan2005@Tamil-- <sriharan0804@users.noreply.github.com>

sriharan0804 force-pushed the feat-native-whisper-options branch from ced7da1 to 16898c5 on June 27, 2026 06:36

test: cover native whisper decode options

e0e5a71

Signed-off-by: sriharan2005@Tamil-- <sriharan0804@users.noreply.github.com>

PeterStaar-IBM requested review from PeterStaar-IBM and ceberam June 27, 2026 11:04

PeterStaar-IBM previously approved these changes Jun 27, 2026

BBC-Esq reviewed Jun 27, 2026

sriharan0804 force-pushed the feat-native-whisper-options branch from 2d614d9 to 24e478f on June 27, 2026 13:35

feat: forward native whisper language and temperature

6a1d6b0

Signed-off-by: sriharan2005@Tamil-- <sriharan0804@users.noreply.github.com>

sriharan0804 dismissed PeterStaar-IBM’s stale review via 6a1d6b0 June 27, 2026 14:53

sriharan0804 force-pushed the feat-native-whisper-options branch from 24e478f to 6a1d6b0 on June 27, 2026 14:53

sriharan0804 requested a review from PeterStaar-IBM June 27, 2026 15:14
