fix(engine): stabilize dictation capture and release paste by future3OOO · Pull Request #1 · future3OOO/Whisper-Smart

future3OOO · 2026-05-02T02:48:10Z

Summary

- Prevent mouse-hold dictation from transcribing overlapping VAD batches before release.
- Harden Windows microphone fallback while keeping explicit input devices fail-closed.
- Add regression coverage for release-paste duplication and resampled audio callback behavior.

Test plan

.venv310\Scripts\python.exe -m pytest tests/test_engine_advanced.py::TestAdvancedDictationEngine::test_flush_hold_does_not_duplicate_vad_tail tests/test_engine_advanced.py::TestAdvancedDictationEngine::test_hold_mode_does_not_transcribe_vad_batches_before_release -q
.venv310\Scripts\python.exe -m pytest tests/test_io.py -q
.venv310\Scripts\python.exe -m black --check dictation_tool/engine.py dictation_tool/io.py tests/test_engine_advanced.py tests/test_io.py
git diff --check

Summary by CodeRabbit

Documentation
- Updated the README's interface-latency expectation for "Maximum speed (medium.en + prompt tricks)" (now ~5–200 ms instead of ~5–20 ms).
Improvements
- Enhanced audio device compatibility with automatic fallback when the requested sample rate is unavailable.
- Added callback-side resampling (via `np.interp`) so devices opened at their native rate still deliver audio at the target rate.
- Windows device enumeration now orders WASAPI input devices first.

Prevent mouse-hold dictation from transcribing overlapping VAD batches before release, and harden Windows microphone fallback so default devices are portable while explicit devices fail closed.

Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai · 2026-05-02T02:48:20Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

- Configuration used: Organization UI
- Review profile: CHILL
- Plan: Pro
- Run ID: 67dc6d77-bc63-4bdb-9865-835e0881173b

📥 Commits

Reviewing files that changed from the base of the PR, between 37b576c and 988adcb.

📒 Files selected for processing (5)

- README.md
- dictation_tool/engine.py
- dictation_tool/io.py
- tests/test_engine_advanced.py
- tests/test_io.py

Walkthrough

This PR refactors the audio input pipeline by improving VADGate segment creation with dedicated helper methods, enhancing AudioStream with device fallback selection and automatic resampling in the callback, and adjusting engine VAD flush behavior to avoid duplicating tail samples. Tests validate device selection logic and VAD buffer behavior. Documentation updates latency expectations for the medium.en model.
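To make the `_create_segment()` / `_reset_for_next_utterance()` split concrete, here is a minimal sketch of a VAD gate built the same way. It is not the project's `VADGate`: the class name `VADGateSketch`, the `is_speech` flag supplied by the caller (standing in for whatever speech detector the real gate uses), and the exact pre/post buffering rules are assumptions. Only the `pre_buffer_chunks`/`post_buffer_chunks` parameter names come from the PR.

```python
from collections import deque
from typing import List, Optional

import numpy as np


class VADGateSketch:
    """Hypothetical, simplified gate showing helper-based segment creation."""

    def __init__(self, pre_buffer_chunks: int = 2, post_buffer_chunks: int = 2):
        # Silence kept before speech starts, so the first syllable is not clipped.
        self.pre_buffer: deque = deque(maxlen=pre_buffer_chunks)
        # Consecutive silent chunks that end an utterance.
        self.post_buffer_chunks = post_buffer_chunks
        self.speech_chunks: List[np.ndarray] = []
        self.silence_run = 0
        self.in_speech = False

    def _create_segment(self) -> np.ndarray:
        # Single place that turns buffered chunks into one 1-D array.
        return np.concatenate(self.speech_chunks)

    def _reset_for_next_utterance(self) -> None:
        self.speech_chunks = []
        self.silence_run = 0
        self.in_speech = False

    def _process_frame(self, chunk: np.ndarray, is_speech: bool) -> Optional[np.ndarray]:
        if not self.in_speech:
            if is_speech:
                # Utterance starts: prepend the buffered pre-roll.
                self.speech_chunks = list(self.pre_buffer) + [chunk]
                self.pre_buffer.clear()
                self.in_speech = True
            else:
                self.pre_buffer.append(chunk)
            return None
        self.speech_chunks.append(chunk)
        self.silence_run = 0 if is_speech else self.silence_run + 1
        if self.silence_run >= self.post_buffer_chunks:
            segment = self._create_segment()
            self._reset_for_next_utterance()
            return segment
        return None
```

Keeping concatenation and reset in named helpers means any other path that needs to emit a segment (for example a forced flush) can reuse the same two calls instead of duplicating the buffer handling.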

Changes

Audio I/O & VAD Pipeline Refactoring

| Layer | File(s) | Summary |
| --- | --- | --- |
| Interface Updates | `dictation_tool/io.py` | `AudioStream.__init__` now accepts `input_device: str \| int \| None` (previously `str \| None`), enabling numeric device IDs alongside names. |
| VADGate Segment Creation | `dictation_tool/io.py` | `VADGate._process_frame` now delegates segment construction to the new `_create_segment()` and `_reset_for_next_utterance()` helpers, centralizing buffer-to-array concatenation. |
| AudioStream Device & Resampling | `dictation_tool/io.py` | `_open_stream` added with two-pass device opening (target rate first, then the native rate with callback resampling); `_all_input_devices` added for WASAPI-first ordering on Windows; the callback resamples native-rate chunks to the target rate via `np.interp` before queueing. |
| Engine VAD Flush Coordination | `dictation_tool/engine.py` | The `_flush_hold()` VAD path still calls `force_flush()` but discards its tail when `_raw_shadow` is non-empty, and appends the tail only when the shadow is empty, preventing duplicate VAD samples. |
| Test Validation | `tests/test_engine_advanced.py`, `tests/test_io.py` | New tests verify VAD tail deduplication, hold-mode transcription blocking, fail-closed explicit-device fallback, Windows default-device fallback, and callback resampling to the target rate; VADGate parameter tests now check `pre_buffer_chunks`/`post_buffer_chunks` instead of `padding_ms`. |
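The two-pass opening in `_open_stream` can be sketched as below. The opener is injected so the logic runs without audio hardware; `open_with_fallback`, `open_fn`, and `DeviceOpenError` are hypothetical names (with sounddevice the failure would more likely surface as `sounddevice.PortAudioError`), and the real method's signature is not shown in this PR.

```python
from typing import Any, Callable, Tuple


class DeviceOpenError(RuntimeError):
    """Hypothetical error raised when a device rejects a sample rate."""


def open_with_fallback(
    open_fn: Callable[[Any, int], Any],
    device: Any,
    target_rate: int,
    native_rate: int,
) -> Tuple[Any, int]:
    """Open at target_rate, else at native_rate; return (stream, rate_used)."""
    try:
        # Pass 1: the rate the engine actually wants (no resampling needed).
        return open_fn(device, target_rate), target_rate
    except DeviceOpenError:
        if native_rate == target_rate:
            raise  # nothing different to try, so surface the original error
    # Pass 2: the device's native rate; the caller must resample in the callback.
    return open_fn(device, native_rate), native_rate
```

Returning the rate that was actually used lets the caller decide whether the audio callback needs to resample.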

Documentation Update

| Layer | File(s) | Summary |
| --- | --- | --- |
| README Latency Expectations | `README.md` | "Maximum speed (medium.en + prompt tricks)" section updated from ~5–20 ms to ~5–200 ms interface latency. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant AudioStream
    participant DeviceSelection as Device Selection
    participant NativeDevice as Native Device
    participant Resampler
    participant VADGate
    participant Engine

    Client->>AudioStream: open_stream(target_rate)
    AudioStream->>DeviceSelection: attempt target_rate device
    alt Device supports target rate
        DeviceSelection->>NativeDevice: open at target_rate
    else Fallback to native
        DeviceSelection->>NativeDevice: open at native_rate
    end

    NativeDevice->>Resampler: callback(native_rate_chunk)
    Resampler->>Resampler: np.interp if native ≠ target
    Resampler->>VADGate: _process_frame(target_rate_chunk)

    alt Speech detected
        VADGate->>VADGate: _create_segment()
        VADGate->>Engine: segment (no duplicate tail)
    else Silence buffered
        VADGate->>VADGate: accumulate in pre_buffer
    end

    Engine->>Engine: _flush_hold() with dedup logic
```
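The resampling step in the diagram ("np.interp if native ≠ target") might look like the following. `resample_linear` and `make_callback` are hypothetical helpers; the callback follows sounddevice's `(indata, frames, time, status)` callback shape and assumes a mono float32 stream.

```python
import queue

import numpy as np


def resample_linear(chunk: np.ndarray, native_rate: int, target_rate: int) -> np.ndarray:
    """Linearly resample a mono chunk from native_rate to target_rate."""
    if native_rate == target_rate or chunk.size == 0:
        return chunk
    n_out = max(1, int(round(chunk.size * target_rate / native_rate)))
    # Output sample positions expressed in input-sample units.
    x_out = np.arange(n_out) * (native_rate / target_rate)
    x_in = np.arange(chunk.size)
    return np.interp(x_out, x_in, chunk).astype(np.float32)


def make_callback(q: "queue.Queue[np.ndarray]", native_rate: int, target_rate: int):
    def callback(indata, frames, time_info, status):
        # Copy channel 0 so the queued array does not alias the driver's buffer.
        mono = indata[:, 0].astype(np.float32, copy=True)
        # status flags are ignored in this sketch.
        q.put(resample_linear(mono, native_rate, target_rate))

    return callback
```

Per-chunk linear interpolation is cheap, but it applies no low-pass filter before downsampling and treats each chunk independently, so small discontinuities at chunk boundaries are possible. Whether that affects transcription quality is not measured in this PR.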

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A clearer path through the digital streams
Device fallbacks and resampling dreams,
VADGate speaks true without echoed tail,
Audio flow now won't ever fail! 🎙️✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 37.04%, below the required 80.00% threshold. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the primary fix: stabilizing dictation capture and the paste performed on release, which aligns directly with the main objectives of preventing VAD batch duplication and hardening microphone fallback. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


greptile-apps · 2026-05-02T02:54:07Z

Greptile Summary

This PR fixes two hold-mode correctness bugs:

- Duplicated VAD tail on release: `_flush_hold` now discards the VAD tail when raw shadow data is present, while still calling `force_flush()` to reset VAD state.
- Premature VAD batch transcription during hold: the `_run` loop `continue`s before `batch.append`, so the post-loop flush never fires on stale audio.

It also adds a Windows microphone fallback: when the target sample rate is unsupported, the stream is opened at the device's native rate and resampled with `np.interp`. Explicit device selection stays fail-closed, so audio is never silently recorded from a different mic. Regression tests cover both engine fixes and all three `AudioStream` fallback paths.
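A stripped-down model of the hold-mode guard in `_run` follows. `run_loop_sketch` and its `holding` and `transcribe` parameters are hypothetical stand-ins; in the real engine the raw audio reaches `_raw_shadow` through a separate `on_raw_chunk` path, which is omitted here.

```python
from typing import Callable, Iterable, List

import numpy as np


def run_loop_sketch(
    chunks: Iterable[np.ndarray],
    holding: Callable[[], bool],
    mouse_hold_to_record: bool,
    transcribe: Callable[[np.ndarray], None],
) -> None:
    batch: List[np.ndarray] = []
    for chunk in chunks:
        if mouse_hold_to_record and holding():
            # While the button is held, the raw shadow owns the audio: nothing is
            # batched here, so nothing can be transcribed before release.
            continue
        batch.append(chunk)
    if batch:
        # Post-loop flush: reachable only with audio gathered outside a hold.
        transcribe(np.concatenate(batch))
```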

Confidence Score: 4/5

Safe to merge — no correctness regressions found; one benign P2 in the WASAPI sort key.

All findings are P2. The core hold-mode and resampling logic is correct, the `_flush_hold` state-machine transition is sound, and both new test suites provide meaningful regression coverage. The only finding is a wrong dict key (`d.get("index", 0)`) in the WASAPI sort's secondary key, which is benign because Python's sort is stable.

dictation_tool/io.py — _all_input_devices sort key; otherwise no files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| dictation_tool/engine.py | Core fix: `_flush_hold` now discards the VAD tail (but still calls `force_flush()` for state reset) when `_raw_shadow` has data, preventing duplication; `_run` adds a `continue` guard during hold mode to keep `batch` empty so the post-loop flush never fires on stale audio. |
| dictation_tool/io.py | New resampling fallback in `AudioStream`: `_open_stream` tries the target rate first, then the native rate + `np.interp`; `_build_device_candidates` is fail-closed for explicit devices; the secondary sort key in `_all_input_devices` always resolves to 0 (wrong dict key). |
| tests/test_engine_advanced.py | Two new tests: `test_flush_hold_does_not_duplicate_vad_tail` (directly calls `_flush_hold`, asserts only shadow samples are passed to transcribe) and `test_hold_mode_does_not_transcribe_vad_batches_before_release` (verifies the post-loop batch stays empty during hold); both are valid regression tests. |
| tests/test_io.py | Three new `AudioStream` tests covering: fail-closed explicit device, Windows default fallback candidate list, and resampled callback shape/dtype/raw-chunk equality. |
| README.md | Latency bound updated (5–200 ms) and "good prompt" replaced with "preset"; documentation-only, consistent with the resampling/preset work. |
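The fail-closed rule for `_build_device_candidates` can be illustrated with a small pure function. Its name, parameters, and the exact Windows ordering (backend default first, then the system default index, then every other input) are assumptions for illustration; the PR only states that explicit devices fail closed and that default devices get a Windows fallback list.

```python
from typing import List, Optional, Sequence, Union

DeviceSpec = Union[str, int, None]


def build_device_candidates(
    requested: DeviceSpec,
    default_index: Optional[int],
    all_inputs: Sequence[int],
    is_windows: bool,
) -> List[DeviceSpec]:
    if requested is not None:
        # Explicit device: fail closed, never fall back to another microphone.
        return [requested]
    # None lets the audio backend pick its own default device.
    candidates: List[DeviceSpec] = [None]
    if is_windows:
        # Default device on Windows: also try each input, system default first.
        ordered = ([default_index] if default_index is not None else []) + list(all_inputs)
        for idx in ordered:
            if idx not in candidates:  # skip duplicates, keep first position
                candidates.append(idx)
    return candidates
```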

Sequence Diagram

```mermaid
sequenceDiagram
    participant Mouse as Mouse Thread
    participant Run as _run() loop
    participant Shadow as _raw_shadow
    participant VAD as VADGate
    participant Flush as _flush_hold()
    participant Clip as _clip_worker

    Mouse->>Shadow: .clear() on press
    Mouse->>Run: _holding = True

    loop audio chunks during hold
        Run->>VAD: chunk arrives (via stream.chunks())
        Note over Run: mouse_hold_to_record & _holding → continue
        Note over Run: batch stays empty
        Run-->>Shadow: on_raw_chunk → shadow.append(chunk)
    end

    Mouse->>Flush: mouse release triggers _flush_hold()
    Flush->>Shadow: concatenate(_raw_shadow) → segs
    Flush->>Shadow: .clear()
    Flush->>VAD: force_flush() resets state, tail discarded (would duplicate shadow)
    Flush->>Clip: _transcribe(segs) → paste result
```
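The release path in the diagram reduces to the dedup rule below. `HoldFlushSketch` and its injected `vad_force_flush` callable are hypothetical; the real `_flush_hold` hands the audio to `_transcribe` and the clipboard worker rather than returning it.

```python
from typing import Callable, List, Optional

import numpy as np


class HoldFlushSketch:
    def __init__(self, vad_force_flush: Callable[[], Optional[np.ndarray]]):
        self._raw_shadow: List[np.ndarray] = []
        # Stand-in for VADGate.force_flush(): returns the buffered tail or None.
        self._vad_force_flush = vad_force_flush

    def flush_hold(self) -> Optional[np.ndarray]:
        segs = np.concatenate(self._raw_shadow) if self._raw_shadow else None
        self._raw_shadow.clear()
        # Always flush so the VAD state is reset for the next utterance.
        tail = self._vad_force_flush()
        if segs is None:
            # No shadow audio: the VAD tail is the only copy, so keep it.
            return tail
        # Shadow present: the tail overlaps audio already in the shadow, drop it.
        return segs
```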

Comments Outside Diff (1)

dictation_tool/io.py, lines 654-658

Secondary sort key silently wrong: `d.get("index", 0)` always returns 0

sounddevice device dicts do not have an "index" key; the device index is the loop variable `i`. As a result `d.get("index", 0)` is always 0, making the secondary sort key a no-op (all devices within a host-API priority group are ranked equally). Python's stable sort keeps them in enumeration order, so the behavior is deterministic but not what the intent implies. The fix is to use the enumeration index (`pair[0]`) instead.
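One way to express the suggested fix is to sort `(index, device)` pairs and use the enumeration index directly as the tie-breaker. `order_input_devices` and its `hostapi_names` argument are hypothetical; the `hostapi` and `max_input_channels` keys follow the dicts returned by `sounddevice.query_devices()`, and matching on the substring "WASAPI" is an assumption about how the host API is identified.

```python
from typing import List, Sequence, Tuple


def order_input_devices(
    devices: Sequence[dict],
    hostapi_names: Sequence[str],
) -> List[Tuple[int, dict]]:
    # Keep only devices that can record, remembering their enumeration index.
    inputs = [(i, d) for i, d in enumerate(devices) if d.get("max_input_channels", 0) > 0]

    def priority(pair: Tuple[int, dict]) -> Tuple[int, int]:
        index, dev = pair
        api = hostapi_names[dev["hostapi"]]
        # WASAPI first, everything else after; ties broken by the real index.
        return (0 if "WASAPI" in api else 1, index)

    return sorted(inputs, key=priority)
```

Because every key is unique, the result no longer depends on sort stability at all.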

Reviews (1): Last reviewed commit: "fix(engine): Stabilize dictation capture..."

future3OOO wants to merge 1 commit into master from fix/dictation-release-paste