fix(engine): stabilize dictation capture and release paste #1
Prevent mouse-hold dictation from transcribing overlapping VAD batches before release, and harden Windows microphone fallback so default devices are portable while explicit devices fail closed.

Co-authored-by: Cursor <cursoragent@cursor.com>
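One possible reading of the fallback policy in the description, as a minimal sketch rather than the PR's actual code: `pick_device`, `can_open`, `candidates`, and `DeviceOpenError` are all hypothetical names. The assumption is that "fail closed" means an explicitly requested microphone is never silently replaced by another one, while the default path may walk a list of candidates.

```python
from typing import Callable, Optional, Sequence


class DeviceOpenError(RuntimeError):
    """Raised when no acceptable input device could be opened (hypothetical)."""


def pick_device(
    requested: Optional[int],
    candidates: Sequence[int],
    can_open: Callable[[int], bool],
) -> int:
    """Return the index of the input device to use.

    `can_open` is a hypothetical probe that returns True when the audio
    backend accepts the given device index.
    """
    if requested is not None:
        # Explicit device: fail closed instead of switching microphones.
        if can_open(requested):
            return requested
        raise DeviceOpenError(f"requested device {requested} could not be opened")
    # Default device: stay portable by taking the first candidate that opens.
    for device in candidates:
        if can_open(device):
            return device
    raise DeviceOpenError("no default input device could be opened")
```

With this shape, a misconfigured explicit device surfaces as an error the user can act on, whereas an unconfigured setup still finds a working microphone on machines with different device layouts.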
No actionable comments were generated in the recent review.
Files selected for processing: 5
Walkthrough
This PR refactors the audio input pipeline by improving VADGate segment creation with dedicated helper methods, enhancing AudioStream with device fallback selection and automatic resampling in the callback, and adjusting engine VAD flush behavior to avoid duplicating tail samples. Tests validate device selection logic and VAD buffer behavior. Documentation updates latency expectations for the medium.en model.
Changes
Audio I/O & VAD Pipeline Refactoring
Documentation Update
Sequence Diagram
sequenceDiagram
participant Client
participant AudioStream
participant Device Selection
participant Native Device
participant Resampler
participant VADGate
participant Engine
Client->>AudioStream: open_stream(target_rate)
AudioStream->>Device Selection: attempt target_rate device
alt Device supports target rate
Device Selection->>Native Device: open at target_rate
else Fallback to native
Device Selection->>Native Device: open at native_rate
end
Native Device->>Resampler: callback(native_rate_chunk)
Resampler->>Resampler: np.interp if native ≠ target
Resampler->>VADGate: _process_frame(target_rate_chunk)
alt Speech detected
VADGate->>VADGate: _create_segment()
VADGate->>Engine: segment (no duplicate tail)
else Silence buffered
VADGate->>VADGate: accumulate in pre_buffer
end
Engine->>Engine: _flush_hold() with dedup logic
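The resampling step in the diagram above ("np.interp if native ≠ target") could look roughly like the following. This is a sketch under assumptions, not the PR's implementation: the function name `resample_linear` and the mono, one-dimensional chunk layout are hypothetical. Note that plain linear interpolation applies no anti-aliasing filter.

```python
import numpy as np


def resample_linear(chunk: np.ndarray, native_rate: int, target_rate: int) -> np.ndarray:
    """Linearly resample a mono 1-D chunk from native_rate to target_rate."""
    if native_rate == target_rate or chunk.size == 0:
        # Nothing to do: pass the chunk through untouched.
        return chunk
    n_out = int(round(chunk.size * target_rate / native_rate))
    # Positions of the output samples, expressed in input-sample units.
    x_out = np.linspace(0.0, chunk.size - 1, num=n_out)
    x_in = np.arange(chunk.size)
    # np.interp returns float64, so cast back to the input dtype.
    return np.interp(x_out, x_in, chunk).astype(chunk.dtype)
```

For example, a 480-sample chunk captured at 48000 Hz becomes a 160-sample chunk at 16000 Hz, which is the shape a downstream VAD frame processor would then receive.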
Estimated code review effort: 4 (Complex), ~60 minutes
Pre-merge checks: 4 passed, 1 failed (1 warning)
Greptile Summary
This PR fixes two hold-mode correctness bugs: overlapping VAD tail duplication on release (by restructuring […]
Confidence Score: 4/5
Safe to merge: no correctness regressions found; one benign P2 in the WASAPI sort key.
All P2s only. Core hold-mode and resampling logic is correct, […]
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Mouse as Mouse Thread
participant Run as _run() loop
participant Shadow as _raw_shadow
participant VAD as VADGate
participant Flush as _flush_hold()
participant Clip as _clip_worker
Mouse->>Shadow: .clear() on press
Mouse->>Run: _holding = True
loop audio chunks during hold
Run->>VAD: chunk arrives (via stream.chunks())
Note over Run: mouse_hold_to_record & _holding → continue
Note over Run: batch stays empty
Run-->>Shadow: on_raw_chunk → shadow.append(chunk)
end
Mouse->>Flush: mouse release triggers _flush_hold()
Flush->>Shadow: concatenate(_raw_shadow) → segs
Flush->>Shadow: .clear()
Flush->>VAD: force_flush() — reset state, discard tail (would duplicate shadow)
Flush->>Clip: _transcribe(segs) → paste result
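The hold-mode flow in the diagram above can be modeled with a small toy class. This is an illustrative sketch only: `HoldRecorder` and its method names are hypothetical, plain Python lists stand in for the audio arrays that the real code concatenates, and the `transcribe` callable is a placeholder for the actual transcription path.

```python
import threading


class HoldRecorder:
    """Toy model of hold-to-record: buffer raw audio, transcribe once on release."""

    def __init__(self, transcribe):
        self._transcribe = transcribe  # hypothetical callable taking a list of samples
        self._lock = threading.Lock()  # press/release arrive from another thread
        self._raw_shadow = []          # raw chunks captured while the button is held
        self._holding = False

    def on_press(self):
        with self._lock:
            self._raw_shadow.clear()
            self._holding = True

    def on_raw_chunk(self, chunk):
        with self._lock:
            if self._holding:
                self._raw_shadow.append(list(chunk))

    def on_vad_segment(self, segment):
        # While holding, VAD output is skipped so nothing is transcribed early.
        with self._lock:
            if self._holding:
                return None
        return self._transcribe(list(segment))

    def on_release(self):
        with self._lock:
            self._holding = False
            samples = [s for chunk in self._raw_shadow for s in chunk]
            self._raw_shadow.clear()
        # Any VAD tail would be discarded at this point: the shadow buffer
        # already contains those samples, so keeping it would duplicate audio.
        return self._transcribe(samples) if samples else None
```

The design choice illustrated here is that the shadow buffer is the single source of audio for a hold, so each sample reaches the transcriber exactly once regardless of how the VAD happened to segment it.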
Summary
Test plan
.venv310\Scripts\python.exe -m pytest tests/test_engine_advanced.py::TestAdvancedDictationEngine::test_flush_hold_does_not_duplicate_vad_tail tests/test_engine_advanced.py::TestAdvancedDictationEngine::test_hold_mode_does_not_transcribe_vad_batches_before_release -q
.venv310\Scripts\python.exe -m pytest tests/test_io.py -q
.venv310\Scripts\python.exe -m black --check dictation_tool/engine.py dictation_tool/io.py tests/test_engine_advanced.py tests/test_io.py
git diff --check
Summary by CodeRabbit
Documentation
Improvements