[🐛 BUG] Dictation activation delay causes first words to be dropped #236
Bug Description
When dictation is activated via the hotkey, there is a significant delay (~500 ms to 1 s+) before the microphone actually starts capturing audio. Any words spoken during this startup window are permanently lost, because there is no mechanism to capture them retroactively.
This is especially noticeable when you start speaking immediately after pressing the hotkey, which is the natural user behavior.
Steps to Reproduce
- Launch FluidVoice with Parakeet (or any model)
- Press the dictation hotkey
- Immediately begin speaking a sentence (e.g., "Hello this is a test")
- Observe the transcription output
Expected: The full sentence is captured → "Hello this is a test"
Actual: The first word is dropped → "This is a test"
The following analysis of why this happens was produced by Opus:
Root Cause Analysis
The activation flow in ASRService.start() runs multiple sequential blocking steps before the audio tap is installed, and then waits for a minimum amount of audio before transcribing:
- Hotkey → start() dispatch (~200 ms): Task { @MainActor } dispatch plus ContentView callback overhead
- Start sound plays synchronously, before ASRService.start() is even called
- configureSession() (~85–275 ms): forces AVAudioEngine input node instantiation
- startEngine() (~150–320 ms): device binding + prepare() + engine.start()
- setupEngineTap() (~1 ms): the mic finally starts capturing here
- 1-second minimum audio accumulation: processStreamingChunk() requires 16,000 samples (1 s at 16 kHz) before attempting transcription
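As a rough cross-check, the estimates listed above can be summed to see how long the mic stays dark after the hotkey is pressed. This minimal Swift sketch treats the ~200 ms dispatch cost as fixed, omits the start sound (no duration was given for it), and assumes the steps run strictly one after another as described. The inputs are the analysis's estimates, not new measurements.

```swift
// Rough latency budget for one activation, using the per-step estimates above.
// The start sound is omitted because no duration was estimated for it.
struct Stage {
    let name: String
    let minMs: Double
    let maxMs: Double
}

let stages = [
    Stage(name: "hotkey -> start() dispatch", minMs: 200, maxMs: 200),
    Stage(name: "configureSession()", minMs: 85, maxMs: 275),
    Stage(name: "startEngine()", minMs: 150, maxMs: 320),
    Stage(name: "setupEngineTap()", minMs: 1, maxMs: 1),
]

// Time from hotkey press until the tap starts capturing audio (steps assumed sequential).
let captureStartMin = stages.reduce(0.0) { $0 + $1.minMs }
let captureStartMax = stages.reduce(0.0) { $0 + $1.maxMs }

// The first transcription attempt additionally waits for 16,000 samples at 16 kHz.
let sampleRate = 16_000.0
let firstChunkSamples = 16_000.0
let firstChunkMs = firstChunkSamples / sampleRate * 1000

print("Mic live after ~\(captureStartMin)-\(captureStartMax) ms")
print("First transcription attempt after ~\(captureStartMin + firstChunkMs)-\(captureStartMax + firstChunkMs) ms")
```

The sum (roughly 436 to 796 ms before capture begins, and about 1.4 to 1.8 s before the first transcription attempt) is consistent with the ~500 ms to 1 s+ window reported above.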
The core issue is that audio capture does not begin until all setup is complete, and ThreadSafeAudioBuffer is cleared on every start() call with no circular/ring buffer to preserve earlier audio.
Additionally, stop() deallocates the AVAudioEngine entirely, forcing a full rebuild on every subsequent start().
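Since the timings above come from code analysis rather than profiling, it may be worth confirming them on a real machine. A minimal sketch, assuming Swift 5.7+ (ContinuousClock; macOS 13 or later on Apple platforms). ActivationTimer is a hypothetical helper, not part of FluidVoice; in the app one would wrap calls such as configureSession() and startEngine() with it.

```swift
// Hypothetical helper for timing each activation step.
// Requires Swift 5.7+ (ContinuousClock); on Apple platforms, macOS 13 or later.
struct ActivationTimer {
    private let clock = ContinuousClock()
    private(set) var timings: [(stage: String, duration: Duration)] = []

    // Runs `body`, records how long it took under `stage`, and returns its result.
    mutating func measure<T>(_ stage: String, _ body: () throws -> T) rethrows -> T {
        let start = clock.now
        let result = try body()
        timings.append((stage: stage, duration: start.duration(to: clock.now)))
        return result
    }

    // One line per recorded stage, e.g. for logging after each activation.
    func report() -> String {
        timings.map { "\($0.stage): \($0.duration)" }.joined(separator: "\n")
    }
}

// Example: in the app the closures would call configureSession(), startEngine(), etc.
var timer = ActivationTimer()
let answer = timer.measure("placeholder step") { (1...1_000).reduce(0, +) }
print(timer.report())
```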
Related Issues
- [🐛 BUG] V1.5.8 ARM64 crash: CheckedContinuation double-resume in MediaRemoteAdapter + ~5s input delay #188 (~5-second delay traced to MediaRemoteAdapter blocking the main thread)
- [🐛 BUG] When pressing the hotkey to start dictation, there's ~1 second delay before the notch appears #98 (~1-second delay acknowledged by the developer and partially reduced, by ~150 ms)
Suggested Improvements
1. Always-on mic with a rolling circular buffer
Keep a ~2–3 second ring buffer of audio running in the background. On activation, prepend the buffered audio to the recording session. This way, words spoken during startup are preserved.
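The ring-buffer idea can be sketched as a fixed-capacity array with a wrapping write index. This is a minimal sketch under stated assumptions: PreRollRingBuffer is a hypothetical name, samples are mono Float, and the example capacity of 3 × 16,000 assumes 16 kHz audio (the hardware input format may differ, so a real buffer should be sized from the tap's actual format). It is not thread-safe; the audio tap runs off the main thread, so real use would need a lock or a lock-free design like the existing ThreadSafeAudioBuffer presumably provides.

```swift
// Hypothetical fixed-capacity ring buffer holding the most recent pre-roll samples.
// Not thread-safe: guard it with a lock if the audio tap and the UI touch it concurrently.
struct PreRollRingBuffer {
    private var storage: [Float]
    private var writeIndex = 0
    private(set) var count = 0

    init(capacity: Int) {
        precondition(capacity > 0, "capacity must be positive")
        storage = Array(repeating: 0, count: capacity)
    }

    var capacity: Int { storage.count }

    // Append new samples, overwriting the oldest ones once the buffer is full.
    mutating func append(_ samples: [Float]) {
        for sample in samples {
            storage[writeIndex] = sample
            writeIndex = (writeIndex + 1) % capacity
            count = min(count + 1, capacity)
        }
    }

    // Buffered samples in chronological order (oldest first).
    func snapshot() -> [Float] {
        guard count == capacity else { return Array(storage[0..<count]) }
        // When full, the oldest sample sits at writeIndex.
        return Array(storage[writeIndex...] + storage[..<writeIndex])
    }
}

// In the app this might be PreRollRingBuffer(capacity: 3 * 16_000) for ~3 s at 16 kHz.
var demo = PreRollRingBuffer(capacity: 4)
demo.append([1, 2, 3, 4, 5, 6])
print(demo.snapshot())  // [3.0, 4.0, 5.0, 6.0]

// On activation, the session starts from the pre-roll and live tap samples follow.
var sessionSamples = demo.snapshot()
sessionSamples.append(contentsOf: [7, 8])
print(sessionSamples)  // [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```

One trade-off worth weighing: an always-on input keeps the microphone in use between sessions, so macOS will keep showing its microphone-in-use indicator.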
2. Keep AVAudioEngine alive between sessions
Instead of tearing down and recreating the engine on every stop()/start() cycle, keep it running (or at least prepared) between sessions. This would eliminate the ~300–500 ms configureSession() + startEngine() cost on each activation.
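A minimal sketch of "prepare once, reuse across sessions". It relies on a documented distinction in AVAudioEngine: pause() stops the engine and audio hardware but keeps the resources allocated by prepare(), whereas stop() releases them. AudioEngineControlling, DictationEngineHost, and FakeEngine are hypothetical names; the protocol only mirrors the subset of AVAudioEngine's API used here so the lifecycle logic can run without audio hardware.

```swift
// Subset of AVAudioEngine's API used by the host (prepare(), start(), pause(), isRunning).
// AVAudioEngine already has these members, so it could likely conform via an empty
// extension in the app; that conformance is an untested assumption.
protocol AudioEngineControlling: AnyObject {
    var isRunning: Bool { get }
    func prepare()
    func start() throws
    func pause()
}

// Hypothetical host: prepares the engine once and pauses (never tears down) between sessions.
final class DictationEngineHost {
    private let engine: AudioEngineControlling
    private var isPrepared = false

    init(engine: AudioEngineControlling) {
        self.engine = engine
    }

    func beginSession() throws {
        if !isPrepared {
            engine.prepare()  // one-time cost, paid on first activation (or at app launch)
            isPrepared = true
        }
        if !engine.isRunning {
            try engine.start()
        }
    }

    func endSession() {
        engine.pause()  // keep the engine and its prepared resources alive
    }
}

// Stand-in engine that counts calls, used to show the lifecycle without audio hardware.
final class FakeEngine: AudioEngineControlling {
    private(set) var isRunning = false
    private(set) var prepareCount = 0
    private(set) var startCount = 0

    func prepare() { prepareCount += 1 }
    func start() throws { isRunning = true; startCount += 1 }
    func pause() { isRunning = false }
}

let fake = FakeEngine()
let host = DictationEngineHost(engine: fake)
for _ in 0..<3 {
    try host.beginSession()
    host.endSession()
}
print(fake.prepareCount, fake.startCount)  // 1 3
```

Three dictation sessions trigger a single prepare() call; only the comparatively cheap start() repeats.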
3. Start audio capture before UI work
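Move setupEngineTap() to fire as early as possible in the hotkey callback: before the start sound, before UI state updates, and before media pause. Capture first, update UI after. Note that a tap only delivers audio once the engine is running, so this works best combined with keeping the engine alive (suggestion 2).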
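The ordering can be sketched as a handler that starts capture before running any user-facing step. This is a minimal illustration, not FluidVoice's actual hotkey code: HotkeyActivationHandler is hypothetical, and the closures stand in for setupEngineTap(), the start sound, the UI update, and the media pause.

```swift
// Hypothetical hotkey handler illustrating "capture first, update UI after".
final class HotkeyActivationHandler {
    private let startCapture: () -> Void
    private let deferredSteps: [() -> Void]

    init(startCapture: @escaping () -> Void, deferredSteps: [() -> Void]) {
        self.startCapture = startCapture
        self.deferredSteps = deferredSteps
    }

    func handleHotkey() {
        // 1. Get audio flowing before anything that can block or take time.
        startCapture()
        // 2. Only then run the user-facing work. In the app these could also be
        //    dispatched asynchronously so they never delay the tap.
        deferredSteps.forEach { $0() }
    }
}

var log: [String] = []
let handler = HotkeyActivationHandler(
    startCapture: { log.append("tap installed") },
    deferredSteps: [
        { log.append("start sound") },
        { log.append("UI updated") },
        { log.append("media paused") },
    ]
)
handler.handleHotkey()
print(log)  // ["tap installed", "start sound", "UI updated", "media paused"]
```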
4. Reduce the 16,000-sample minimum for first chunk
Allow the first processStreamingChunk() call to run with less than 1 second of audio. Even a partial first transcription is better than dropping words entirely.
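One way to express this is a chunking policy with a smaller threshold for the first attempt only. A minimal sketch: ChunkPolicy is hypothetical, and the 0.5 s first-chunk value is an assumption to tune, not a measured optimum. A shorter first chunk gives the model less context and may lower the accuracy of that first partial result, a trade-off this suggestion explicitly accepts.

```swift
// Hypothetical chunking policy: smaller threshold for the first transcription
// attempt, then the existing 1 s (16,000 samples at 16 kHz) threshold afterwards.
struct ChunkPolicy {
    var sampleRate = 16_000
    var firstChunkSeconds = 0.5   // assumed value, to be tuned
    var steadyChunkSeconds = 1.0  // matches the current 16,000-sample minimum

    func minimumSamples(isFirstChunk: Bool) -> Int {
        let seconds = isFirstChunk ? firstChunkSeconds : steadyChunkSeconds
        return Int(Double(sampleRate) * seconds)
    }

    func shouldTranscribe(bufferedSamples: Int, isFirstChunk: Bool) -> Bool {
        bufferedSamples >= minimumSamples(isFirstChunk: isFirstChunk)
    }
}

let policy = ChunkPolicy()
print(policy.minimumSamples(isFirstChunk: true))   // 8000
print(policy.minimumSamples(isFirstChunk: false))  // 16000
```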
Environment
- macOS (Apple Silicon)
- FluidVoice latest
- Parakeet TDT v3