WhisperKit: lock audio sample/energy buffers in AudioProcessor #482

Open
yemreak wants to merge 2 commits into argmaxinc:main from yemreak:fix/whisperkit-audioprocessor-thread-safety

@yemreak yemreak commented May 18, 2026

Summary

AudioProcessor.audioSamples and audioEnergy are written from the AVAudioEngine tap thread and read from arbitrary threads (VAD polling on main, transcription on a background queue) with no synchronisation. Swift 6 Strict Concurrency flags this access, and in practice TSan also reports sporadic data races on the underlying ContiguousArray and [(rel, avg, max, min)] storage.

Motivation

We saw this both as StrictConcurrency warnings and as a real, intermittent crash in a long-running dictation app that polls relativeEnergy from the main thread while the tap is running.

Changes

  • Introduce a private audioLock: NSLock.
  • Back the two stored properties with _audioSamples / _audioEnergy and expose them through locked getters and setters.
  • In processBuffer, hold the lock only across the shared-state mutation; audioBufferCallback is invoked after the lock is released, so a callback that reads audioSamples or audioEnergy cannot deadlock by re-entering the lock.
  • Route relativeEnergy and purgeAudioSamples through the lock.

Public surface and semantics are unchanged; callers continue to read audioSamples / audioEnergy the same way.
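
For readers who want the shape of the change without opening the diff, the pattern can be sketched roughly as below. This is a simplified stand-in, not the actual patch: the class name LockedAudioBuffers, the keepingLast parameter, the callback signature, and the placeholder energy computation are assumptions made for illustration, and the real AudioProcessor has more state and a different processBuffer signature.

```swift
import Foundation

// Hypothetical, simplified stand-in for AudioProcessor (illustration only).
final class LockedAudioBuffers {
    private let audioLock = NSLock()

    // Backing storage; touched only while audioLock is held.
    private var _audioSamples: ContiguousArray<Float> = []
    private var _audioEnergy: [(rel: Float, avg: Float, max: Float, min: Float)] = []

    // Assumed callback shape; invoked outside the lock.
    var audioBufferCallback: (([Float]) -> Void)?

    // Locked accessors keep the public read/write surface unchanged.
    var audioSamples: ContiguousArray<Float> {
        get { audioLock.lock(); defer { audioLock.unlock() }; return _audioSamples }
        set { audioLock.lock(); defer { audioLock.unlock() }; _audioSamples = newValue }
    }

    var audioEnergy: [(rel: Float, avg: Float, max: Float, min: Float)] {
        get { audioLock.lock(); defer { audioLock.unlock() }; return _audioEnergy }
        set { audioLock.lock(); defer { audioLock.unlock() }; _audioEnergy = newValue }
    }

    var relativeEnergy: [Float] {
        audioLock.lock(); defer { audioLock.unlock() }
        return _audioEnergy.map { $0.rel }
    }

    func processBuffer(_ buffer: [Float]) {
        // Compute outside the lock. Placeholder energy math, not WhisperKit's.
        let avg = buffer.isEmpty ? Float(0) : buffer.reduce(Float(0)) { $0 + abs($1) } / Float(buffer.count)
        let maxValue = buffer.max() ?? 0
        let minValue = buffer.min() ?? 0

        // Hold the lock only across the shared-state mutation.
        audioLock.lock()
        _audioSamples.append(contentsOf: buffer)
        _audioEnergy.append((rel: avg, avg: avg, max: maxValue, min: minValue))
        audioLock.unlock()

        // NSLock is not recursive, so the callback runs after unlock:
        // a callback that reads audioSamples would otherwise deadlock.
        audioBufferCallback?(buffer)
    }

    func purgeAudioSamples(keepingLast keep: Int) {
        audioLock.lock(); defer { audioLock.unlock() }
        if _audioSamples.count > keep {
            _audioSamples.removeFirst(_audioSamples.count - keep)
        }
    }
}
```

In this sketch a callback may safely read audioSamples or relativeEnergy, because processBuffer has already released audioLock by the time the callback fires. Note that the getters return a copy of the array, so readers see a consistent snapshot rather than storage that the tap thread may be mutating.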

Testing

Built and ran the existing tests; ran our internal dictation pipeline (continuous audio capture + main-thread VAD polling) under TSan with no warnings after the change. Reverting the patch reproduces the race.

a2they and others added 2 commits May 1, 2026 16:11
`AudioProcessor.audioSamples` and `audioEnergy` are written from
the AVAudioEngine tap thread and read from arbitrary threads
(VAD polling on main, transcription on a background queue).
Under Swift 6 Strict Concurrency the unsynchronised access is
flagged; in practice it also produces sporadic data races
detectable with TSan.

Introduce an `audioLock` (NSLock) and expose `audioSamples` /
`audioEnergy` through locked getters and setters. `processBuffer`
holds the lock only across the shared-state mutation and releases
it before invoking `audioBufferCallback` to avoid potential
re-entrant deadlocks. Public surface and semantics are unchanged.

2 participants