okvad is a unified multi-engine Voice Activity Detection (VAD) library for the web.
Source repository: https://github.com/hosamsh/okvad
Supports:
- Three VAD algorithms: DSP (pure JavaScript), WebRTC (WebAssembly), Silero (ONNX neural network)
- Automatic recording: Built-in audio recording with pre-roll capture and automatic segment splitting
- Output formatting: Pre-configured presets for popular APIs (OpenAI, Azure, etc.)
- Multiple build variants: ESM for bundlers (Vite, Webpack), IIFE for plain HTML. Works with vanilla JavaScript, React, Vue, Next.js, and any bundler
- TypeScript ready: type definitions included
```bash
npm install hosamsh/okvad
# or
npm install git+https://github.com/hosamsh/okvad.git
```

okvad is currently distributed via GitHub. The commands above pull the library straight from https://github.com/hosamsh/okvad.
```js
import { Vad, USE_CASES, ALGOS } from 'okvad';

const vad = new Vad({
  algo: ALGOS.WEBRTC,
  useCase: USE_CASES.STREAMING,
  onUtteranceEnd: (segment) => {
    console.log('Captured speech:', segment.duration.toFixed(2), 'seconds');
  }
});

await vad.start();
```

Or, using the IIFE build directly in plain HTML:

```html
<script src="path/to/okvad-webrtc.browser.min.js"></script>
<script>
  const vad = new OkVad.Vad({ algo: 'webrtc' });
  vad.start();
</script>
```

Choose the variant that matches your needs:
| Variant | Size | Algorithms | Best For |
|---|---|---|---|
| `okvad/core` | 15 KB | DSP only | Minimal bundle size |
| `okvad/webrtc` | 75 KB | + WebRTC (WASM) | General purpose |
| `okvad/silero` | 25 KB + CDN | + Silero (ONNX) | Highest accuracy |
| `okvad` | 90 KB | All three | Choice flexibility |
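As a rough illustration of what the `okvad/core` DSP variant does conceptually, an energy-based frame decision can be sketched as below. This is a hypothetical simplification (the threshold value and function names are made up for this example; okvad's actual algorithm and smoothing differ):

```javascript
// Hypothetical sketch of an energy-based (DSP-style) VAD decision.
// Not okvad's implementation -- just the core idea: a frame is
// "speech" when its mean-square energy exceeds a threshold.
function frameEnergy(samples) {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return sum / samples.length; // mean-square energy
}

function isSpeechFrame(samples, threshold = 0.001) {
  return frameEnergy(samples) > threshold;
}

console.log(isSpeechFrame(new Float32Array(160)));           // silence → false
console.log(isSpeechFrame(new Float32Array(160).fill(0.1))); // signal → true
```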
```js
import { Vad, ALGOS } from 'okvad';

const vad = new Vad({
  algo: ALGOS.WEBRTC,
  onUtteranceStart: () => console.log('Utterance detected'),
  onUtteranceEnd: (segment) => {
    console.log('Complete utterance:', segment);
    // segment contains: audioData, blob, url, duration, sampleRate
  }
});

await vad.start();
// ...later
await vad.stop();
```

Choose a preset optimized for your use case:
```js
import { Vad, USE_CASES, ALGOS } from 'okvad';

// Real-time streaming (fast response, tolerates breathing pauses)
const vad1 = new Vad({
  algo: ALGOS.DSP,
  useCase: USE_CASES.STREAMING
});

// Speech transcription (captures complete sentences)
const vad2 = new Vad({
  algo: ALGOS.WEBRTC,
  useCase: USE_CASES.TRANSCRIPTION
});

// Voice commands (instant response for short utterances)
const vad3 = new Vad({
  algo: ALGOS.SILERO,
  useCase: USE_CASES.COMMANDS
});
```

```js
import { Vad, PRESETS } from 'okvad';

// OpenAI Realtime API (PCM16, 24kHz, base64)
const vad = new Vad({
  preset: PRESETS.OPENAI_REALTIME,
  onFrame: (result) => {
    if (result.smoothedSpeech) {
      websocket.send(JSON.stringify({
        type: 'input_audio_buffer.append',
        audio: result.audio // Already formatted
      }));
    }
  }
});
```

Available delivery presets: `PRESETS.OPENAI_REALTIME`, `PRESETS.OPENAI_REALTIME_MULAW_8K`, `PRESETS.OPENAI_REALTIME_TRANSCRIBE`, `PRESETS.OPENAI_TRANSCRIBE_BATCH`, `PRESETS.CUSTOM`.
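For context on the wire format: "PCM16, base64" means each Float32 sample is scaled to a signed 16-bit integer and the resulting bytes are base64-encoded. The preset does this for you; as an illustrative sketch only (using Node's `Buffer`, and not okvad's internal code), the conversion looks roughly like:

```javascript
// Illustration of the PCM16 + base64 wire format the preset produces.
// Clamp each float sample to [-1, 1], scale to int16, base64-encode.
function float32ToPcm16Base64(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;        // scale to int16 range
  }
  return Buffer.from(pcm.buffer).toString('base64');
}

const encoded = float32ToPcm16Base64(new Float32Array([0, 0.5, -0.5, 1]));
console.log(encoded);
```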
```js
const vad = new Vad({
  algo: 'webrtc',
  maxSegmentMs: 180000, // Auto-split at 3 minutes
  onUtteranceEnd: (segment) => {
    // Download recording
    const link = document.createElement('a');
    link.href = segment.url;
    link.download = 'speech.wav';
    link.click();
  }
});

await vad.start();
```

Recording is automatically enabled whenever you provide `onUtteranceEnd` or `onUtteranceChunk` callbacks (unless you explicitly set `enableRecording: false`).
Enable verbose logs (prefixed with `[VAD]`) by either:
- Passing `debug: true` to the `Vad` constructor for instance-scoped logging.
- Calling `setDebug(true)` (exported from `okvad`) to enable logging globally, and `setDebug(false)` to silence it again.
Constructor options:

```js
new Vad({
  algo: 'dsp' | 'webrtc' | 'silero',                   // Default: 'dsp'
  useCase: 'streaming' | 'transcription' | 'commands', // Default: 'streaming'
  preset: PRESETS.OPENAI_REALTIME,  // Delivery preset (see PRESETS constants)
  sampleRate: number,               // Default: 16000
  maxSegmentMs: number,             // Default: 180000 (3 minutes)

  // Timing (milliseconds, auto-configured per algorithm)
  speechHangoverMs: number,  // Silence duration before ending speech
  preSpeechPadMs: number,    // Required speech before detection

  // Callbacks
  onUtteranceStart: () => void,
  onUtteranceEnd: (segment) => void,  // Auto-enables recording
  onUtteranceChunk: (chunk) => void,  // Auto-enables recording for streaming STT
  onFrame: (result) => void,
  onError: (error) => void,

  // Algorithm-specific
  webrtcMode: 0 | 1 | 2 | 3,        // WebRTC aggressiveness
  positiveSpeechThreshold: number,  // Silero sensitivity

  // Advanced
  output: { format, encoding, sampleRate },
  debug: boolean
});
```

Methods:

```js
await vad.start()       // Start listening for speech
await vad.stop()        // Stop listening and cleanup
await vad.destroy()     // Release all resources
await vad.getPreRoll()  // Get audio captured before speech start
vad.createWavSegment()  // Manual recording encoder
vad.updateSettings({})  // Update settings dynamically
```

Properties:

```js
vad.running    // Boolean: currently processing audio
vad.recording  // Boolean: currently recording (auto-recording only)
vad.vadAlgo    // String: active algorithm ('dsp'|'webrtc'|'silero')
```

The `onFrame` callback receives:

```js
{
  isSpeech: boolean,        // Raw VAD result
  smoothedSpeech: boolean,  // Smoothed result (after hangover)
  samples: Float32Array,    // Audio samples for frame
  audio: ArrayBuffer | string | Blob,  // Processed audio in configured format
  probability: number,      // 0-1 (Silero only)
  energy: number            // Frame energy (DSP only)
}
```

The `onUtteranceEnd` callback receives a segment:

```js
{
  audioData: Float32Array,  // Raw samples
  blob: Blob,               // WAV file
  url: string,              // Object URL (download/playback)
  duration: number,         // Seconds
  sampleRate: number,       // Hz
  channels: number,         // Always 1
  samples: number           // Total count
}
```

Full type definitions included:
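To clarify how `smoothedSpeech` can differ from the raw `isSpeech` flag, here is a hypothetical sketch of hangover smoothing. It is frame-count based for simplicity, whereas okvad's actual behavior is configured in milliseconds via `speechHangoverMs`; the class and names below are made up for illustration:

```javascript
// Hypothetical hangover smoother (not okvad's internals): speech only
// ends after `hangoverFrames` consecutive silent frames, so brief
// pauses inside an utterance don't split it.
class HangoverSmoother {
  constructor(hangoverFrames) {
    this.hangoverFrames = hangoverFrames;
    this.silentRun = 0;
    this.speaking = false;
  }
  update(isSpeech) {
    if (isSpeech) {
      this.silentRun = 0;
      this.speaking = true;
    } else if (this.speaking && ++this.silentRun >= this.hangoverFrames) {
      this.speaking = false;
    }
    return this.speaking;
  }
}

const s = new HangoverSmoother(3);
const frames = [true, false, false, true, false, false, false];
console.log(frames.map((f) => s.update(f)));
// → [true, true, true, true, true, true, false]
```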
```ts
import type { VadOptions, UtteranceSegment, FrameResult } from 'okvad';

const options: VadOptions = {
  algo: 'webrtc',
  onUtteranceEnd: (segment: UtteranceSegment) => {}
};
```

Check the `demos/` folder:

- `demos/record.html` - Record speech with different algorithms
- `demos/stream.html` - Real-time streaming example
```bash
npm install
npm run build        # Build all variants
npm run build:types  # Generate TypeScript definitions
npm run lint         # Check code quality
npm run lint:fix     # Auto-fix issues
```

Requires:

- Modern browser with Web Audio API (Chrome, Firefox, Safari, Edge)
- HTTPS or localhost (microphone access required)
MIT License - See LICENSE
This library incorporates code from:
- WebRTC VAD - BSD 3-Clause
- Silero VAD - MIT
- @ricky0123/vad - ISC
- ONNX Runtime - MIT
See THIRD-PARTY-NOTICES.md and PATENTS.md for details.