okvad

okvad is a unified multi-engine Voice Activity Detection (VAD) library for the web.

Source repository: https://github.com/hosamsh/okvad

Supports:

  • Three VAD algorithms: DSP (pure JavaScript), WebRTC (WebAssembly), Silero (ONNX neural network)
  • Automatic recording: Built-in audio recording with pre-roll capture and automatic segment splitting
  • Output formatting: Pre-configured presets for popular APIs (OpenAI, Azure, etc.)
  • Multiple build variants: ESM for bundlers (Vite, Webpack) and IIFE for plain HTML; works with vanilla JavaScript, React, Vue, Next.js, and any bundler
  • TypeScript ready: type definitions included

Installation

With Bundlers (Vite, Webpack, Next.js, etc.)

npm install hosamsh/okvad
# or
npm install git+https://github.com/hosamsh/okvad.git

okvad is currently distributed via GitHub. The commands above pull the library straight from https://github.com/hosamsh/okvad.

import { Vad, USE_CASES, ALGOS } from 'okvad';

const vad = new Vad({
  algo: ALGOS.WEBRTC,
  useCase: USE_CASES.STREAMING,
  onUtteranceEnd: (segment) => {
    console.log('Captured speech:', segment.duration.toFixed(2), 'seconds');
  }
});

await vad.start();

Plain HTML (No Build Step)

<script src="path/to/okvad-webrtc.browser.min.js"></script>
<script>
  const vad = new OkVad.Vad({ algo: 'webrtc' });
  vad.start();
</script>

Build Variants

Choose the variant that matches your needs:

| Variant | Size | Algorithms | Best for |
| --- | --- | --- | --- |
| `okvad/core` | 15 KB | DSP only | Minimal bundle size |
| `okvad/webrtc` | 75 KB | + WebRTC (WASM) | General purpose |
| `okvad/silero` | 25 KB + CDN | + Silero (ONNX) | Highest accuracy |
| `okvad` | 90 KB | All three | Full flexibility |
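The variant names above suggest subpath imports. Assuming the package exposes them as entry points (check the package's `exports` field for the exact names), picking the WebRTC-only build might look like:

```javascript
// Hypothetical subpath import matching the "okvad/webrtc" variant above.
import { Vad } from 'okvad/webrtc';

const vad = new Vad({ algo: 'webrtc' });
await vad.start();
```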

Quick Start

Basic Usage

import { Vad, ALGOS } from 'okvad';

const vad = new Vad({
  algo: ALGOS.WEBRTC,
  onUtteranceStart: () => console.log('Utterance detected'),
  onUtteranceEnd: (segment) => {
    console.log('Complete utterance:', segment);
    // segment contains: audioData, blob, url, duration, sampleRate
  }
});

await vad.start();
// ...later
await vad.stop();

Use Case Presets

Choose a preset optimized for your use case:

import { Vad, USE_CASES, ALGOS } from 'okvad';

// Real-time streaming (fast response, tolerates breathing pauses)
const vad1 = new Vad({
  algo: ALGOS.DSP,
  useCase: USE_CASES.STREAMING
});

// Speech transcription (captures complete sentences)
const vad2 = new Vad({
  algo: ALGOS.WEBRTC,
  useCase: USE_CASES.TRANSCRIPTION
});

// Voice commands (instant response for short utterances)
const vad3 = new Vad({
  algo: ALGOS.SILERO,
  useCase: USE_CASES.COMMANDS
});

Auto-Formatted Output for APIs

import { Vad, PRESETS } from 'okvad';

// OpenAI Realtime API (PCM16, 24kHz, base64)
const vad = new Vad({
  preset: PRESETS.OPENAI_REALTIME,
  onFrame: (result) => {
    if (result.smoothedSpeech) {
      websocket.send(JSON.stringify({
        type: 'input_audio_buffer.append',
        audio: result.audio  // Already formatted
      }));
    }
  }
});

Available delivery presets: PRESETS.OPENAI_REALTIME, PRESETS.OPENAI_REALTIME_MULAW_8K, PRESETS.OPENAI_REALTIME_TRANSCRIBE, PRESETS.OPENAI_TRANSCRIBE_BATCH, PRESETS.CUSTOM
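For context on what a preset like PRESETS.OPENAI_REALTIME produces, the "PCM16, base64" step can be sketched as a standalone helper. The library handles this internally; this function is purely illustrative:

```javascript
// Convert Float32 samples (range -1..1) to little-endian PCM16, then base64.
function float32ToPcm16Base64(samples) {
  const buf = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buf);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to a signed 16-bit integer.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  // Node: Buffer; in a browser you would base64-encode via btoa over a binary string.
  return Buffer.from(buf).toString('base64');
}
```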

Automatic Recording

const vad = new Vad({
  algo: 'webrtc',
  maxSegmentMs: 180000,  // Auto-split at 3 minutes
  onUtteranceEnd: (segment) => {
    // Download recording
    const link = document.createElement('a');
    link.href = segment.url;
    link.download = 'speech.wav';
    link.click();
  }
});

await vad.start();

Recording is automatically enabled whenever you provide onUtteranceEnd or onUtteranceChunk callbacks (unless you explicitly set enableRecording: false).
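Because `segment.blob` is a ready-made WAV file, batch transcription can be as simple as posting it with FormData. The sketch below assumes an OpenAI-style multipart endpoint and a hypothetical `OPENAI_API_KEY` variable:

```javascript
const vad = new Vad({
  algo: 'webrtc',
  onUtteranceEnd: async (segment) => {
    const form = new FormData();
    form.append('file', segment.blob, 'speech.wav');  // segment.blob is a WAV Blob
    form.append('model', 'whisper-1');
    const res = await fetch('https://api.openai.com/v1/audio/transcriptions', {
      method: 'POST',
      headers: { Authorization: `Bearer ${OPENAI_API_KEY}` },  // hypothetical key variable
      body: form,
    });
    console.log((await res.json()).text);
  },
});

await vad.start();
```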

Debug Logging

Enable verbose logs (prefixed with [VAD]) by either:

  • Passing debug: true to the Vad constructor for instance-scoped logging.
  • Calling setDebug(true) to enable logging globally (exported from okvad), and setDebug(false) to silence it again.
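In code, the two options look like this (`setDebug` is exported from the package root, per the note above):

```javascript
import { Vad, setDebug } from 'okvad';

setDebug(true);                        // global: all instances log with [VAD]
const vad = new Vad({ debug: true });  // or instance-scoped logging only
```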

API Reference

Constructor Options

new Vad({
  algo: 'dsp' | 'webrtc' | 'silero',     // Default: 'dsp'
  useCase: 'streaming' | 'transcription' | 'commands',  // Default: 'streaming'
  preset: PRESETS.OPENAI_REALTIME,        // Delivery preset (see PRESETS constants)
  sampleRate: number,                     // Default: 16000
  maxSegmentMs: number,                   // Default: 180000 (3 minutes)
  
  // Timing (milliseconds, auto-configured per algorithm)
  speechHangoverMs: number,               // Silence duration before ending speech
  preSpeechPadMs: number,                 // Required speech before detection
  
  // Callbacks
  onUtteranceStart: () => void,
  onUtteranceEnd: (segment) => void,      // Auto-enables recording
  onUtteranceChunk: (chunk) => void,      // Auto-enables recording for streaming STT
  onFrame: (result) => void,
  onError: (error) => void,
  
  // Algorithm-specific
  webrtcMode: 0 | 1 | 2 | 3,             // WebRTC aggressiveness
  positiveSpeechThreshold: number,       // Silero sensitivity
  
  // Advanced
  output: { format, encoding, sampleRate },
  debug: boolean
});

Methods

await vad.start()          // Start listening for speech
await vad.stop()           // Stop listening and cleanup
await vad.destroy()        // Release all resources
await vad.getPreRoll()     // Get audio captured before speech start
vad.createWavSegment()     // Manual recording encoder
vad.updateSettings({})     // Update settings dynamically
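A typical lifecycle tying these methods together might look like the following sketch (assuming `updateSettings` accepts a partial options object, as its name suggests):

```javascript
const vad = new Vad({ algo: 'dsp' });
await vad.start();                              // begin listening

vad.updateSettings({ speechHangoverMs: 500 });  // tune behavior on the fly
const preRoll = await vad.getPreRoll();         // audio buffered before speech began

await vad.stop();                               // stop listening
await vad.destroy();                            // release microphone and buffers
```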

Properties

vad.running               // Boolean: currently processing audio
vad.recording             // Boolean: currently recording (auto-recording only)
vad.vadAlgo               // String: active algorithm ('dsp'|'webrtc'|'silero')

Frame Result

{
  isSpeech: boolean,           // Raw VAD result
  smoothedSpeech: boolean,     // Smoothed result (after hangover)
  samples: Float32Array,       // Audio samples for frame
  audio: ArrayBuffer | string | Blob, // Processed audio in configured format
  probability: number,         // 0-1 (Silero only)
  energy: number               // Frame energy (DSP only)
}

Utterance Segment (Recording)

{
  audioData: Float32Array,    // Raw samples
  blob: Blob,                 // WAV file
  url: string,                // Object URL (download/playback)
  duration: number,           // Seconds
  sampleRate: number,         // Hz
  channels: number,           // Always 1
  samples: number             // Total count
}
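Since `url` is a ready object URL, a segment can be played back directly. Revoking the URL afterwards is standard object-URL hygiene (the library is not documented to do this for you) and avoids leaking memory:

```javascript
const vad = new Vad({
  algo: 'webrtc',
  onUtteranceEnd: (segment) => {
    const player = new Audio(segment.url);
    player.onended = () => URL.revokeObjectURL(segment.url);  // free the blob URL
    player.play();
  },
});
```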

TypeScript Support

Full type definitions included:

import type { VadOptions, UtteranceSegment, FrameResult } from 'okvad';

const options: VadOptions = {
  algo: 'webrtc',
  onUtteranceEnd: (segment: UtteranceSegment) => {}
};

Live Demo

Check the demos/ folder:

  • demos/record.html - Record speech with different algorithms
  • demos/stream.html - Real-time streaming example

Development

npm install
npm run build           # Build all variants
npm run build:types     # Generate TypeScript definitions
npm run lint            # Check code quality
npm run lint:fix        # Auto-fix issues

Browser Support

Requires:

  • Modern browser with Web Audio API (Chrome, Firefox, Safari, Edge)
  • HTTPS or localhost (microphone access required)
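These requirements can be checked up front with a small, framework-free guard (a sketch; `webkitAudioContext` is the legacy Safari name):

```javascript
// Returns true only in a browser with microphone + Web Audio support
// running in a secure context (HTTPS or localhost).
function canRunVad() {
  if (typeof window === 'undefined' || typeof navigator === 'undefined') return false;
  const hasMic = !!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia);
  const hasAudio = !!(window.AudioContext || window.webkitAudioContext);
  const secure = window.isSecureContext || window.location.hostname === 'localhost';
  return hasMic && hasAudio && secure;
}
```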

License

MIT License - See LICENSE

Third-Party Components

This library incorporates third-party code. See THIRD-PARTY-NOTICES.md and PATENTS.md for details.
