viam-modules/filtered-audio

Module filtered-audio

The filtered-audio module provides a model that filters audio input from a source microphone based on wake words.

Supported Platforms

  • Darwin ARM64
  • Linux x64
  • Linux ARM64

Models

This module provides the following model(s):

  • viam:filtered-audio:wake-word-filter

Model viam:filtered-audio:wake-word-filter

Configuration

The following attribute template can be used to configure this model:

For vosk:

{
  "source_microphone": "<AUDIO_IN NAME>",
  "wake_words": ["<word>"]
}

For openwakeword:

{
  "source_microphone": <AUDIO_IN NAME>,
  "detection_engine": "openwakeword",
  "oww_model_path": "<path or URL to .onnx model>"
}
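
Putting the attributes together, a hypothetical component entry in a machine configuration might look like the following. The component name, microphone name, and wake word are placeholders, and the entry mirrors the shape of the source-microphone example later in this document; note that use_grammar is set to false because fuzzy_threshold requires it:

```json
{
  "name": "wake-filter",
  "type": "audio_in",
  "model": "viam:filtered-audio:wake-word-filter",
  "attributes": {
    "source_microphone": "my-microphone",
    "wake_words": ["hey robot"],
    "vad_aggressiveness": 3,
    "silence_duration_ms": 900,
    "use_grammar": false,
    "fuzzy_threshold": 2
  }
}
```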

Configuration Attributes

The following attributes are available for the viam:filtered-audio:wake-word-filter model:

| Name | Type | Inclusion | Description |
|------|------|-----------|-------------|
| source_microphone | string | Required | Name of a Viam AudioIn component to receive and filter audio from. |
| detection_engine | string | Optional | Wake word detection engine to use. Options: vosk, openwakeword. Default: vosk. |
| vad_aggressiveness | int | Optional | Sensitivity of the WebRTC VAD (voice activity detection). A higher number is more restrictive in reporting speech, increasing missed detections; a lower number is less restrictive but may report background noise as speech. Range: 0-3. Default: 3. |
| silence_duration_ms | int | Optional | Milliseconds of continuous silence needed before speech is considered finished. Default: 900. |
| min_speech_ms | int | Optional | Minimum length (in milliseconds) a speech segment must be to be treated as valid speech; shorter sounds are ignored. Default: 300. |

Vosk Attributes

| Name | Type | Inclusion | Description |
|------|------|-----------|-------------|
| wake_words | string array | Required | Wake words to filter speech. All speech segments said after the wake words will be returned from get_audio. |
| vosk_model | string | Optional | Vosk model to use for speech recognition. Accepts a model name, directory path, or zip file path. Default: vosk-model-small-en-us-0.15. See the list of available models. For models larger than 1 GB, download manually and provide the file path. |
| use_grammar | bool | Optional | When true, Vosk uses grammar-constrained recognition limited to the wake words, which is more accurate for short wake words. When false, Vosk uses full transcription mode, which is more accurate for longer wake phrases (3+ words). Default: true. |
| vosk_grammar_confidence | float | Optional | Minimum confidence threshold (0.0-1.0) for wake word recognition. Lower-confidence matches are rejected. Default: 0.7. |
| fuzzy_threshold | int | Optional | Enables fuzzy wake word matching. The threshold (0-5) is the maximum number of character edits (insertions, deletions, substitutions) allowed between the transcript and the wake word. If not set, exact matching is used. Note: use_grammar must be set to false to use fuzzy matching. |

OpenWakeWord Attributes

These attributes apply when detection_engine is set to openwakeword.

| Name | Type | Inclusion | Description |
|------|------|-----------|-------------|
| oww_model_path | string | Required | Path or URL to a custom .onnx wake word model file. Local paths and HTTP/HTTPS URLs are supported. URL models are downloaded and cached in VIAM_MODULE_DATA. |
| oww_threshold | float | Optional | Detection confidence threshold (0.0-1.0). A higher value requires more confidence before triggering, reducing false positives. Default: 0.5. |

Source Microphone Requirements

The source microphone must provide audio in the following format:

| Requirement | Value | Description |
|-------------|-------|-------------|
| Codec | PCM16 | 16-bit PCM audio format |
| Sample Rate | 16000 Hz | Required for the Vosk model |
| Channels | 1 (Mono) | Stereo audio is not supported |
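
Given this format (16 kHz, mono, 16-bit samples), each second of audio occupies 16000 × 2 × 1 = 32,000 bytes. A small illustrative helper, not part of the module, for converting buffer sizes to durations:

```python
SAMPLE_RATE = 16000   # Hz, required by the filter
BYTES_PER_SAMPLE = 2  # PCM16 = 16-bit samples
CHANNELS = 1          # mono

BYTES_PER_SECOND = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS  # 32000

def duration_ms(num_bytes: int) -> float:
    """Duration in milliseconds of a PCM16 mono 16 kHz byte buffer."""
    return num_bytes / BYTES_PER_SECOND * 1000

print(duration_ms(32000))  # 1000.0 -- one second of audio
```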

Example configuration for source microphone:

{
  "name": "my-microphone",
  "type": "audio_in",
  "model": "...",
  "attributes": {
    "sample_rate": 16000,
    "channels": 1
  }
}

Recommended Source Microphone: Use the viam:system-audio module, which supports resampling and can output 16 kHz mono PCM16 audio from any system microphone.

Training a Custom OpenWakeWord Model

To use detection_engine: openwakeword you need a custom .onnx model trained on your wake word. Use the openWakeWord automatic training notebook to generate one.

Once trained, set oww_model_path to the local path or a URL pointing to the .onnx file.

Fuzzy Wake Word Matching

The wake word filter supports fuzzy matching using Levenshtein distance (edit distance) via the rapidfuzz library. This improves accuracy when speech recognition produces slight variations (e.g., "hey robot" transcribed as "the robot").

Enabling Fuzzy Matching

To enable fuzzy wake word matching, add fuzzy_threshold to your configuration:

{
  "source_microphone": "mic",
  "wake_words": ["hey robot"],
  "fuzzy_threshold": 2
}

How It Works

Fuzzy matching compares the wake phrase against the first few words of the transcript. It measures how many character edits (insertions, deletions, substitutions) are needed to transform one into the other. This handles common speech-to-text errors such as "hey robot" being transcribed as "the robot" (edit distance 2: delete the leading "t" of "the" and insert a "y" after "he"):

| Transcribed | Wake Word | Distance | Match (threshold=2) |
|-------------|-----------|----------|---------------------|
| "the robot say something" | "hey robot" | 2 | ✓ |
| "hey Robert what time" | "hey robot" | 2 | ✓ |
| "a robot turn on lights" | "hey robot" | 3 | ✗ |
| "please hey robot do it" | "hey robot" | - | ✗ (not at start) |

Threshold Guidelines

| Threshold | Use Case |
|-----------|----------|
| 1 | Very strict - for short wake words or quiet environments |
| 2-3 | Recommended for most wake words |
| 4-5 | Lenient - for noisy environments (may increase false positives) |

get_audio()

The wake word filter implements the AudioIn get_audio() method:

Parameters

  • codec: Must be "pcm16". Other codecs are not supported.
  • duration_seconds: Use 0 for continuous streaming.
  • previous_timestamp_ns: Use 0 to start from current time.

Stream Behavior

The filter returns a continuous stream that:

  1. Monitors continuously for wake words using VAD (Voice Activity Detection) and the configured detection engine (Vosk or openWakeWord)
  2. Only yields chunks when a wake word is detected followed by speech
  3. Uses empty chunks to signal speech segment boundaries

Stream Protocol:

  • Normal chunks: Contain audio data (16kHz mono PCM16) for detected speech segments
  • Empty chunks: Signal the end of a speech segment (audio_data has length 0)

After yielding a speech segment and empty chunk, the filter resumes listening for the next wake word automatically.

Example Usage

Basic accumulation and processing:

# Get continuous stream
audio_stream = await filter.get_audio("pcm16", 0, 0)

segment = bytearray()

async for chunk in audio_stream:
    audio_data = chunk.audio.audio_data

    if len(audio_data) == 0:
        # Empty chunk = segment ended
        if segment:
            process_speech_segment(bytes(segment))
            segment.clear()
    else:
        # Normal chunk - accumulate audio
        segment.extend(audio_data)

Clients should continue consuming chunks even while processing previous segments to avoid stream disconnection.
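
One way to follow that advice is to hand each completed segment to a background task so the consuming loop never blocks. A minimal asyncio sketch, assuming the same chunk shape as the example above (chunk.audio.audio_data); the stream and handler names are illustrative:

```python
import asyncio

async def consume_segments(audio_stream, handle_segment):
    """Accumulate chunks into segments; process each segment off the hot loop."""
    segment = bytearray()
    pending: set[asyncio.Task] = set()
    async for chunk in audio_stream:
        data = chunk.audio.audio_data
        if len(data) == 0:  # empty chunk = segment boundary
            if segment:
                # Process in the background; keep consuming immediately.
                task = asyncio.create_task(handle_segment(bytes(segment)))
                pending.add(task)
                task.add_done_callback(pending.discard)
                segment = bytearray()
        else:
            segment.extend(data)
    if pending:  # drain any outstanding work when the stream ends
        await asyncio.gather(*pending)
```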

See examples/ directory for complete usage examples.

Do command

The wake word filter supports do_command() for pausing and resuming detection. This is useful for voice assistants that need to prevent the filter from detecting its own TTS (text-to-speech) output.

Supported Commands

| Command | Description |
|---------|-------------|
| pause_detection | Pauses wake word detection. Audio is still consumed but not processed. |
| resume_detection | Resumes wake word detection. |

Example Usage

# Pause detection before playing TTS audio
await filter.do_command({"pause_detection": None})

await audio_output.play(audio_data)

# Resume detection after TTS finishes
await filter.do_command({"resume_detection": None})
