Audio Processing

Shared audio preprocessing utilities for the Elephant Listening Project (ELP).

This package provides a single, consistent audio processing pipeline used across:

  • Elephant rumble detection (CNN + RNN)
  • Gunshot detection (CNN)
  • Real-time deployment (Jetson Nano)

The goal is to remove duplicated preprocessing code, ensure consistency between training and deployment, and support efficient real-time inference.

This repository is intended to be used as a shared submodule by training and deployment repositories, not as a standalone application.

Design Principles

  • Single source of truth for audio preprocessing
  • Framework-agnostic by default (NumPy + SciPy)
  • Readable and explicit (config-driven, no hidden behavior)
  • Backward-compatible with existing trained models

What This Package Handles

  • Sample-rate standardization (4 kHz target)
  • Fixed-length clipping (pad / trim)
  • Time-domain filtering (e.g. lowpass for rumbles)
  • Spectrogram computation (STFT + log scaling)
  • Frequency slicing / masking
  • Optional normalization
  • Incremental STFT + rolling cache for real-time inference
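
The steps above can be sketched end to end as a single function. This is an illustrative sketch using NumPy + SciPy, not the package's actual API; `preprocess_clip` and its parameters are hypothetical names chosen to mirror the list above.

```python
import numpy as np
from scipy import signal

def preprocess_clip(audio, sr, target_sr=4000, clip_s=5.0,
                    lowpass_hz=200.0, n_fft=256, hop=128):
    # Sample-rate standardization to the 4 kHz target
    if sr != target_sr:
        audio = signal.resample_poly(audio, target_sr, sr)
    # Fixed-length clipping: pad with zeros or trim
    n = int(clip_s * target_sr)
    if len(audio) < n:
        audio = np.pad(audio, (0, n - len(audio)))
    else:
        audio = audio[:n]
    # Time-domain lowpass filtering (e.g. 200 Hz for rumbles)
    sos = signal.butter(4, lowpass_hz, btype="low", fs=target_sr, output="sos")
    audio = signal.sosfiltfilt(sos, audio)
    # Spectrogram computation: STFT + log scaling
    f, t, Z = signal.stft(audio, fs=target_sr, nperseg=n_fft,
                          noverlap=n_fft - hop)
    spec = np.log1p(np.abs(Z))
    # Frequency slicing: keep only bins at or below the cutoff
    return spec[f <= lowpass_hz, :]
```

Keeping every stage in one config-driven function is what makes training/deployment parity checkable: the same code path runs in both places.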

What It Does Not Handle

  • WAV file I/O (kept repo-specific to preserve model parity)
  • Model training or evaluation
  • Dataset-level normalization or TFRecord sharding

Pipelines

Gunshot CNN

  • 4 s clips @ 4 kHz
  • Log-magnitude spectrogram
  • No time-domain filtering by default

Rumble CNN

  • 5 s clips @ 4 kHz
  • Log-magnitude spectrogram
  • Lowpass filtered at 200 Hz
  • Frequencies above 200 Hz discarded

Rumble RNN (Waveform)

  • 5 s clips @ 4 kHz
  • Raw waveform input
  • Lowpass filtered at 200 Hz
  • No spectrogram computation

RNN models are used only for elephant rumble detection.
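
The three pipelines differ only in a few parameters, so they can be expressed as variations of one config. The dataclass and field names below are illustrative assumptions, not the package's real `configs` module:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class PipelineConfig:
    sample_rate: int = 4000              # Hz
    clip_seconds: float = 5.0
    lowpass_hz: Optional[float] = None   # None = no time-domain filtering
    max_freq_hz: Optional[float] = None  # None = keep all frequency bins
    spectrogram: bool = True             # False = raw-waveform (RNN) input

GUNSHOT_CNN = PipelineConfig(clip_seconds=4.0)
RUMBLE_CNN = PipelineConfig(lowpass_hz=200.0, max_freq_hz=200.0)
RUMBLE_RNN = PipelineConfig(lowpass_hz=200.0, spectrogram=False)
```

Encoding the differences as data rather than separate code paths is what keeps the preprocessing "config-driven, no hidden behavior".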


Streaming / Real-Time Processing

For deployment (e.g. Jetson Nano), this package supports incremental STFT:

  • Only newly available STFT frames are computed
  • Frames are stored in a rolling spectrogram cache
  • Overlapping inference windows reuse cached frames

This avoids recomputing spectrograms for overlapping windows and significantly reduces CPU usage in real-time inference.
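
A minimal sketch of this idea, assuming a hop-based frame cache (the class and method names are hypothetical, not the actual `StreamingSpecPipeline` API): incoming audio is buffered, each complete hop yields exactly one new STFT frame, and frames live in a bounded rolling cache that overlapping windows read from.

```python
from collections import deque
import numpy as np

class RollingSTFT:
    def __init__(self, n_fft=256, hop=128, max_frames=160):
        self.n_fft, self.hop = n_fft, hop
        self.window = np.hanning(n_fft)
        self.buffer = np.zeros(0)
        self.frames = deque(maxlen=max_frames)  # rolling spectrogram cache

    def append_audio(self, chunk):
        self.buffer = np.concatenate([self.buffer, chunk])
        # Compute only the frames that the new samples complete
        while len(self.buffer) >= self.n_fft:
            frame = self.buffer[:self.n_fft] * self.window
            self.frames.append(np.log1p(np.abs(np.fft.rfft(frame))))
            self.buffer = self.buffer[self.hop:]  # advance by one hop

    def latest(self, n_frames):
        # Overlapping inference windows reuse cached frames
        if len(self.frames) < n_frames:
            return None
        return np.stack(list(self.frames)[-n_frames:])
```

With a 50 % overlap between consecutive inference windows, roughly half of each window's frames come straight from the cache instead of being recomputed, which is where the CPU savings on the Jetson Nano come from.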


Installation

  • Minimal install: pip install -e .
  • With SciPy DSP support: pip install -e ".[scipy]"
  • With TensorFlow (legacy parity / training): pip install -e ".[scipy,tf]"
  • For development: pip install -e ".[scipy,dev]"


Usage

Example 1: batch processing a clip

from audio_processing.pipelines import RumblePipeline

pipe = RumblePipeline()
spec = pipe.extract_from_audio(audio, sr)  # (T, F, 1)

Example 2: streaming inference

from audio_processing.streaming import StreamingSpecPipeline
from audio_processing.configs import rumble_default_config

stream = StreamingSpecPipeline(rumble_default_config())
stream.append_audio(chunk, sr)

if stream.ready_for_clip():
    spec = stream.get_latest_clip_spec()
