improvements to wav2aug #5

Merged
gfdb merged 17 commits into main from match-perf on Jan 11, 2026

@gfdb (Owner) commented Jan 8, 2026

This PR brings significant performance improvements to wav2aug augmentations, aligning behavior with SpeechBrain's implementations while dramatically reducing compute time.

Key Changes

speed_perturb

  • Use integer percentages (90, 100, 110) instead of multiplying by float factors (0.9, 1.0, 1.1) and rounding. This keeps a large GCD between the original and perturbed sample rates, which makes the sinc resampling filter much smaller
  • Cache torchaudio.transforms.Resample objects via lru_cache to avoid recomputing filter kernels
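To illustrate why the GCD matters, here is a minimal standard-library sketch, not the PR's code. torchaudio divides both rates by their GCD before building the sinc kernel, so the reduced ratio is a rough proxy for filter size. The helper names (`perturbed_rate`, `reduced_ratio`, `ResamplerStub`, `get_resampler`) are hypothetical, and the stub stands in for `torchaudio.transforms.Resample(orig_freq, new_freq)` so the caching pattern can be shown without torchaudio installed.

```python
from dataclasses import dataclass
from functools import lru_cache
from math import gcd


def perturbed_rate(sample_rate: int, speed_percent: int) -> int:
    """Target rate from an integer percentage, using integer arithmetic only."""
    return sample_rate * speed_percent // 100


def reduced_ratio(orig_freq: int, new_freq: int) -> tuple[int, int]:
    """Both rates divided by their GCD; smaller numbers mean a smaller kernel."""
    g = gcd(orig_freq, new_freq)
    return orig_freq // g, new_freq // g


@dataclass(frozen=True)
class ResamplerStub:
    """Hypothetical stand-in for torchaudio.transforms.Resample."""
    orig_freq: int
    new_freq: int


@lru_cache(maxsize=None)
def get_resampler(orig_freq: int, new_freq: int) -> ResamplerStub:
    # Built once per (orig_freq, new_freq) pair; later calls reuse the object,
    # so an expensive filter kernel would not be recomputed.
    return ResamplerStub(orig_freq, new_freq)


if __name__ == "__main__":
    sr = 16000
    for pct in (90, 100, 110):
        new_sr = perturbed_rate(sr, pct)
        print(pct, new_sr, reduced_ratio(sr, new_sr))
    # A target rate that is off by one sample shares no factor with 16000:
    print(reduced_ratio(sr, 14401))
    # The second call with the same arguments is served from the cache:
    print(get_resampler(sr, 14400) is get_resampler(sr, 14400))
```

At 16 kHz the three percentages reduce to ratios of 10:9, 1:1, and 10:11, while the off-by-one rate stays at 16000:14401.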

chunk_swap

  • Replaced nested Python loops with fully vectorized gather/scatter operations
  • Eliminated per-sample iteration entirely
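The general idea of replacing per-sample loops with a single gather can be sketched as follows. This is an assumption-laden illustration, not the PR's implementation: the function name `chunk_permute` is hypothetical, it shuffles equal-length chunks independently for each batch row, and the actual chunk selection in wav2aug may differ.

```python
import torch


def chunk_permute(x: torch.Tensor, num_chunks: int, generator=None) -> torch.Tensor:
    """Shuffle equal-length chunks of each row of a [batch, time] tensor."""
    batch, time = x.shape
    chunk_len = time // num_chunks
    if chunk_len == 0:
        return x.clone()
    usable = chunk_len * num_chunks  # trailing remainder is left in place

    # One random chunk order per row: argsort of uniform noise is a permutation.
    perm = torch.rand(batch, num_chunks, generator=generator, device=x.device).argsort(dim=1)

    # Expand chunk indices into sample indices: [batch, num_chunks, chunk_len].
    offsets = torch.arange(chunk_len, device=x.device)
    idx = (perm.unsqueeze(-1) * chunk_len + offsets).reshape(batch, usable)

    out = x.clone()
    out[:, :usable] = torch.gather(x, 1, idx)  # single gather, no Python loops
    return out


if __name__ == "__main__":
    wav = torch.arange(20, dtype=torch.float32).reshape(2, 10)
    print(chunk_permute(wav, num_chunks=3))
```

The index tensor is built once for the whole batch, so the cost of the Python interpreter no longer scales with batch size or chunk count.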

NoiseLoader

  • New preload mode
  • New class that preloads all noise files into CPU RAM at initialization
  • Configurable storage_dtype (default float16) for memory efficiency, with only a very small performance degradation
  • Noise sampling becomes a fast tensor slice with zero I/O
  • Memory: ~650 MB for the pointsource_noises pack
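The preload approach described above might look roughly like the sketch below. It is not the PR's `NoiseLoader`: the class name `PreloadedNoiseBank`, its methods, and the zero-padding behavior are assumptions, and it takes in-memory tensors rather than reading audio files, so that the storage and slicing pattern can be shown on its own.

```python
import torch


class PreloadedNoiseBank:
    """Hypothetical sketch: keep all noise clips in one compact CPU buffer."""

    def __init__(self, clips, storage_dtype=torch.float16):
        self.lengths = torch.tensor([c.numel() for c in clips])
        self.starts = torch.cumsum(self.lengths, 0) - self.lengths
        # A single contiguous buffer in a compact dtype halves memory vs float32.
        self.buffer = torch.cat([c.flatten() for c in clips]).to(storage_dtype)

    def nbytes(self) -> int:
        return self.buffer.numel() * self.buffer.element_size()

    def sample(self, num_samples: int, generator=None) -> torch.Tensor:
        """Return a random float32 segment; no file I/O happens here."""
        i = int(torch.randint(len(self.lengths), (1,), generator=generator))
        start, length = int(self.starts[i]), int(self.lengths[i])
        n = min(num_samples, length)
        offset = int(torch.randint(length - n + 1, (1,), generator=generator))
        seg = self.buffer[start + offset : start + offset + n].to(torch.float32)
        if n < num_samples:
            # Assumption: clips shorter than the request are zero-padded.
            seg = torch.nn.functional.pad(seg, (0, num_samples - n))
        return seg


if __name__ == "__main__":
    bank = PreloadedNoiseBank([torch.randn(100), torch.randn(50)])
    print(bank.nbytes(), bank.sample(80).shape)
```

Casting back to float32 at sampling time keeps the downstream mixing arithmetic in full precision while the stored copy stays small.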

Other improvements

  • freq_drop: Ported SpeechBrain's notch filter implementation for correctness
  • rand_amp_clip: Fixed normalization; now uses a single clip value per batch (matches SpeechBrain)
  • time_dropout: Vectorized implementation
  • Wav2Aug: Simplified interface; uses NoiseLoader by default

@gfdb gfdb merged commit 7ad55cf into main Jan 11, 2026
2 checks passed
@gfdb gfdb deleted the match-perf branch January 12, 2026 21:01