Releases: rozetyp/voice-codec-modem

v0.1.0 - Initial publication

06 May 17:29

Reproducible Pareto curve across three speech-codec classes.

Headline results

Channel        | Where it ships             | Reliable rate
Opus 48k AUDIO | Discord, music-mode WebRTC | 4000 bps modem-tone, 933 bps real-speech-cover
Opus 24k VOIP  | Zoom, Teams, WebRTC        | 3196 bps modem-tone, 270 bps real-speech-cover
AMR-NB 12.2k   | 2G/3G cellular voice       | 76 bps real-speech-cover
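
As an illustration of what a round trip through each channel could look like, the sketch below builds ffmpeg command lines for the three codec settings in the table. It is not taken from the repository: the channel keys, the helper names `encode_cmd` and `decode_cmd`, and the reading of "48k" and "24k" as Opus bitrates (by analogy with AMR-NB's 12.2 kbps mode) are all assumptions. The ffmpeg options themselves (`-c:a libopus`, `-application`, `-c:a libopencore_amrnb`, `-b:a`, `-ar`, `-ac`) are standard, but they need an ffmpeg build with those encoders enabled.

```python
from typing import Dict, List

# Hypothetical mapping from the table's channel names to ffmpeg encoder options.
# Assumption: "48k" / "24k" are Opus bitrates, and AUDIO / VOIP are the libopus
# "-application" modes. AMR-NB only accepts 8 kHz mono input.
CHANNELS: Dict[str, List[str]] = {
    "opus_48k_audio": ["-c:a", "libopus", "-b:a", "48k", "-application", "audio"],
    "opus_24k_voip": ["-c:a", "libopus", "-b:a", "24k", "-application", "voip"],
    "amr_nb_12k2": ["-c:a", "libopencore_amrnb", "-ar", "8000", "-ac", "1",
                    "-b:a", "12.2k"],
}


def encode_cmd(channel: str, wav_in: str, coded_out: str) -> List[str]:
    """Build the ffmpeg argv that encodes a WAV file with the given channel's codec."""
    return ["ffmpeg", "-y", "-i", wav_in, *CHANNELS[channel], coded_out]


def decode_cmd(coded_in: str, wav_out: str, sample_rate: int = 48000) -> List[str]:
    """Build the ffmpeg argv that decodes a coded file back to mono WAV."""
    return ["ffmpeg", "-y", "-i", coded_in, "-ar", str(sample_rate), "-ac", "1", wav_out]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. Building argv lists rather than shell strings avoids quoting problems with file names.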

All rates are post-FEC and were validated across multiple seeds through real ffmpeg codecs, not surrogates. Zero residual errors were observed on 76,000+ bits per headline checkpoint.
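
Zero observed errors does not mean a zero error rate. As a rough reading of the 76,000-bit figure, the sketch below computes the largest bit error rate still consistent with seeing no errors in n bits. This is an added illustration, not an analysis from the repository, and it assumes independent bit errors, which may not hold for post-FEC residuals that tend to arrive in bursts. The function name is hypothetical.

```python
import math


def zero_error_ber_upper_bound(n_bits: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the bit error rate after 0 errors in n_bits.

    Assumes independent errors. Solves (1 - p) ** n_bits = 1 - confidence for p,
    using log1p/expm1 to stay accurate when p is tiny.
    """
    return -math.expm1(math.log1p(-confidence) / n_bits)


n = 76_000
bound = zero_error_ber_upper_bound(n)   # roughly 3.9e-5 at 95% confidence
rule_of_three = 3 / n                   # the common "3 / n" shortcut for the same bound
print(f"95% upper bound on BER: {bound:.2e} (rule of three: {rule_of_three:.2e})")
```

Under that assumption, 76,000 clean bits support a residual error rate below about 4 in 100,000 at 95% confidence. Tightening the bound by a factor of ten would take about ten times as many test bits.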

What is included

  • Six trained checkpoints under core/neural_codec/ totalling ~109 MB
  • Hand-coded TRIZ pitch modem (175 bps reliable, no neural network needed)
  • End-to-end voice-within-voice demos for each codec
  • Multi-probe synthesis comparing the survival of EnCodec, Mimi, and DAC tokens through Opus
  • Stacked stego_opus + prosody channel (538 bps reliable, demoed with Codec2 700C)
  • Listenable WAV samples for every variant

What this is not

  • Not a commercial product. A productization analysis is in docs/FINDINGS_AND_PLAN.md.
  • Not novel methodology. Methods are published prior art.
  • Not above the published state of the art for AMR-NB stego (codec-internal QIM methods report 1-3 kbps).
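
For readers unfamiliar with QIM (quantization index modulation), the sketch below shows the textbook scalar form: each bit selects one of two interleaved quantizer lattices, and the decoder picks the lattice nearest the received value. It is a generic illustration only. The codec-internal methods mentioned above apply the idea to parameters inside the AMR-NB bitstream, which this sketch does not attempt, and nothing here is taken from this repository. The function names and the step size are assumptions.

```python
def qim_embed(x: float, bit: int, step: float) -> float:
    """Quantize x onto the lattice for `bit`: multiples of step, shifted by step/2 for bit 1."""
    offset = bit * step / 2.0
    return step * round((x - offset) / step) + offset


def qim_extract(y: float, step: float) -> int:
    """Recover the bit by finding which of the two lattices lies closer to y."""
    d0 = abs(y - qim_embed(y, 0, step))
    d1 = abs(y - qim_embed(y, 1, step))
    return 0 if d0 <= d1 else 1


# Round trip: the bit survives any perturbation smaller than step / 4.
step = 0.1
marked = qim_embed(0.37, 1, step)
recovered = qim_extract(marked + 0.02, step)
```

A larger step tolerates more distortion but changes the host value more, which is the usual robustness versus transparency trade-off in this family of methods.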

License

Apache License 2.0. The patent-grant clause matters here because audio watermarking is a patent-active field.