Releases: rozetyp/voice-codec-modem
v0.1.0 - Initial publication
Reproducible Pareto curve across three speech-codec classes.
Headline results
| Channel | Where it ships | Reliable rate |
|---|---|---|
| Opus 48k AUDIO | Discord, music-mode WebRTC | 4000 bps modem-tone, 933 bps real-speech-cover |
| Opus 24k VOIP | Zoom, Teams, WebRTC | 3196 bps modem-tone, 270 bps real-speech-cover |
| AMR-NB 12.2k | 2G/3G cellular voice | 76 bps real-speech-cover |
All rates are post-FEC and multi-seed validated through real ffmpeg codecs (not surrogates). Zero residual errors were observed on 76,000+ bits per headline checkpoint.
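As a rough guide to what "zero residual errors on 76,000+ bits" supports statistically, the sketch below computes an upper confidence bound on the bit error rate when no errors are seen in a test of a given size. It assumes independent bit errors, which is a simplification for a post-FEC channel, and the function name `ber_upper_bound` is hypothetical rather than something from this repository.

```python
import math


def ber_upper_bound(n_bits: int, confidence: float = 0.95) -> float:
    """Upper bound on BER after observing zero errors in n_bits.

    Solves (1 - p) ** n_bits = 1 - confidence for p.
    Assumes independent bit errors (an illustrative simplification).
    """
    if n_bits <= 0:
        raise ValueError("n_bits must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    # expm1/log1p keep precision when the bound is very small.
    return -math.expm1(math.log1p(-confidence) / n_bits)


if __name__ == "__main__":
    bound = ber_upper_bound(76_000)
    print(f"95% upper bound on BER: {bound:.2e}")
```

For 76,000 error-free bits this gives a 95% bound of about 3.9e-5, close to the familiar "rule of three" estimate of 3 / 76,000. A tighter bound would need a longer test, not a different calculation.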
What is included
- Six trained checkpoints under core/neural_codec/ totalling ~109 MB
- Hand-coded TRIZ pitch modem (175 bps reliable, no neural network needed)
- End-to-end voice-within-voice demos for each codec
- Multi-probe synthesis comparing EnCodec, Mimi, DAC token survival through Opus
- Stacked stego_opus + prosody channel (538 bps reliable, demoed with Codec2 700C)
- Listenable WAV samples for every variant
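To illustrate what passing audio through a real ffmpeg codec can look like for the three channels in the headline table, here is a minimal sketch that only builds the encode and decode command lines. The channel keys, file names, and the reading of "48k" and "24k" as bitrates are assumptions made for illustration; the repository's own scripts may use different settings. The AMR-NB line also assumes an ffmpeg build with libopencore_amrnb enabled.

```python
# Hypothetical channel presets; not taken from the repository.
CHANNELS = {
    # Opus in music ("audio") mode, assumed 48 kbit/s.
    "opus_48k_audio": (["-c:a", "libopus", "-b:a", "48k",
                        "-application", "audio"], "ogg"),
    # Opus in speech ("voip") mode, assumed 24 kbit/s.
    "opus_24k_voip": (["-c:a", "libopus", "-b:a", "24k",
                       "-application", "voip"], "ogg"),
    # AMR-NB at 12.2 kbit/s; the encoder requires 8 kHz mono input.
    "amrnb_12k2": (["-c:a", "libopencore_amrnb", "-ar", "8000",
                    "-ac", "1", "-b:a", "12200"], "amr"),
}


def roundtrip_commands(channel: str, wav_in: str, wav_out: str):
    """Return (encode_cmd, decode_cmd) as argument lists for subprocess.run."""
    codec_args, ext = CHANNELS[channel]
    coded = f"{channel}.{ext}"  # intermediate compressed file
    encode = ["ffmpeg", "-y", "-i", wav_in, *codec_args, coded]
    decode = ["ffmpeg", "-y", "-i", coded, wav_out]
    return encode, decode


if __name__ == "__main__":
    for name in CHANNELS:
        enc, dec = roundtrip_commands(name, "modem_tx.wav", f"{name}_rx.wav")
        print(" ".join(enc))
        print(" ".join(dec))
```

Running each pair with `subprocess.run(cmd, check=True)` would produce a decoded WAV that a demodulator could then be scored against. Building the commands as lists avoids shell quoting issues.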
What this is not
- Not a commercial product. Productization analysis in docs/FINDINGS_AND_PLAN.md.
- Not novel methodology. Methods are published prior art.
- Not above the published state of the art for AMR-NB stego (codec-internal QIM methods report 1-3 kbps).
License
Apache License 2.0. The patent-grant clause matters here because audio watermarking is a patent-active field.