25 commits
f8eda55
switch from floating point arithmetic to scaled integers
jbrough Jan 27, 2025
5b7b181
Revert "switch from floating point arithmetic to scaled integers"
jbrough Sep 7, 2025
21308e2
restrict changes to entropy coding paths
jbrough Sep 7, 2025
f0267cf
quantisation changes
jbrough Sep 8, 2025
267e5a8
segment boundaries
jbrough Sep 8, 2025
f05261d
Revert "segment boundaries"
jbrough Sep 8, 2025
a485dc6
quantisation improvements
jbrough Sep 8, 2025
708087d
reinstate fb comments
jbrough Sep 8, 2025
2a45ecb
partly address performance degradations
jbrough Sep 9, 2025
db0a7c0
Merge deterministic LM precision improvements and acv=4 chunk framing
jbrough Mar 18, 2026
ee430c0
Merge claude branch: deterministic LM, acv=4 chunk framing, bug fixes
jbrough Mar 18, 2026
68e8d30
Add combined README with precision/robustness improvements documentation
jbrough Mar 18, 2026
c84f6cb
Improve deterministic LM bitstream controls
jbrough Mar 19, 2026
f9da4cf
Parallelize chunked LM segment encoding
jbrough Mar 20, 2026
b00c5bd
Tighten cross-host deterministic LM defaults
jbrough Mar 21, 2026
8782578
Checkpoint native entropy coding and CUDA decode LM
jbrough Apr 3, 2026
d3d0776
Speed up CUDA decode LM inference
jbrough Apr 3, 2026
b17da3e
Document Ada benchmarks and decode tradeoffs
jbrough Apr 3, 2026
1301c36
Restructure README for fork-first docs
jbrough Apr 3, 2026
ebbb6d1
Quantify CPU decode tradeoff in README
jbrough Apr 3, 2026
c7b089c
Default CPU decode workers to auto headroom
jbrough Apr 3, 2026
9075227
Tighten fork README tone
jbrough Apr 3, 2026
76995ee
Add frame-level ONNX export bundle
jbrough Apr 12, 2026
e5c7ffd
Export ONNX frame bundles with dynamic batch
jbrough Apr 12, 2026
4757b5a
Save local work
jbrough Jun 4, 2026
95 changes: 95 additions & 0 deletions PR_DESCRIPTION.md
@@ -0,0 +1,95 @@
# Deterministic cross-platform LM entropy coding, acv=4 CRC chunk framing, and `_counts_from_pdf` bug fix

## Summary

This PR hardens the LM-backed entropy coding path for cross-platform correctness and adds per-segment failure isolation. The neural network weights and audio quality are unchanged. All existing `.ecdc` files decode correctly.

## Motivation

Three problems with the current LM entropy path:

1. **Non-deterministic across hardware.** `torch.softmax` can differ by a ULP between CPU, MPS, and CUDA. The arithmetic coder amplifies these differences: a single mismatched probability desynchronises the decoder state, producing `EOFError` or silent garbage. Payloads encoded on an Apple Silicon Mac consistently fail to decode on Linux CPU or CUDA.

2. **Silent corrupt decode at `tau=1.0`.** In `_counts_from_pdf`, the near-integer perturbation uses an alternating sign. When a token's probability is exactly `0.0` (common at `tau=1.0` due to float underflow of `exp(-large)`), the negative perturbation gives `x = -ε`, then `floor(-ε) = -1`. A negative count makes the CDF non-monotonic; the decoder produces wrong symbols with no error raised.

3. **No failure isolation.** A single corrupt byte anywhere in the payload desynchronises the arithmetic decoder and destroys the rest of the file.
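The negative-count failure in item 2 can be reproduced in a few lines. This is a simplified sketch, not the code in `_counts_from_pdf`: the function names, the `eps` value, and the choice to perturb every entry (rather than only near-integer ones) are assumptions made for illustration.

```python
import math
from itertools import accumulate

def counts_unclamped(pdf, total=1 << 16, eps=1e-6):
    """Scale probabilities to integer counts with an alternating-sign nudge.

    Hypothetical stand-in for the perturbation step: even indices are nudged
    up by eps, odd indices are nudged down by eps, then floored.
    """
    counts = []
    for i, p in enumerate(pdf):
        sign = 1.0 if i % 2 == 0 else -1.0
        counts.append(math.floor(p * total + sign * eps))
    return counts

def counts_clamped(pdf, total=1 << 16, eps=1e-6):
    """Same as above, but clamp each count at zero (the clamp_min(0) idea)."""
    return [max(c, 0) for c in counts_unclamped(pdf, total, eps)]

# Two tokens have probability exactly 0.0, as can happen after underflow.
pdf = [0.5, 0.0, 0.5, 0.0]

raw = counts_unclamped(pdf)          # the zero entries become floor(-eps) = -1
raw_cdf = list(accumulate(raw))      # cumulative counts go down at those entries

fixed = counts_clamped(pdf)
fixed_cdf = list(accumulate(fixed))  # never decreases once counts are >= 0
```

With negative counts the cumulative sums decrease at the zero-probability tokens, which is the non-monotonic CDF described above. Clamping restores a non-decreasing sequence.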

## Changes

### `encodec/compress.py`

**Deterministic CDF construction**

- `_stable_softmax`: computes softmax in float64 with a sequentially accumulated (cumsum) denominator instead of calling `torch.softmax`. Bit-identical output was verified from Mac CPU/MPS to Linux CPU/CUDA.
- `_quantize_logits_`: rounds logits to a 1/128 grid before softmax. Tiny floating-point differences that don't change the quantised logit produce identical CDFs.
- `_counts_from_pdf`: adds `clamp_min(0)` after the near-integer perturbation step, fixing the negative-count bug at `tau=1.0`.
- `_deterministic_cdf` / `_deterministic_cdf_multi`: construct the CDF from integer counts (floor plus priority allocation) at `FP_SCALE=65536` precision. This replaces the float-based CDF, which was sensitive to platform differences.
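The pipeline described in the bullets above (quantise logits, softmax with a fixed summation order, integer CDF) can be sketched in pure Python. This is an illustration under stated assumptions, not the PR's implementation: the function names are hypothetical, and "priority allocation" is interpreted here as handing leftover counts to the largest fractional remainders with ties broken by index, which the description does not confirm.

```python
import math

FP_SCALE = 65536   # total integer mass, matching the default in this PR
QSTEP = 1.0 / 128  # logit grid step, matching the default in this PR

def quantize_logits(logits, qstep=QSTEP):
    """Snap each logit to the nearest multiple of qstep."""
    return [round(x / qstep) * qstep for x in logits]

def stable_softmax(logits):
    """Softmax in Python floats (float64) with a fixed left-to-right sum."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    denom = 0.0
    for e in exps:  # sequential accumulation, no reordering
        denom += e
    return [e / denom for e in exps]

def deterministic_cdf(pdf, total=FP_SCALE, min_count=1):
    """Integer CDF: floor the scaled mass, then hand out what is left.

    Assumption: every symbol gets at least min_count, and leftover counts go
    to the largest fractional remainders first (ties broken by lower index).
    """
    n = len(pdf)
    budget = total - n * min_count
    scaled = [p * budget for p in pdf]
    counts = [min_count + math.floor(s) for s in scaled]
    leftover = total - sum(counts)
    order = sorted(range(n), key=lambda i: (-(scaled[i] - math.floor(scaled[i])), i))
    for k in range(leftover):
        counts[order[k % n]] += 1
    cdf = [0]
    for c in counts:
        cdf.append(cdf[-1] + c)
    return cdf

# Two logit vectors that differ by far less than one grid step.
logits_a = [2.0, 0.5, -1.25, -30.0]
logits_b = [x + 1e-7 for x in logits_a]

cdf_a = deterministic_cdf(stable_softmax(quantize_logits(logits_a)))
cdf_b = deterministic_cdf(stable_softmax(quantize_logits(logits_b)))
```

Because both inputs quantise to the same grid points, `cdf_a` and `cdf_b` are identical, and the minimum count keeps the CDF strictly increasing even for the token whose probability is nearly zero. A perturbation that happens to straddle a rounding boundary would still change the quantised logit, so quantisation narrows the problem rather than eliminating it on its own.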

**Bitstream version `acv=4` with CRC chunk framing**

- Each model segment is wrapped in `[chunk_len: u32 BE][crc32: u32 BE][payload]`.
- A corrupt chunk is replaced with silence for that segment; the rest of the file decodes normally.
- `tau` is stored in the header so encoder and decoder are always in sync without out-of-band configuration.
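A minimal sketch of the `[chunk_len][crc32][payload]` framing, using only the standard library. The helper names are hypothetical, and two details are assumptions: that the CRC covers the payload bytes only, and that `chunk_len` counts the payload without the 8-byte header.

```python
import struct
import zlib

HEADER = struct.Struct(">II")  # chunk_len, crc32: both u32 big-endian

def frame_chunk(payload: bytes) -> bytes:
    """Prefix a payload with its length and CRC32."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload

def iter_chunks(stream: bytes):
    """Yield each payload, or None where the CRC check fails.

    A caller could substitute silence for a None segment and keep decoding.
    """
    pos = 0
    while pos + HEADER.size <= len(stream):
        length, crc = HEADER.unpack_from(stream, pos)
        pos += HEADER.size
        payload = stream[pos:pos + length]
        pos += length
        ok = len(payload) == length and zlib.crc32(payload) == crc
        yield payload if ok else None

# Three framed segments, then one flipped byte inside the second body.
stream = bytearray(frame_chunk(b"aaa") + frame_chunk(b"bbbb") + frame_chunk(b"cc"))
stream[20] ^= 0xFF  # offset 20 lies in the second chunk's payload
decoded = list(iter_chunks(bytes(stream)))
```

Only the damaged segment comes back as `None`; the segments on either side are recovered intact because each chunk carries its own length. Corruption in a length field is a different case: it would shift every later chunk boundary, and this sketch makes no attempt to resynchronise.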

**GPU reliability**

- `compress_to_file` detects the model device and moves the waveform there automatically (`wav[None].to(model_device)`). Previously, this call crashed when the model was on MPS or CUDA.
- LM and arithmetic coder always run on CPU for cross-platform determinism regardless of model device.

**Tunable defaults** (via env vars; existing behaviour unchanged if not set):

| Variable | Default |
|---|---|
| `ENCODEC_LM_TAU` | `1.0` |
| `ENCODEC_LOGIT_QSTEP` | `1/128` |
| `ENCODEC_AC_FP_SCALE` | `65536` |
| `ENCODEC_AC_MIN_RANGE` | `1` |
| `ENCODEC_DETERMINISTIC_LM_DTYPE` | `float32` |
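One way such settings could be read is sketched below. The variable names and defaults come from the table; the parsing rules (for example accepting a fraction such as `1/128`), the `load_config` helper, and the returned key names are assumptions, since the description does not say how values are parsed.

```python
import os
from fractions import Fraction

# Defaults copied from the table above.
DEFAULTS = {
    "ENCODEC_LM_TAU": "1.0",
    "ENCODEC_LOGIT_QSTEP": "1/128",
    "ENCODEC_AC_FP_SCALE": "65536",
    "ENCODEC_AC_MIN_RANGE": "1",
    "ENCODEC_DETERMINISTIC_LM_DTYPE": "float32",
}

def read_setting(name, environ=os.environ):
    """Return the variable's value, falling back to the default if unset or empty."""
    raw = environ.get(name, "").strip()
    return raw or DEFAULTS[name]

def load_config(environ=os.environ):
    """Parse all five settings into Python values (hypothetical helper)."""
    return {
        # Fraction accepts both "1/128" and "0.0078125" style strings.
        "tau": float(Fraction(read_setting("ENCODEC_LM_TAU", environ))),
        "qstep": float(Fraction(read_setting("ENCODEC_LOGIT_QSTEP", environ))),
        "fp_scale": int(read_setting("ENCODEC_AC_FP_SCALE", environ)),
        "min_range": int(read_setting("ENCODEC_AC_MIN_RANGE", environ)),
        "lm_dtype": read_setting("ENCODEC_DETERMINISTIC_LM_DTYPE", environ),
    }

default_cfg = load_config({})
custom_cfg = load_config({"ENCODEC_LOGIT_QSTEP": "1/64", "ENCODEC_LM_TAU": "0.9"})
```

Passing a plain dict instead of `os.environ` keeps the sketch testable without touching the process environment.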

### `encodec/model.py`

- `LMModel.forward_logits`: factored out from `forward` so the deterministic and legacy paths share the transformer forward pass.
- `LMModel.forward_legacy`: raw softmax with no quantisation, used for decoding `acv < 3` streams.
- `LMModel.__init__`: accepts `tau` parameter.
- `EncodecModel.get_lm_model`: accepts `device` and `dtype` parameters for explicit LM placement.

### `scripts/`

- `precision_eval.py`: CLI for benchmarking bitrate, SNR, encode/decode wall time, CPU vs MPS, LM vs non-LM, and single-byte corruption behaviour (targets chunk bodies, not headers/CRC).
- `payload_decode_matrix.py`: decodes a payload across CPU and CUDA and compares results; intended for cross-host determinism validation.

## Backwards compatibility

**Reading old streams: fully preserved.** The decoder reads the `acv` field from the stream header and routes accordingly:

| `acv` | Path | Notes |
|---|---|---|
| `0` | Raw bitpacking, no LM | Unchanged |
| `1` / `2` | Legacy LM via `forward_legacy()` | Original `torch.softmax`, no quantisation — decodes exactly as before |
| `4` | New deterministic path | This PR |
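The routing in the table can be expressed as a small dispatch function. This is a sketch only: the function name and the returned labels are hypothetical, and rejecting every value not listed in the table (including `acv=3`) is an assumption rather than documented behaviour.

```python
def select_decode_path(acv: int) -> str:
    """Map the stream's acv header field to a decode path label."""
    if acv == 0:
        return "raw"               # raw bitpacking, no LM
    if acv in (1, 2):
        return "legacy_lm"         # original torch.softmax path
    if acv == 4:
        return "deterministic_lm"  # path added in this PR
    # Anything else is treated as unsupported in this sketch.
    raise ValueError(f"unsupported acv={acv}")

paths = [select_decode_path(v) for v in (0, 1, 2, 4)]
```

Failing loudly on an unknown version mirrors the behaviour expected of old decoders that encounter `acv=4`.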

**Writing:** `compress(..., use_lm=False)` still produces `acv=0` raw streams identical to before. `compress(..., use_lm=True)` now produces `acv=4`; old decoders will reject `acv=4` streams with an unsupported-version error (the version field exists for this purpose).

**API surface:** no breaking changes. `compress`, `decompress`, `compress_to_file`, `decompress_from_file` retain the same signatures. The `EncodecModel` public API is unchanged.

## Test results

Benchmarked on 7 stereo 48 kHz music tracks (10 s clips) with `encodec_48khz`. All 7 tracks decoded without error on every device:

| Bandwidth | Device | Avg actual kbps | LM gain vs raw | Encode RTF | Decode RTF |
|---|---|---|---|---|---|
| 6 kbps | CPU | 4.34 | 27.7% | 0.26× | 0.27× |
| 6 kbps | MPS | 4.34 | 27.7% | 0.33× | 0.27× |
| 24 kbps | CPU | 19.3 | 19.9% | 0.39× | 0.41× |
| 24 kbps | MPS | 19.3 | 19.9% | 0.47× | 0.40× |

CPU and MPS produce byte-identical payloads and identical decoded audio (same kbps, same SNR). Zero decode failures across all tracks, bandwidths, and devices.

Cross-device decode matrix (payloads encoded on Apple Silicon Mac):

| Encode | Decode | Before | After |
|---|---|---|---|
| Mac CPU | Linux CPU | `EOFError` | ✓ |
| Mac CPU | Linux CUDA | `EOFError` | ✓ |
| Mac MPS | Linux CPU | `EOFError` | ✓ |
| Mac MPS | Linux CUDA | `EOFError` | ✓ |