Curated subset of #90 + K-side norm accounting fix #91
Subset of @brosequist's #90 commit 0fd5de9 — keeping the actual fixes, deferring the streaming + serialization API surface until a production caller exists.

Included:
- `KVCacheCompressor.memory_stats()` was omitting the float32 norm stored per V vector, inflating the reported compression ratio. Adds `v_bits_total += n_vectors * 32`.
- `TurboQuantMSE.compressed_size_bits()` — was missing (`TurboQuant` already had it).
- Replaces the `seed + 1000` magic offset with `np.random.SeedSequence(seed).spawn(2)` for true PRNG independence between the PolarQuant and QJL stages, and between the K and V quantizers.

Deferred (not in this commit):
- `compress_token()` / `get_compressed_cache()` streaming API
- `CompressedVector.to_bytes()` / `from_bytes()` binary serialization
- `CompressedKVCache.save()` / `load()` npz serialization
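A minimal sketch of the spawn-based wiring (variable names here are illustrative, not the repo's actual ones):

```python
import numpy as np

seed = 42
root = np.random.SeedSequence(seed)
k_seq, v_seq = root.spawn(2)          # independent child sequences
k_rng = np.random.default_rng(k_seq)  # K-quantizer randomness
v_rng = np.random.default_rng(v_seq)  # V-quantizer randomness
# Unlike seed and seed + 1000, spawned children are designed to yield
# non-overlapping, uncorrelated streams.
```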
test_turboquant_improves_over_polarquant

The existing test ended with a print() and no assertion, silently allowing QJL to be worse than PolarQuant. This updates the test to assert the known finding: QJL (TurboQuant 2-bit) is actively worse than MSE-only PolarQuant at the same bit budget. The assertion will fire if QJL is ever fixed and starts winning, prompting re-evaluation of the production path. See turbo4-resurrection.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
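The shape of the guard, roughly (the `ip_distortion` helper and the constructor signatures here are placeholders, not the repo's actual API):

```python
def test_turboquant_improves_over_polarquant():
    # Known finding: QJL (TurboQuant 2-bit) loses to MSE-only PolarQuant
    # at the same bit budget. Asserting it means a future QJL fix trips
    # this test and forces a re-evaluation of the production path.
    polar_err = ip_distortion(PolarQuant(d=256, bits=2))
    qjl_err = ip_distortion(TurboQuant(d=256, bits=2))
    assert polar_err < qjl_err, "QJL now beats PolarQuant — re-evaluate"
```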
TestFastRotationExtended covers: round-trip invertibility (x → rotate → unrotate = x), batch vs. single-vector consistency, and energy distribution uniformity after rotation. None of these three properties was previously tested.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
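For reference, a minimal standalone version of the first two properties, using a random orthogonal matrix as a stand-in for the repo's fast rotation:

```python
import numpy as np

def test_rotation_round_trip():
    rng = np.random.default_rng(0)
    d = 64
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # stand-in rotation
    x = rng.standard_normal(d)
    np.testing.assert_allclose(q.T @ (q @ x), x, atol=1e-10)

def test_batch_matches_single():
    rng = np.random.default_rng(1)
    d, n = 64, 8
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    xs = rng.standard_normal((n, d))
    batch = xs @ q.T                                  # batched rotation
    singles = np.stack([q @ x for x in xs])
    np.testing.assert_allclose(batch, singles, atol=1e-10)
```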
Adds a [tool.ruff] section to pyproject.toml (line-length=120, E/W/F rules, ignoring E501/E741) and a GitHub Actions workflow (.github/workflows/lint.yml) that runs ruff check on every push and pull request. Replaces ad-hoc style discussions with an enforced, zero-config lint gate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
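The configured block plausibly looks like this — a sketch reconstructed from the description above, not a verbatim copy of the repo's pyproject.toml:

```toml
[tool.ruff]
line-length = 120
select = ["E", "W", "F"]   # pycodestyle errors/warnings + pyflakes
ignore = ["E501", "E741"]  # line-too-long, ambiguous variable names
```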
The lint workflow added in 46efe26 ran 'ruff check .' against the whole repo and failed immediately: the existing codebase has 233 pre-existing ruff violations (78 F401 unused imports, 68 I001 import sorting, 40 F541 empty f-strings, 32 F841 unused vars, etc.) across benchmarks/ and scripts/. A CI gate the legacy code doesn't pass is unhelpful, so remove .github/workflows/lint.yml. Keep the [tool.ruff] block in pyproject.toml as opt-in documentation: anyone running 'ruff check' locally still gets the configured rules, and the workflow can be re-enabled once the legacy violations are addressed (187 of the 233 are auto-fixable via 'ruff check --fix').
TurboQuant.CompressedVector stores TWO float32 norms per vector (vector_norms = ||x||_2 and residual_norms = ||residual||_2), but compressed_size_bits and KVCacheCompressor.memory_stats only counted one (32 bits instead of 64). Pre-existing on main, and parallel to the V-side undercount fixed in the previous commit: V uses TurboQuantMSE, which stores a single norm — 32 bits is correct there. K uses full TurboQuant, which stores two.

Effect: K compressed size was understated by 32 bits per vector, inflating the reported compression ratio. With d=128, b=3 the TurboQuant ratio drops from 4.92× to 4.57× (the true value), and the combined KV ratio at d=128, k=v=3 drops from ~2.46× to ~2.37×. No quantization-output changes — accounting only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
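A sketch of the corrected accounting and the ratio arithmetic (parameter names are illustrative; the fp16 baseline is inferred from the reported ratios):

```python
def compressed_size_bits(n_vectors: int, d: int, bits: int, n_norms: int) -> int:
    # n_norms = 1 for TurboQuantMSE (vector norm only),
    # n_norms = 2 for full TurboQuant (vector norm + residual norm).
    return n_vectors * (d * bits + n_norms * 32)

# Per vector at d=128, b=3 against an fp16 baseline (128 * 16 = 2048 bits):
#   counting one norm:  2048 / (384 + 32) ≈ 4.92x  (old, understated size)
#   counting two norms: 2048 / (384 + 64) ≈ 4.57x  (true ratio)
```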
Selectively merging from @brosequist's #90 — keeping the actual fixes, deferring API surface, plus an additional K-side fix that #90 missed.
Thanks @brosequist for the bundle — review surface was much easier this way. Re-bundling further per the curation below.
What's included
Commits:
- `b75813b` — `memory_stats`, SeedSequence PRNG, `TurboQuantMSE.compressed_size_bits`
- `1074625` — `test_turboquant_improves_over_polarquant`
- `f23570e`, `3e37572`, `0ca5bcc`, `8afc4bf`

Brett's first commit was split — credit preserved.
The original `0fd5de9` bundle (5 changes) was cherry-picked with the streaming + serialization API deferred.

What's kept:
- `KVCacheCompressor.memory_stats()`
- `TurboQuantMSE.compressed_size_bits()` (was missing; `TurboQuant` already had it)
- `SeedSequence.spawn(2)` replacing the `seed + 1000` magic offset

What's deferred (no caller yet; want to design for the production integration):
- `KVCacheCompressor.compress_token()` / `get_compressed_cache()` streaming API
- `CompressedVector.to_bytes()` / `from_bytes()` binary serialization
- `CompressedKVCache.save()` / `load()` npz serialization

The split commit retains @brosequist as author.
What's added
`8afc4bf` is a parallel fix to the V-norm fix in `b75813b`. `TurboQuant.CompressedVector` stores two float32 norms (`vector_norms = ||x||_2` and `residual_norms = ||residual||_2`), but `TurboQuant.compressed_size_bits` and `KVCacheCompressor.memory_stats` only counted one. V uses `TurboQuantMSE` (single norm — 32 bits is correct). K uses full `TurboQuant` (two norms — 64 bits is correct).

Numerical effect (verified live):
- `TurboQuant(d=128, b=3).compression_ratio()`: 4.92× → 4.57× (true)
- `KVCacheCompressor(d=128, k=3, v=3).memory_stats(...)['compression_ratio']`: ~2.46× → ~2.37× (true)

No quantization-output changes — accounting only.
What's not included from #90
- `feat: add calibrate() to OutlierTurboQuant` — `OutlierTurboQuant` is a deprecated path per docs/turboquant-plus-experiments.md ("Outlier channeling doesn't work… kurtosis stays 8-50… WHT rotation gets it to 2.9"). The calibrate code is well-written, just on a dead module.
- `docs: HIP/AMD NaN warning` — the root-cause story ("large K norms → NaN") contradicts docs/papers/asymmetric-kv-compression.md:218, which finds extreme K norms compress better (more Gaussian after normalization). The real cause is HIP-kernel-specific. Will revisit after kernel triage.

Consistency with docs/papers/why-mse-fails-for-kv-quantization.md

The new MSE paper argues MSE is a broken proxy for K-cache quantization in deployment because attention is non-linear and sparse. Brett's QJL regression-guard test (1074625) measures inner-product distortion on synthetic Gaussian pairs (d=256) — the linear-operator regime where the new paper explicitly says IP/MSE does proxy quality (alongside RaBitQ-style top-k IP search). The test is a regression guard for IP distortion only; the production decision to drop QJL is justified separately by docs/papers/turbo4-resurrection.md's PPL ablation. No conflict.
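For reference, a minimal sketch of an inner-product distortion probe in that regime — synthetic Gaussian pairs at d=256, with `quantize` standing in for either compressor's encode/decode round trip (the repo's actual test harness may differ):

```python
import numpy as np

def ip_distortion(quantize, d=256, n=1024, seed=0):
    rng = np.random.default_rng(seed)
    queries = rng.standard_normal((n, d))
    keys = rng.standard_normal((n, d))
    keys_hat = quantize(keys)                      # decode(encode(keys))
    exact = np.einsum("nd,nd->n", queries, keys)
    approx = np.einsum("nd,nd->n", queries, keys_hat)
    return float(np.mean(np.abs(exact - approx)))
```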
Test plan

- `pytest tests/ refract/tests/` — 982 passed, 1 skipped (7 fewer than the #90 baseline because the streaming/serialization tests were deferred with their code)
- Verified `TurboQuant.compression_ratio()` and `KVCacheCompressor.memory_stats()` numerics against expected values

🤖 Generated with Claude Code