
feat(kompress): HEADROOM_KOMPRESS_BACKEND env + GPU/MPS auto-detect#204

Open
SwiftWing21 wants to merge 5 commits intochopratejas:mainfrom
SwiftWing21:feat/kompress-backend-selection

Conversation


@SwiftWing21 SwiftWing21 commented Apr 19, 2026

Summary

Closes #202.

Adds HEADROOM_KOMPRESS_BACKEND env var (auto / onnx / pytorch) and
teaches auto mode to prefer the PyTorch backend when CUDA or Apple-Silicon
MPS is available. Previously _load_kompress always tried ONNX first
whenever onnxruntime was importable — which is always true for
headroom-ai[proxy]. This left GPU-equipped users on a CPU-only path.

Selection order in auto (default):

  1. If PyTorch is installed AND (torch.cuda.is_available() OR
    Apple-Silicon MPS), prefer PyTorch; fall back to ONNX on failure.
  2. Else, prefer ONNX; fall back to PyTorch on failure.
  3. Raise ImportError if neither is available.

Apple Silicon detection uses
platform.machine() == "arm64" and platform.system() == "Darwin", a
combination that has held across all M-series generations to date (M1 / M2
/ M3 / M4 / ...).
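The detection check in isolation (a stand-alone sketch; the helper name is illustrative):

```python
import platform


def is_apple_silicon() -> bool:
    """macOS on arm64 covers every M-series machine (M1, M2, M3, M4, ...).

    Intel Macs report machine() == "x86_64", so they return False here.
    """
    return platform.system() == "Darwin" and platform.machine() == "arm64"
```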

Invalid values (e.g. HEADROOM_KOMPRESS_BACKEND=tensorflow) log a warning
rather than silently falling back to auto, so misconfiguration is visible.

Behavior on existing deployments

  • Linux/Windows CPU-only: no change — auto falls through to ONNX
    exactly like before.
  • NVIDIA + PyTorch installed: now auto-selects CUDA via PyTorch.
  • Apple Silicon + PyTorch installed: now auto-selects MPS.
  • Anyone can revert to the old behavior by setting
    HEADROOM_KOMPRESS_BACKEND=onnx.

Test plan

  • Unit: HEADROOM_KOMPRESS_BACKEND=onnx forces ONNX
  • Unit: HEADROOM_KOMPRESS_BACKEND=pytorch forces PyTorch
  • Unit: auto + fake Apple-Silicon + MPS -> PyTorch
  • Unit: auto + fake CUDA -> PyTorch
  • Unit: auto + no accelerator -> ONNX (regression guard)
  • Unit: auto + PyTorch load error -> falls back to ONNX
  • Sanity: invalid env var logs warning and falls through
  • Full kompress_compressor.py test file: 27/27 pass
  • Full transforms/ test directory: 545 pass, 35 skip, 1 pre-existing fail unrelated to this PR (see below)
  • Manual: NVIDIA box — verify real CUDA path works end-to-end
  • Manual: Apple Silicon — verify real MPS path works end-to-end and
    confirm the speedup from issue #202's field data (2206ms avg -> sub-1s
    expected)
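One way the fake-accelerator unit tests above can be structured: register a stand-in `torch` module in `sys.modules` that exposes only the attributes the backend probe reads. (This helper is illustrative, not the PR's actual test code; in a real pytest suite you would route the `sys.modules` edit through the `monkeypatch` fixture so the fake is removed after each test.)

```python
import sys
import types


def install_fake_torch(cuda: bool = False, mps: bool = False) -> None:
    """Register a stand-in 'torch' exposing only cuda/mps availability."""
    torch = types.ModuleType("torch")
    torch.cuda = types.SimpleNamespace(is_available=lambda: cuda)
    torch.backends = types.SimpleNamespace(
        mps=types.SimpleNamespace(is_available=lambda: mps)
    )
    sys.modules["torch"] = torch


# Shown bare here for brevity; wrap in monkeypatch.setitem in real tests.
install_fake_torch(cuda=True)
import torch

assert torch.cuda.is_available()  # auto mode would prefer PyTorch here
assert not torch.backends.mps.is_available()
```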

I don't have Apple Silicon or NVIDIA GPU hardware locally. Requesting a
maintainer or community reviewer to run the two manual checks before
merge. Code-level correctness is fully covered by unit tests; wall-clock
speedup is not.

Pre-existing CI state: tests/test_transforms/test_universal_json_crush.py::TestFullPipelineIntegration::test_number_array_via_compress
fails on current `main` (verified via git diff origin/main — this PR
does not touch universal_json_crush.py or its test file). The red CI
on this PR matches what main already shows; not introduced here.

Docs

  • New `### Kompress backend selection` subsection in `wiki/configuration.md`
    covering the env var and the backend comparison table.
  • `CHANGELOG.md` entry under `[Unreleased]` / `Added`.

Commit structure

Five focused commits on the branch (keeping them separate for easier
review + surgical revert if needed). Squash-on-merge is fine if that
matches house style.

🤖 Generated with Claude Code

Allow explicit onnx/pytorch backend selection via env var, and add
auto-detect so PyTorch (with CUDA or Apple-Silicon MPS) is preferred
when an accelerator is available. Falls back gracefully on load
failure. Refs chopratejas#202.

Review followup — the first override test had misleading fake-torch
setup for a code path that short-circuits before accelerator probing.
Rename and trim so the test matches what it actually exercises; the
auto-detect path is tested in follow-up commits per plan.

Also warn when HEADROOM_KOMPRESS_BACKEND is set to an unknown value
rather than silently falling back to auto. Refs chopratejas#202.

Four regression tests for _load_kompress auto mode:

- Apple Silicon + MPS available → picks PyTorch
- CUDA available on any OS → picks PyTorch
- No accelerator → picks ONNX (regression guard for legacy behavior)
- PyTorch load failure → falls back to ONNX

Refs chopratejas#202.
@SwiftWing21
Contributor Author

Let me know if I need to make any changes.

Kompressor is a core part of the Helix-Context stack, so having the auto-detect makes it "easier" on end users.
I'm expecting a friend will be able to test with DGX Sparks in a week or two to check off the NVIDIA devices.

@chopratejas
Owner

Hi,

Thanks for making the changes. Let me test it on my machine (I have a Mac) and see if I can observe the speedup.



Development

Successfully merging this pull request may close these issues.

[FEATURE] Kompressor Batching Support for Nvidia + Apple M Silicon
