feat(kompress): HEADROOM_KOMPRESS_BACKEND env + GPU/MPS auto-detect #204
Open

SwiftWing21 wants to merge 5 commits into chopratejas:main
Conversation
Allow explicit onnx/pytorch backend selection via env var, and add auto-detect so PyTorch (with CUDA or Apple-Silicon MPS) is preferred when an accelerator is available. Falls back gracefully on load failure. Refs chopratejas#202.
Review followup — the first override test had misleading fake-torch setup for a code path that short-circuits before accelerator probing. Rename and trim so the test matches what it actually exercises; the auto-detect path is tested in follow-up commits per plan. Also warn when HEADROOM_KOMPRESS_BACKEND is set to an unknown value rather than silently falling back to auto. Refs chopratejas#202.
Four regression tests for `_load_kompress` auto mode:
- Apple Silicon + MPS available → picks PyTorch
- CUDA available on any OS → picks PyTorch
- No accelerator → picks ONNX (regression guard for legacy behavior)
- PyTorch load failure → falls back to ONNX

Refs chopratejas#202.
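The accelerator scenarios above can be driven by a stub `torch` module; the helper names below are hypothetical (not the PR's actual test code), but the probes mirror the real `torch.cuda.is_available()` and `torch.backends.mps.is_available()` APIs:

```python
# Sketch of a fake-torch fixture for the auto-mode tests; make_fake_torch and
# install_fake_torch are illustrative names, not code from this PR.
import sys
import types


def make_fake_torch(cuda: bool, mps: bool) -> types.ModuleType:
    """Build a stub 'torch' module exposing only the availability probes."""
    torch = types.ModuleType("torch")
    torch.cuda = types.SimpleNamespace(is_available=lambda: cuda)
    torch.backends = types.SimpleNamespace(
        mps=types.SimpleNamespace(is_available=lambda: mps)
    )
    return torch


def install_fake_torch(monkeypatch, cuda: bool, mps: bool) -> None:
    """Inject the stub so a later `import torch` sees the chosen flags."""
    monkeypatch.setitem(sys.modules, "torch", make_fake_torch(cuda, mps))
```

With pytest's `monkeypatch` fixture, each test case then sets the `cuda`/`mps` flags and asserts which backend `_load_kompress` selects.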
Contributor
Author
Let me know if I need to make any changes. Kompressor is a core part of the Helix-Context stack, so having auto-detect makes things easier for end users.
Owner
Hi, thanks for making the changes. Let me test it on my machine (I have a Mac) and see if I can see the speedup.
Summary
Closes #202.
Adds a `HEADROOM_KOMPRESS_BACKEND` env var (`auto`/`onnx`/`pytorch`) and teaches `auto` mode to prefer the PyTorch backend when CUDA or Apple-Silicon MPS is available. Previously `_load_kompress` always tried ONNX first whenever `onnxruntime` was importable (which is always true for `headroom-ai[proxy]`), leaving GPU-equipped users on a CPU-only path.

Selection order in `auto` (default):
- If an accelerator is available (`torch.cuda.is_available()` or Apple-Silicon MPS), prefer PyTorch, falling back to ONNX on failure.
- Otherwise use ONNX; raise `ImportError` if neither backend is available.

Apple Silicon detection uses `platform.machine() == "arm64" and platform.system() == "Darwin"`, which Apple has committed to keeping stable across M-series generations (M1 / M2 / M3 / M4 / ...).
Invalid values (e.g. `HEADROOM_KOMPRESS_BACKEND=tensorflow`) log a warning rather than silently falling back to `auto`, so misconfiguration is visible.
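The selection logic described above might look roughly like this sketch; the function and constant names are hypothetical, and only the env var name, the three backend values, and the CUDA/MPS/platform probes come from the PR:

```python
# Illustrative sketch of backend selection; choose_backend and VALID_BACKENDS
# are assumed names, not the PR's actual implementation.
import logging
import os
import platform

logger = logging.getLogger(__name__)

VALID_BACKENDS = {"auto", "onnx", "pytorch"}


def _is_apple_silicon() -> bool:
    # Apple Silicon (M-series) reports arm64 on Darwin.
    return platform.system() == "Darwin" and platform.machine() == "arm64"


def _accelerator_available() -> bool:
    # Prefer PyTorch only when a real accelerator is present.
    try:
        import torch
    except ImportError:
        return False
    if torch.cuda.is_available():
        return True
    mps = getattr(torch.backends, "mps", None)
    return _is_apple_silicon() and mps is not None and mps.is_available()


def choose_backend() -> str:
    requested = os.environ.get("HEADROOM_KOMPRESS_BACKEND", "auto").lower()
    if requested not in VALID_BACKENDS:
        # Warn on misconfiguration instead of silently falling back.
        logger.warning(
            "Unknown HEADROOM_KOMPRESS_BACKEND=%r; falling back to auto", requested
        )
        requested = "auto"
    if requested != "auto":
        return requested
    return "pytorch" if _accelerator_available() else "onnx"
```

An explicit `onnx`/`pytorch` setting short-circuits before any accelerator probing, which is why the first override test needed no fake-torch setup.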
Behavior on existing deployments is exactly like before: with no accelerator present, `auto` still selects ONNX, and the change cleanly reverts if needed.
Test plan
Confirm the speedup reported in issue #202 ([FEATURE] Kompressor Batching Support for Nvidia + Apple M Silicon): 2206 ms avg → sub-1 s expected.
I don't have Apple or NVIDIA GPU hardware locally, so I'm requesting a maintainer or community reviewer run the two manual checks before merge. Code-level correctness is fully covered by unit tests; wall-clock speedup is not.
Pre-existing CI state: `tests/test_transforms/test_universal_json_crush.py::TestFullPipelineIntegration::test_number_array_via_compress` fails on current `main` (verified via `git diff origin/main`; this PR does not touch `universal_json_crush.py` or its test file). The red CI on this PR matches what `main` already shows; not introduced here.
Docs
Added docs covering the env var and the backend comparison table.
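For reference, pinning the backend from the shell would look like the following; only the variable name and its accepted values come from this PR, and any service invocation is omitted:

```shell
# Pin the Kompress backend explicitly; an explicit value overrides auto-detect.
export HEADROOM_KOMPRESS_BACKEND=onnx       # CPU-only ONNX Runtime path
# export HEADROOM_KOMPRESS_BACKEND=pytorch  # CUDA / Apple-Silicon MPS path
echo "backend=$HEADROOM_KOMPRESS_BACKEND"
```

Unset (or set to `auto`), the proxy probes for an accelerator and picks the backend itself.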
Commit structure
Five focused commits on the branch (keeping them separate for easier
review + surgical revert if needed). Squash-on-merge is fine if that
matches house style.
🤖 Generated with Claude Code