Support torch_fused GPTQ quant type #36132
Open
evkotov wants to merge 3 commits into
gptqmodel auto-selects a TorchFusedQuantLinear backend whose layers report QUANT_TYPE == "torch_fused" while using the standard 4-bit/int32 GPTQ weight packing. The GPTQ patcher only accepted a fixed whitelist of quant types, so both the convert and torch.export paths rejected such models; convert then failed downstream with "No conversion rule for aten::bitwise_right_shift".

Add "torch_fused" to supported_quant_types; the single whitelist gates both paths, so OpenVINO's decompression pattern is produced and the weights fold to a u4 constant as before. Pin and document the transformers/gptqmodel versions in the LLM model-hub env so the backend selection cannot drift silently, and re-enable the opt_gptq model-hub entries. Add hermetic PyTorch FE layer tests covering the convert (keeps u4, no live BitwiseRightShift) and export paths.
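For readers unfamiliar with the 4-bit/int32 packing mentioned above, here is a minimal pure-Python sketch of why a right shift appears when such weights are unpacked. It assumes eight 4-bit values per 32-bit word stored lowest nibble first. That layout and the helper names `pack_u4` and `unpack_u4` are illustrative assumptions, not code from gptqmodel or OpenVINO.

```python
def pack_u4(values):
    """Pack eight 4-bit values (0..15) into one 32-bit word, lowest nibble first.

    The nibble order is an assumption made for illustration.
    """
    assert len(values) == 8 and all(0 <= v <= 15 for v in values)
    word = 0
    for i, v in enumerate(values):
        word |= v << (4 * i)
    return word


def unpack_u4(word):
    """Recover the eight 4-bit values with a right shift and a 0xF mask."""
    return [(word >> (4 * i)) & 0xF for i in range(8)]


packed = pack_u4([1, 2, 3, 4, 5, 6, 7, 8])
print(hex(packed))        # 0x87654321
print(unpack_u4(packed))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

The shift in `unpack_u4` plays the role that a bitwise right shift plays in a traced model, which is presumably why an unpatched model reaches the frontend with that operation still in the graph.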
kernels is pulled transitively by transformers but was left unpinned, so it drifted to 0.15.x in CI. kernels>=0.15 made LayerRepository require a version or revision, which transformers 5.5.3's hub_kernels.py constructs without, breaking 'import transformers' and failing every LLM model-hub test. Pin kernels/kernels-data to the validated 0.14.1 so the backend selection cannot drift silently.
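As a rough illustration of detecting this kind of drift, the sketch below compares an installed package version against a pinned one. The helper names are hypothetical and the parser only handles plain numeric versions. The PR itself relies on pins in the requirements file, not on a runtime check like this.

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.14.1' into (0, 14, 1); only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def pin_drifted(installed, pinned):
    """Return True when the installed version differs from the pinned one."""
    return parse_version(installed) != parse_version(pinned)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# The pinned value 0.14.1 comes from the commit message above.
found = installed_version("kernels")
if found is not None:
    print("kernels drifted:", pin_drifted(found, "0.14.1"))
```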
Pull request overview
This PR extends the PyTorch GPTQ patcher to accept GPTQ layers reporting QUANT_TYPE == "torch_fused" (as produced by newer gptqmodel backend auto-selection), ensuring both the TorchScript convert path and torch.export path generate the expected OpenVINO GPTQ decompression pattern and keep 4-bit packed weights.
Changes:
- Add `"torch_fused"` to the accepted GPTQ `supported_quant_types` list used by both convert and export patching.
- Pin/document LLM model-hub dependencies and re-enable the `opt_gptq` model-hub entries.
- Add frontend regression tests covering TorchScript conversion (keeps i4/u4 Constant, no live `BitwiseRightShift`) and `torch.export` patching acceptance.
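The allowlist gate described in the changes above can be sketched as follows. Only the names `supported_quant_types`, `QUANT_TYPE`, and `"torch_fused"` come from the PR text. The other list entries, the `FakeQuantLinear` class, and the `can_patch` helper are hypothetical placeholders and do not reflect the real contents of gptq.py.

```python
# Illustrative allowlist: only "torch_fused" is taken from the PR text,
# the other entries are placeholders.
supported_quant_types = ["example_backend_a", "example_backend_b", "torch_fused"]


class FakeQuantLinear:
    """Hypothetical stand-in for a GPTQ layer that reports its backend."""

    def __init__(self, quant_type):
        self.QUANT_TYPE = quant_type


def can_patch(layer):
    """Return True when the layer's reported quant type is on the allowlist."""
    return getattr(layer, "QUANT_TYPE", None) in supported_quant_types


print(can_patch(FakeQuantLinear("torch_fused")))      # True
print(can_patch(FakeQuantLinear("some_new_backend")))  # False
```

Because a single list is consulted, adding one entry is enough to admit the new backend on every path that performs this check.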
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/bindings/python/src/openvino/frontend/pytorch/gptq.py | Allows GPTQ patching for QUANT_TYPE == "torch_fused" by extending the supported quant type allowlist. |
| tests/layer_tests/py_frontend_tests/test_torch_frontend.py | Adds regression tests for TorchScript convert and torch.export patching for torch_fused GPTQ fixtures. |
| tests/model_hub_tests/pytorch/envs/llm.txt | Pins/documentation updates to stabilize dependency-driven GPTQ backend selection in LLM model hub tests. |
| tests/model_hub_tests/pytorch/test_llm.py | Re-enables opt_gptq in precommit model lists for both convert and export modes (non-ARM). |
mvafin (Contributor) approved these changes on Jun 2, 2026
Looks good, please remove extra kernels pin in requirements
kernels==0.14.1 was pinned twice: a pre-existing conditional pin (scoped to x86_64 + python<3.12, for gptqmodel's uncapped kernels dependency) and a newer unconditional pin added because transformers also pulls kernels uncapped and import transformers breaks with kernels>=0.15. The unconditional pin is a strict superset, so the conditional one is redundant. Keep the single unconditional kernels/kernels-data pin and fold the gptqmodel rationale into its comment; remove the redundant conditional line.
Comment on lines +2357 to +2359

```python
def forward(self, x):
    return torch.zeros(*x.shape[:-1], out_features, dtype=x.dtype)
```
Details:
gptqmodel auto-selects a TorchFusedQuantLinear backend whose layers report
QUANT_TYPE == "torch_fused" while using the standard 4-bit/int32 GPTQ weight
packing. The GPTQ patcher only accepted a fixed list of quant types, so both
the convert and torch.export paths rejected such models. On the convert path
this surfaced later as "No conversion rule found for operations:
aten::bitwise_right_shift".
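Changes:
- Add "torch_fused" to supported_quant_types. The single list gates both the convert and export paths, so OpenVINO's decompression pattern is produced and the weights fold to a u4 constant as before.
- Pin and document the transformers/gptqmodel versions in the LLM model-hub env so the backend selection does not change silently on dependency updates.
- Add hermetic PyTorch FE layer tests covering the convert path (keeps u4, no live BitwiseRightShift) and the export path.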
The aten::bitwise_right_shift translator stays disabled by design: the u4
weight fold relies on it being an unconverted framework node, so no C++ change
is needed.
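The "no live BitwiseRightShift" condition from the changes list can be pictured as a check over the operation type names of a converted graph. The sketch below works on plain lists of names. The helper names and the sample op lists are illustrative assumptions, not the PR's test code.

```python
from collections import Counter


def count_op_types(op_type_names):
    """Count how often each operation type name occurs."""
    return Counter(op_type_names)


def has_live_shift(op_type_names):
    """Return True if any BitwiseRightShift node remains in the graph."""
    return count_op_types(op_type_names)["BitwiseRightShift"] > 0


# Hypothetical op lists: one where the weights folded to a constant,
# one where the unpacking arithmetic stayed in the graph.
folded = ["Parameter", "Constant", "Convert", "Subtract", "Multiply", "MatMul", "Result"]
unfolded = folded + ["BitwiseRightShift", "BitwiseAnd"]

print(has_live_shift(folded))    # False
print(has_live_shift(unfolded))  # True
```

With an OpenVINO model object, the names could be gathered with something like `[op.get_type_name() for op in model.get_ops()]`; how the PR's tests do this is not shown in the conversation.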
Tickets: