Support torch_fused GPTQ quant type by evkotov · Pull Request #36132 · openvinotoolkit/openvino

evkotov · 2026-05-29T11:13:35Z

Details:

gptqmodel auto-selects a TorchFusedQuantLinear backend whose layers report
QUANT_TYPE == "torch_fused" while using the standard 4-bit/int32 GPTQ weight
packing. The GPTQ patcher only accepted a fixed list of quant types, so both
the convert and torch.export paths rejected such models. On the convert path
this surfaced later as "No conversion rule found for operations:
aten::bitwise_right_shift".
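The shift in that error message comes from how packed 4-bit weights are read back. A minimal sketch of the idea, assuming eight unsigned 4-bit values per 32-bit word with the lowest nibble first; `pack_u4` and `unpack_u4` are illustrative names, not functions from OpenVINO or gptqmodel, and Python's unbounded integers stand in for the int32 storage:

```python
# Illustrative only: eight 4-bit values per 32-bit word, lowest nibble first (assumed layout).

def pack_u4(values):
    """Pack groups of eight values in 0..15 into one integer word each."""
    assert len(values) % 8 == 0, "need a multiple of eight 4-bit values"
    words = []
    for start in range(0, len(values), 8):
        word = 0
        for i, v in enumerate(values[start:start + 8]):
            assert 0 <= v <= 15, "value does not fit in 4 bits"
            word |= v << (4 * i)  # place nibble i at bit offset 4*i
        words.append(word)
    return words


def unpack_u4(words):
    """Recover the 4-bit values with a right shift and a 0xF mask per nibble."""
    out = []
    for word in words:
        for i in range(8):
            out.append((word >> (4 * i)) & 0xF)
    return out


nibbles = [1, 2, 3, 4, 5, 6, 7, 8, 15, 0, 9, 10, 11, 12, 13, 14]
packed = pack_u4(nibbles)
restored = unpack_u4(packed)
```

The right shift in `unpack_u4` is the step that a traced model expresses as a bitwise right-shift operation when the layer is not patched.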

Changes:

- Add "torch_fused" to supported_quant_types. The same list gates both the
  convert and export paths, so OpenVINO's decompression pattern is produced
  and the weights fold to a u4 constant as before.
- Pin and document the transformers/gptqmodel versions in the LLM model-hub
  env so the backend selection does not change silently on dependency updates.
- Re-enable the opt_gptq model-hub entries.
- Add PyTorch FE layer tests for the convert path (keeps u4, no live
  BitwiseRightShift) and the export path.
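The gating described in the first item can be sketched roughly as follows. This is a simplified stand-in, not the code in gptq.py: only the name supported_quant_types and the "torch_fused" value come from this PR, while `FakeQuantLinear`, `check_quant_types`, and the placeholder entry "some_existing_type" are hypothetical.

```python
# Hypothetical stand-in for the patcher's quant-type gate.

class FakeQuantLinear:
    """Minimal object exposing the QUANT_TYPE attribute that the gate inspects."""

    def __init__(self, quant_type):
        self.QUANT_TYPE = quant_type


def check_quant_types(modules, supported_quant_types):
    """Raise if any layer reports a QUANT_TYPE outside the supported list."""
    for name, module in modules.items():
        quant_type = getattr(module, "QUANT_TYPE", None)
        if quant_type is None:
            continue  # not a quantized layer, nothing to gate
        if quant_type not in supported_quant_types:
            raise ValueError(
                f"Unsupported QUANT_TYPE {quant_type!r} in layer {name!r}"
            )
    return True


# Placeholder for the entries the real list already contained.
before = ["some_existing_type"]
after = before + ["torch_fused"]

layers = {"decoder.fc1": FakeQuantLinear("torch_fused")}

try:
    check_quant_types(layers, before)
    rejected_before = False
except ValueError:
    rejected_before = True

accepted_after = check_quant_types(layers, after)
```

Because one list serves both paths, extending it once is enough for a layer to pass the gate in convert and in export.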

The aten::bitwise_right_shift translator stays disabled by design: the u4
weight fold relies on it being an unconverted framework node, so no C++ change
is needed.

Tickets:

186343

gptqmodel auto-selects a TorchFusedQuantLinear backend whose layers report QUANT_TYPE == "torch_fused" while using the standard 4-bit/int32 GPTQ weight packing. The GPTQ patcher only accepted a fixed whitelist of quant types, so both the convert and torch.export paths rejected such models — convert then failed downstream with "No conversion rule for aten::bitwise_right_shift". Add "torch_fused" to supported_quant_types; the single whitelist gates both paths, so OpenVINO's decompression pattern is produced and the weights fold to a u4 constant as before. Pin and document the transformers/gptqmodel versions in the LLM model-hub env so the backend selection cannot drift silently, and re-enable the opt_gptq model-hub entries. Add hermetic PyTorch FE layer tests covering the convert (keeps u4, no live BitwiseRightShift) and export paths.

kernels is pulled transitively by transformers but was left unpinned, so it drifted to 0.15.x in CI. kernels>=0.15 made LayerRepository require a version or revision, which transformers 5.5.3's hub_kernels.py constructs without, breaking 'import transformers' and failing every LLM model-hub test. Pin kernels/kernels-data to the validated 0.14.1 so the backend selection cannot drift silently.
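The version boundary described above can be expressed as a small guard. This is a hypothetical check, not part of the PR: `parse_version` and `kernels_is_safe` are illustrative helpers that only handle plain dotted integer versions, and the 0.15 boundary is taken from the commit message.

```python
# Hypothetical guard illustrating the kernels version boundary from the commit message.

VALIDATED_KERNELS = "0.14.1"      # version pinned in the model-hub env
FIRST_BREAKING_KERNELS = (0, 15)  # LayerRepository requires version/revision from here on


def parse_version(text):
    """Turn '0.14.1' into (0, 14, 1); plain dotted integers only."""
    return tuple(int(part) for part in text.split("."))


def kernels_is_safe(installed):
    """True when the installed kernels release predates the breaking change."""
    return parse_version(installed) < FIRST_BREAKING_KERNELS


pinned_ok = kernels_is_safe(VALIDATED_KERNELS)
drifted_ok = kernels_is_safe("0.15.0")
```

An exact pin avoids needing such a runtime check at all, which is the approach the commit takes.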

Copilot

Pull request overview

This PR extends the PyTorch GPTQ patcher to accept GPTQ layers reporting QUANT_TYPE == "torch_fused" (as produced by newer gptqmodel backend auto-selection), ensuring both the TorchScript convert path and torch.export path generate the expected OpenVINO GPTQ decompression pattern and keep 4-bit packed weights.

Changes:

- Add "torch_fused" to the accepted GPTQ supported_quant_types list used by both convert and export patching.
- Pin/document LLM model-hub dependencies and re-enable the opt_gptq model-hub entries.
- Add frontend regression tests covering TorchScript conversion (keeps i4/u4 Constant, no live BitwiseRightShift) and torch.export patching acceptance.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `src/bindings/python/src/openvino/frontend/pytorch/gptq.py` | Allows GPTQ patching for `QUANT_TYPE == "torch_fused"` by extending the supported quant type allowlist. |
| `tests/layer_tests/py_frontend_tests/test_torch_frontend.py` | Adds regression tests for TorchScript convert and `torch.export` patching for `torch_fused` GPTQ fixtures. |
| `tests/model_hub_tests/pytorch/envs/llm.txt` | Pins and documents dependencies to stabilize dependency-driven GPTQ backend selection in LLM model hub tests. |
| `tests/model_hub_tests/pytorch/test_llm.py` | Re-enables `opt_gptq` in precommit model lists for both convert and export modes (non-ARM). |

mvafin

Looks good, please remove extra kernels pin in requirements

kernels==0.14.1 was pinned twice: a pre-existing conditional pin (scoped to x86_64 + python<3.12, for gptqmodel's uncapped kernels dependency) and a newer unconditional pin added because transformers also pulls kernels uncapped and import transformers breaks with kernels>=0.15. The unconditional pin is a strict superset, so the conditional one is redundant. Keep the single unconditional kernels/kernels-data pin and fold the gptqmodel rationale into its comment; remove the redundant conditional line.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

+        def forward(self, x):
+            return torch.zeros(*x.shape[:-1], out_features, dtype=x.dtype)
+


evkotov requested review from bumbosiepsak and mvafin May 29, 2026 11:13

evkotov self-assigned this May 29, 2026

evkotov requested review from a team as code owners May 29, 2026 11:13

evkotov requested a review from cavusmustafa May 29, 2026 11:13

github-actions Bot added the category: Python API and category: PyTorch FE labels May 29, 2026

evkotov added 2 commits June 1, 2026 14:11

evkotov force-pushed the CVS-186343 branch from bc6807c to f9dbbd8 June 1, 2026 12:53

evkotov requested a review from Copilot June 1, 2026 12:53

Copilot started reviewing on behalf of evkotov June 1, 2026 12:54

Copilot AI reviewed Jun 1, 2026

Comment thread tests/model_hub_tests/pytorch/envs/llm.txt Outdated

mvafin approved these changes Jun 2, 2026

evkotov requested a review from Copilot June 3, 2026 10:33

Copilot started reviewing on behalf of evkotov June 3, 2026 10:33

Copilot AI reviewed Jun 3, 2026

Comment thread tests/layer_tests/py_frontend_tests/test_torch_frontend.py

Comment on lines +2357 to +2359

    def forward(self, x):
        return torch.zeros(*x.shape[:-1], out_features, dtype=x.dtype)

evkotov wants to merge 3 commits into openvinotoolkit:master from evkotov:CVS-186343
