
Support torch_fused GPTQ quant type #36132

Open: evkotov wants to merge 3 commits into openvinotoolkit:master from evkotov:CVS-186343

evkotov (Contributor) commented May 29, 2026

Details:

gptqmodel auto-selects a TorchFusedQuantLinear backend whose layers report
QUANT_TYPE == "torch_fused" while using the standard 4-bit/int32 GPTQ weight
packing. The GPTQ patcher only accepted a fixed list of quant types, so both
the convert and torch.export paths rejected such models. On the convert path
this surfaced later as "No conversion rule found for operations:
aten::bitwise_right_shift".
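For context on where that error comes from: GPTQ checkpoints commonly pack eight 4-bit values into each int32 word, and unpacking them takes a right shift plus a mask per value. The stdlib-only sketch below assumes the common little-endian nibble order (value i stored at bits 4*i); `pack_int4` and `unpack_int4` are hypothetical helpers, not gptqmodel or OpenVINO APIs. In a traced PyTorch graph, the same shift presumably shows up as aten::bitwise_right_shift when a layer is not patched.

```python
BITS = 4
VALUES_PER_INT32 = 32 // BITS  # eight 4-bit values per int32 word
MASK = (1 << BITS) - 1         # 0xF


def pack_int4(values):
    """Pack eight unsigned 4-bit values into one signed int32 (hypothetical helper)."""
    if len(values) != VALUES_PER_INT32:
        raise ValueError(f"expected {VALUES_PER_INT32} values, got {len(values)}")
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v <= MASK:
            raise ValueError(f"value {v} does not fit in {BITS} bits")
        word |= v << (BITS * i)  # value i occupies bits [4*i, 4*i + 4)
    # Reinterpret the 32-bit pattern as signed, as a torch.int32 tensor would hold it.
    if word >= 1 << 31:
        word -= 1 << 32
    return word


def unpack_int4(word):
    """Recover the eight 4-bit values with a right shift and a mask each."""
    # Python's >> is an arithmetic shift on negative ints, and & MASK keeps
    # only the low 4 bits, so signed words unpack correctly.
    return [(word >> (BITS * i)) & MASK for i in range(VALUES_PER_INT32)]
```

A word whose top nibble is 8 or more comes out negative as a signed int32, which is why the mask after the shift matters.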

Changes:

  • Add "torch_fused" to supported_quant_types. The same list gates both the
    convert and export paths, so OpenVINO's decompression pattern is produced
    and the weights fold to a u4 constant as before.
  • Pin and document the transformers/gptqmodel versions in the LLM model-hub
    env so the backend selection does not change silently on dependency updates.
  • Re-enable the opt_gptq model-hub entries.
  • Add PyTorch FE layer tests for the convert path (keeps u4, no live
    BitwiseRightShift) and the export path.
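The point that one list gates both paths can be illustrated with a small sketch. Everything here is hypothetical: the stub classes, `check_quant_types`, and the two `patch_for_*` functions are illustrative names, not the code in gptq.py, and the pre-existing quant types in the real list are deliberately left out; only "torch_fused" comes from this PR.

```python
# Hypothetical sketch of a single allowlist gating two entry points.
# NOT the actual gptq.py code; the pre-existing quant types are omitted.
SUPPORTED_QUANT_TYPES = ["torch_fused"]


class TorchFusedLayerStub:
    """Stand-in for a gptqmodel TorchFusedQuantLinear layer (hypothetical)."""

    QUANT_TYPE = "torch_fused"
    bits = 4


class UnknownLayerStub:
    """Stand-in for a layer from a backend the patcher does not know."""

    QUANT_TYPE = "some_other_backend"


def check_quant_types(layers, path):
    """Raise if any layer reports a quant type outside the allowlist."""
    seen = {getattr(layer, "QUANT_TYPE", "<missing>") for layer in layers}
    unsupported = sorted(seen - set(SUPPORTED_QUANT_TYPES))
    if unsupported:
        raise ValueError(f"{path} path: unsupported GPTQ quant types {unsupported}")


def patch_for_convert(layers):
    # Hypothetical convert-path entry point.
    check_quant_types(layers, "convert")
    return "convert: patched"


def patch_for_export(layers):
    # Hypothetical torch.export-path entry point: same gate, same list.
    check_quant_types(layers, "export")
    return "export: patched"
```

Because both entry points consult the same list, adding a single entry unblocks both paths at once, which matches the behavior described above.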

The aten::bitwise_right_shift translator stays disabled by design: the u4
weight fold relies on it being an unconverted framework node, so no C++ change
is needed.

Tickets:

  • 186343

evkotov requested review from bumbosiepsak and mvafin May 29, 2026 11:13
evkotov self-assigned this May 29, 2026
evkotov requested review from a team as code owners May 29, 2026 11:13
evkotov requested a review from cavusmustafa May 29, 2026 11:13
github-actions bot added the labels "category: Python API" (OpenVINO Python bindings) and "category: PyTorch FE" (OpenVINO PyTorch Frontend) May 29, 2026
evkotov added 2 commits June 1, 2026 14:11
gptqmodel auto-selects a TorchFusedQuantLinear backend whose layers report
QUANT_TYPE == "torch_fused" while using the standard 4-bit/int32 GPTQ weight
packing. The GPTQ patcher only accepted a fixed whitelist of quant types, so
both the convert and torch.export paths rejected such models; convert then
failed downstream with "No conversion rule for aten::bitwise_right_shift".

Add "torch_fused" to supported_quant_types; the single whitelist gates both
paths, so OpenVINO's decompression pattern is produced and the weights fold to
a u4 constant as before. Pin and document the transformers/gptqmodel versions
in the LLM model-hub env so the backend selection cannot drift silently, and
re-enable the opt_gptq model-hub entries. Add hermetic PyTorch FE layer tests
covering the convert (keeps u4, no live BitwiseRightShift) and export paths.

kernels is pulled in transitively by transformers but was left unpinned, so it
drifted to 0.15.x in CI. kernels>=0.15 made LayerRepository require a version
or revision, which transformers 5.5.3's hub_kernels.py does not pass when
constructing it, breaking 'import transformers' and failing every LLM
model-hub test. Pin kernels/kernels-data to the validated 0.14.1 so the
backend selection cannot drift silently.
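A CI job could also fail fast on this kind of drift instead of surfacing it as a broken import. The sketch below is a hypothetical guard that is not part of this PR; it uses only the standard library's importlib.metadata, the 0.15 cut-off comes from the commit message above, and the function names are illustrative.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Hypothetical guard, not part of this PR: per the commit message,
# kernels>=0.15 breaks 'import transformers' with transformers 5.5.3.
FIRST_BREAKING = (0, 15)


def kernels_version_ok(ver: str) -> bool:
    """Return True if a kernels version string is older than 0.15."""
    match = re.match(r"(\d+)\.(\d+)", ver)
    if match is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    return (int(match.group(1)), int(match.group(2))) < FIRST_BREAKING


def installed_kernels_ok() -> bool:
    """Check the installed kernels distribution; an absent package cannot break the import."""
    try:
        return kernels_version_ok(version("kernels"))
    except PackageNotFoundError:
        return True
```

Comparing only the (major, minor) pair keeps the check simple; a stricter guard could parse versions with a dedicated library instead of a regex.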

Copilot AI left a comment


Pull request overview

This PR extends the PyTorch GPTQ patcher to accept GPTQ layers reporting QUANT_TYPE == "torch_fused" (as produced by newer gptqmodel backend auto-selection), ensuring both the TorchScript convert path and torch.export path generate the expected OpenVINO GPTQ decompression pattern and keep 4-bit packed weights.

Changes:

  • Add "torch_fused" to the accepted GPTQ supported_quant_types list used by both convert and export patching.
  • Pin/document LLM model-hub dependencies and re-enable the opt_gptq model-hub entries.
  • Add frontend regression tests covering TorchScript conversion (keeps i4/u4 Constant, no live BitwiseRightShift) and torch.export patching acceptance.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| src/bindings/python/src/openvino/frontend/pytorch/gptq.py | Allows GPTQ patching for QUANT_TYPE == "torch_fused" by extending the supported quant type allowlist. |
| tests/layer_tests/py_frontend_tests/test_torch_frontend.py | Adds regression tests for TorchScript convert and torch.export patching for torch_fused GPTQ fixtures. |
| tests/model_hub_tests/pytorch/envs/llm.txt | Pins and documents dependencies to stabilize dependency-driven GPTQ backend selection in the LLM model-hub tests. |
| tests/model_hub_tests/pytorch/test_llm.py | Re-enables opt_gptq in the precommit model lists for both convert and export modes (non-ARM). |

Comment thread on tests/model_hub_tests/pytorch/envs/llm.txt (outdated)

mvafin left a comment


Looks good. Please remove the extra kernels pin in the requirements.

kernels==0.14.1 was pinned twice: a pre-existing conditional pin (scoped to
x86_64 + python<3.12, for gptqmodel's uncapped kernels dependency) and a newer
unconditional pin added because transformers also pulls kernels uncapped and
'import transformers' breaks with kernels>=0.15. The unconditional pin is a strict
superset of the conditional one, so the conditional one is redundant.

Keep the single unconditional kernels/kernels-data pin and fold the gptqmodel
rationale into its comment; remove the redundant conditional line.
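After this change, the single pin in tests/model_hub_tests/pytorch/envs/llm.txt would look roughly like the fragment below. This is an illustrative sketch: only the package names and the 0.14.1 version come from the commits, and the comment wording is an assumption rather than the file's actual text.

```
# kernels is pulled in uncapped by both transformers and gptqmodel.
# kernels>=0.15 breaks 'import transformers' (5.5.3), so pin the validated release.
kernels==0.14.1
kernels-data==0.14.1
```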

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment on lines +2357 to +2359
def forward(self, x):
    return torch.zeros(*x.shape[:-1], out_features, dtype=x.dtype)


3 participants