
[GPU] fix dGPU func testcases in smoke_MatMulCompressedWeights_extra_multiply#35442

Open
yuanxion wants to merge 2 commits into openvinotoolkit:master from yuanxion:fix-ci-dgpu-tests-gha-matmul-extra-multiply


@yuanxion
Contributor

@yuanxion yuanxion commented Apr 21, 2026

Details

Fixes the dGPU functional test case smoke_MatMulCompressedWeights_extra_multiply_non_trivial_batch_broadcast_no_convert/MatmulWeightsDecompression.Inference, which was split out of the originally failing test case smoke_MatMulCompressedWeights_extra_multiply/MatmulWeightsDecompression.Inference.

Description of the issue

Symptom

The testcase failed on dGPU for compressed-weight MatMul patterns with group_size=2 and extra_multiply=1, especially for cases where the post-reshape multiply made the weights effectively batched again. The failure surfaced during GPU program build (implementation selection) after MatMul had been rewritten to FullyConnected.

Root cause

ConvertMatMulToFullyConnected allowed the MatMul -> FullyConnected conversion for an extra_multiply compressed-weights pattern even when the extra multiply broadcast the weights back into non-trivial batch dimensions.
For example, the weights may first be reshaped into a shared 2D form such as [16, 32], but the extra multiply can broadcast them to a batched shape such as [8, 16, 32]. That makes the weights effectively per-batch again, so forcing the FullyConnected path is unsafe.
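To make the shape example concrete, here is a minimal standalone sketch of NumPy-style broadcasting on static shapes. It is an illustration only, not code from the plugin: the `Shape` alias, the `broadcast_shapes` helper, and the multiplier shape [8, 1, 1] are assumptions chosen to reproduce the [16, 32] -> [8, 16, 32] case described above.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical alias for a static shape; not an OpenVINO type.
using Shape = std::vector<std::size_t>;

// NumPy-style broadcast of two static shapes.
// Returns std::nullopt when the shapes are not broadcast-compatible.
std::optional<Shape> broadcast_shapes(const Shape& a, const Shape& b) {
    const std::size_t rank = std::max(a.size(), b.size());
    Shape out(rank, 1);
    for (std::size_t i = 0; i < rank; ++i) {
        // Align from the trailing dimension; missing leading dims count as 1.
        const std::size_t da = i < a.size() ? a[a.size() - 1 - i] : 1;
        const std::size_t db = i < b.size() ? b[b.size() - 1 - i] : 1;
        if (da != db && da != 1 && db != 1) {
            return std::nullopt;
        }
        out[rank - 1 - i] = (da == 1) ? db : da;
    }
    return out;
}
```

With these assumptions, broadcasting a shared [16, 32] weight against an [8, 1, 1] multiplier yields [8, 16, 32], so the result carries a real batch dimension of 8 even though the reshaped weights had none.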

How to fix it

Detect whether the extra-multiply introduces non-trivial batch broadcast in the weights path. If yes, block MatMul -> FullyConnected conversion and keep the original MatMul; otherwise, keep valid compressed FullyConnected handling enabled for supported extra-multiply patterns.
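The detection step can be sketched roughly as follows. This is a simplified illustration on plain static shapes, not the actual guard added to ConvertMatMulToFullyConnected: the `Dims` alias and the `batch_volume` and `multiply_adds_batch_broadcast` helpers are hypothetical names, and the sketch assumes the two shapes are already broadcast-compatible.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical alias for a static shape; not an OpenVINO type.
using Dims = std::vector<std::size_t>;

// Product of all dimensions except the trailing two (the matrix dims).
// Shapes of rank 2 or lower have a batch volume of 1.
std::size_t batch_volume(const Dims& shape) {
    std::size_t volume = 1;
    for (std::size_t i = 0; i + 2 < shape.size(); ++i) {
        volume *= shape[i];
    }
    return volume;
}

// True when multiplying shared weights (batch volume 1) by `scale`
// would produce weights with a non-trivial batch dimension.
// Assumes `weights` and `scale` are broadcast-compatible.
bool multiply_adds_batch_broadcast(const Dims& weights, const Dims& scale) {
    // Broadcasting only expands dims equal to 1, so when the weights have
    // no real batch, the scale's batch dims determine the result's batch.
    return batch_volume(weights) == 1 && batch_volume(scale) > 1;
}
```

Under this sketch, a [16, 32] weight with an [8, 1, 1] multiplier is flagged (conversion should be blocked), while a [1, 32] per-channel multiplier is not (conversion stays allowed).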

Split the functional coverage by behavior:

  • smoke_MatMulCompressedWeights_extra_multiply for supported convert cases
  • smoke_MatMulCompressedWeights_extra_multiply_non_trivial_batch_broadcast_no_convert for blocked no-convert cases

The code and line that caused this issue

smoke_MatMulCompressedWeights_extra_multiply

} else if (!is_compressed_weight || !supports_immad) {
    return std::make_tuple(false, std::move(shape_a_aligned), std::move(shape_b_aligned));
}

Reproduction step and snapshot

  • before this PR
./ov_gpu_func_tests --device_suffix=1 --gtest_filter='smoke_MatMulCompressedWeights_extra_multiply/MatmulWeightsDecompression.Inference/data_shape=[]_[1.4.16]__weights_shape=[16,32]_group_size=2_weights_precision=u8_activations_precision=f32_transpose_weights=0_decompression_subtract=0_reshape_on_decompression=0_extra_multiply=1_per_tensor_zp=0_param_weights=1_dyn_quan_group_size=0'
  • after this PR
./ov_gpu_func_tests --device_suffix=1 --gtest_filter='smoke_MatMulCompressedWeights_extra_multiply_non_trivial_batch_broadcast_no_convert/MatmulWeightsDecompression.Inference/data_shape=[]_[1.4.16]__weights_shape=[16,32]_group_size=2_weights_precision=u8_activations_precision=f32_transpose_weights=0_decompression_subtract=0_reshape_on_decompression=0_extra_multiply=1_per_tensor_zp=0_param_weights=1_dyn_quan_group_size=0'

Problematic graph

N/A

Checklist

  • Is it a proper fix? (not a workaround)
  • Did you include test case for this fix, if necessary?
  • Did you review existing test that can be extended to cover this scenario? Which test did you review?

Tickets:

AI Assistance:

  • AI assistance used: yes
  • Used it to reproduce the failing dGPU functional tests, inspect the dumped GPU graphs, find the root cause, and add test cases.

@yuanxion yuanxion requested review from a team as code owners April 21, 2026 09:22
@yuanxion yuanxion requested review from Copilot and removed request for a team April 21, 2026 09:22
@github-actions github-actions Bot added the category: GPU OpenVINO GPU plugin label Apr 21, 2026
Contributor

Copilot AI left a comment


Pull request overview

Fixes an unsafe MatMul -> FullyConnected rewrite in the Intel GPU plugin when using compressed weights with extra_multiply, which could turn shared weights into effectively batched weights and trigger dGPU program-build failures.

Changes:

  • Add a guard in ConvertMatMulToFullyConnected to block conversion when the post-reshape Multiply introduces non-trivial batch broadcasting in the weights path.
  • Add unit tests covering (a) safe extra-multiply conversion and (b) the non-trivial batch-broadcast case that must remain MatMul.
  • Adjust functional test instantiations to cover the extra-multiply scenario and add a dedicated instantiation for the non-trivial batch-broadcast case.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File | Description
src/plugins/intel_gpu/src/plugin/transformations/convert_matmul_to_fc.cpp | Blocks FC conversion when extra post-reshape multiply broadcasts weights into real batch dimensions.
src/plugins/intel_gpu/tests/unit/transformations/convert_matmul_to_fc_test.cpp | Adds regression unit tests for extra-multiply patterns (convert vs no-convert).
src/plugins/intel_gpu/tests/functional/subgraph_tests/dynamic/matmul_weights_decompression.cpp | Updates/extends smoke instantiations to cover extra-multiply and the non-trivial batch broadcast scenario.

@rkazants rkazants requested review from a team April 22, 2026 05:07
@yuanxion
Contributor Author

The onepunch tests show no WWB or performance regressions for LLM models with this PR.

// onepunch results for iGPU & dGPU:
https://ci-dlbenchmark-icv.iotg.sclab.intel.com/job/DL-Benchmark/job/dev/job/dev_trigger/1226/
https://ci-dlbenchmark-icv.iotg.sclab.intel.com/job/DL-Benchmark/job/dev/job/dev_trigger/1227/

@wilson-seok
Contributor

@yuanxion Could this change affect existing models? If so, we need to check for performance regressions.

@yuanxion yuanxion force-pushed the fix-ci-dgpu-tests-gha-matmul-extra-multiply branch 2 times, most recently from cfab066 to 75e0dbc Compare May 25, 2026 08:15
@yuanxion yuanxion force-pushed the fix-ci-dgpu-tests-gha-matmul-extra-multiply branch from 75e0dbc to 0661cd4 Compare June 1, 2026 02:58
yuanxion added 2 commits June 3, 2026 15:59
…ession dGPU func testcase

Signed-off-by: yuan.xiong <yuan.xiong@intel.com>
Signed-off-by: yuan.xiong <yuan.xiong@intel.com>
@yuanxion yuanxion force-pushed the fix-ci-dgpu-tests-gha-matmul-extra-multiply branch from 0661cd4 to a0cef9b Compare June 3, 2026 08:05

Labels

category: GPU OpenVINO GPU plugin


3 participants