[GPU] fix dGPU func testcases in smoke_MatMulCompressedWeights_extra_multiply by yuanxion · Pull Request #35442 · openvinotoolkit/openvino

yuanxion · 2026-04-21T09:22:16Z

Details

Fixes the dGPU functional test case smoke_MatMulCompressedWeights_extra_multiply_non_trivial_batch_broadcast_no_convert/MatmulWeightsDecompression.Inference, which was split out of the originally failing test case smoke_MatMulCompressedWeights_extra_multiply/MatmulWeightsDecompression.Inference.

Description of the issue

Symptom

The testcase failed on dGPU for compressed-weight MatMul patterns with group_size=2 and extra_multiply=1, especially for cases where the post-reshape multiply made the weights effectively batched again. The failure surfaced during GPU program build (implementation selection) after MatMul had been rewritten to FullyConnected.

Root cause

ConvertMatMulToFullyConnected allowed the MatMul -> FullyConnected conversion for an extra_multiply compressed-weights pattern even when the extra multiply broadcast the weights back into non-trivial batch dimensions.
For example, the weights may first be reshaped into a shared 2D form such as [16, 32], but the extra multiply can then broadcast them to a batched shape such as [8, 16, 32]. That makes the weights effectively per-batch again, so forcing the FullyConnected path is unsafe.
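To make the shape arithmetic concrete, here is a minimal sketch of NumPy-style broadcasting on static shapes. It is not code from the plugin: `Shape`, `broadcast_shape`, and `broadcast_example` are hypothetical names, and the multiplier shape [8, 1, 1] is an assumption chosen only to reproduce the [16, 32] -> [8, 16, 32] example above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Shape = std::vector<std::size_t>;  // hypothetical alias, static dims only

// NumPy-style broadcast of two static shapes, aligned from the trailing dimension.
Shape broadcast_shape(const Shape& a, const Shape& b) {
    const std::size_t rank = std::max(a.size(), b.size());
    const std::size_t pad_a = rank - a.size();
    const std::size_t pad_b = rank - b.size();
    Shape out(rank, 1);
    for (std::size_t i = 0; i < rank; ++i) {
        // Missing leading dimensions are treated as 1.
        const std::size_t da = i < pad_a ? 1 : a[i - pad_a];
        const std::size_t db = i < pad_b ? 1 : b[i - pad_b];
        if (da != db && da != 1 && db != 1)
            throw std::invalid_argument("shapes are not broadcastable");
        out[i] = (da == 1) ? db : da;
    }
    return out;
}

// Usage sketch: shared [16, 32] weights times an assumed [8, 1, 1] multiplier
// yield batched [8, 16, 32] weights.
inline void broadcast_example() {
    assert((broadcast_shape({16, 32}, {8, 1, 1}) == Shape{8, 16, 32}));
}
```

The leading 8 in the result is what turns shared weights into per-batch weights; a multiplier of shape [1, 1] or [16, 1] would leave the weights 2D.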

How to fix it

Detect whether the extra-multiply introduces non-trivial batch broadcast in the weights path. If yes, block MatMul -> FullyConnected conversion and keep the original MatMul; otherwise, keep valid compressed FullyConnected handling enabled for supported extra-multiply patterns.
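The check described above could look roughly like the following sketch. It assumes fully static shapes and treats every dimension ahead of the trailing two as a batch dimension; the names `has_non_trivial_batch` and `can_convert_to_fc` are hypothetical and are not taken from convert_matmul_to_fc.cpp, where the actual guard may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Shape = std::vector<std::size_t>;  // hypothetical alias, static dims only

// True when any dimension ahead of the trailing two (the matrix dims) exceeds 1.
bool has_non_trivial_batch(const Shape& shape) {
    for (std::size_t i = 0; i + 2 < shape.size(); ++i) {
        if (shape[i] > 1)
            return true;
    }
    return false;
}

// Hypothetical guard: refuse the MatMul -> FullyConnected rewrite when the
// extra multiply turns shared (batch-free) weights into batched weights.
bool can_convert_to_fc(const Shape& reshaped_weights, const Shape& multiply_output) {
    const bool broadcast_adds_batch =
        !has_non_trivial_batch(reshaped_weights) && has_non_trivial_batch(multiply_output);
    return !broadcast_adds_batch;
}

// Usage sketch with the shapes from the description.
inline void guard_example() {
    assert((can_convert_to_fc({16, 32}, {16, 32})));      // stays 2D: convert
    assert((can_convert_to_fc({16, 32}, {1, 16, 32})));   // trivial batch: convert
    assert((!can_convert_to_fc({16, 32}, {8, 16, 32})));  // real batch: keep MatMul
}
```

A leading dimension of 1 is treated as trivial here, so only a broadcast that produces a real batch blocks the conversion.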

Split the functional coverage by behavior:

smoke_MatMulCompressedWeights_extra_multiply for supported convert cases
smoke_MatMulCompressedWeights_extra_multiply_non_trivial_batch_broadcast_no_convert for blocked no-convert cases

The code and line that caused this issue

smoke_MatMulCompressedWeights_extra_multiply

openvino/src/plugins/intel_gpu/src/plugin/transformations/convert_matmul_to_fc.cpp

Lines 122 to 124 in 6c7a684

    
    } else if (!is_compressed_weight || !supports_immad) {
        return std::make_tuple(false, std::move(shape_a_aligned), std::move(shape_b_aligned));
    }

Reproduction step and snapshot

before this PR

./ov_gpu_func_tests --device_suffix=1 --gtest_filter='smoke_MatMulCompressedWeights_extra_multiply/MatmulWeightsDecompression.Inference/data_shape=[]_[1.4.16]__weights_shape=[16,32]_group_size=2_weights_precision=u8_activations_precision=f32_transpose_weights=0_decompression_subtract=0_reshape_on_decompression=0_extra_multiply=1_per_tensor_zp=0_param_weights=1_dyn_quan_group_size=0'

after this PR

./ov_gpu_func_tests --device_suffix=1 --gtest_filter='smoke_MatMulCompressedWeights_extra_multiply_non_trivial_batch_broadcast_no_convert/MatmulWeightsDecompression.Inference/data_shape=[]_[1.4.16]__weights_shape=[16,32]_group_size=2_weights_precision=u8_activations_precision=f32_transpose_weights=0_decompression_subtract=0_reshape_on_decompression=0_extra_multiply=1_per_tensor_zp=0_param_weights=1_dyn_quan_group_size=0'

Problematic graph

N/A

Checklist

Is it a proper fix? (not a workaround)
Did you include test case for this fix, if necessary?
Did you review existing test that can be extended to cover this scenario? Which test did you review?

Tickets:

CVS-182520

AI Assistance:

AI assistance used: yes
Used it to reproduce the failing dGPU function tests, inspect the dumped GPU graphs, find the root cause, and also add testcases.

Copilot

Pull request overview

Fixes an unsafe MatMul -> FullyConnected rewrite in the Intel GPU plugin when using compressed weights with extra_multiply, which could turn shared weights into effectively batched weights and trigger dGPU program-build failures.

Changes:

Add a guard in ConvertMatMulToFullyConnected to block conversion when the post-reshape Multiply introduces non-trivial batch broadcasting in the weights path.
Add unit tests covering (a) safe extra-multiply conversion and (b) the non-trivial batch-broadcast case that must remain MatMul.
Adjust functional test instantiations to cover the extra-multiply scenario and add a dedicated instantiation for the non-trivial batch-broadcast case.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `src/plugins/intel_gpu/src/plugin/transformations/convert_matmul_to_fc.cpp` | Blocks FC conversion when extra post-reshape multiply broadcasts weights into real batch dimensions. |
| `src/plugins/intel_gpu/tests/unit/transformations/convert_matmul_to_fc_test.cpp` | Adds regression unit tests for extra-multiply patterns (convert vs no-convert). |
| `src/plugins/intel_gpu/tests/functional/subgraph_tests/dynamic/matmul_weights_decompression.cpp` | Updates/extends smoke instantiations to cover extra-multiply and the non-trivial batch broadcast scenario. |

yuanxion · 2026-05-12T09:05:31Z

With this PR, the onepunch tests show no regression in wwb or performance for LLM models.

// onepunch results for iGPU & dGPU:
https://ci-dlbenchmark-icv.iotg.sclab.intel.com/job/DL-Benchmark/job/dev/job/dev_trigger/1226/
https://ci-dlbenchmark-icv.iotg.sclab.intel.com/job/DL-Benchmark/job/dev/job/dev_trigger/1227/

wilson-seok · 2026-05-15T02:05:35Z

@yuanxion Could this change affect existing models? If so, we need to check for performance regressions.

yuanxion requested review from a team as code owners April 21, 2026 09:22

yuanxion requested review from Copilot and removed request for a team April 21, 2026 09:22

github-actions Bot added the category: GPU OpenVINO GPU plugin label Apr 21, 2026

Copilot started reviewing on behalf of yuanxion April 21, 2026 09:23

Copilot AI reviewed Apr 21, 2026

Comment thread src/plugins/intel_gpu/tests/functional/subgraph_tests/dynamic/matmul_weights_decompression.cpp

Comment thread src/plugins/intel_gpu/tests/functional/subgraph_tests/dynamic/matmul_weights_decompression.cpp

rkazants requested review from a team April 22, 2026 05:07

yuanxion mentioned this pull request May 12, 2026

[GPU] fix dGPU func testcases in smoke_ScaledAttnDynamic4D_GPU and smoke_MatMulCompressedWeights_extra_multiply #35343

Closed

3 tasks

yuanxion force-pushed the fix-ci-dgpu-tests-gha-matmul-extra-multiply branch 2 times, most recently from cfab066 to 75e0dbc Compare May 25, 2026 08:15

yuanxion force-pushed the fix-ci-dgpu-tests-gha-matmul-extra-multiply branch from 75e0dbc to 0661cd4 Compare June 1, 2026 02:58

yuanxion added 2 commits June 3, 2026 15:59

fix smoke_MatMulCompressedWeights_extra_multiply/MatmulWeightsDecompr…

4814d4e

…ession dGPU func testcase Signed-off-by: yuan.xiong <yuan.xiong@intel.com>

refine testcases

a0cef9b

Signed-off-by: yuan.xiong <yuan.xiong@intel.com>

yuanxion force-pushed the fix-ci-dgpu-tests-gha-matmul-extra-multiply branch from 0661cd4 to a0cef9b Compare June 3, 2026 08:05

yuanxion wants to merge 2 commits into openvinotoolkit:master from yuanxion:fix-ci-dgpu-tests-gha-matmul-extra-multiply
