[GPU] fix dGPU func testcases in smoke_MatMulCompressedWeights_extra_multiply #35442
Open
yuanxion wants to merge 2 commits into
Contributor
Pull request overview
Fixes an unsafe MatMul -> FullyConnected rewrite in the Intel GPU plugin when using compressed weights with extra_multiply, which could turn shared weights into effectively batched weights and trigger dGPU program-build failures.
Changes:
- Add a guard in `ConvertMatMulToFullyConnected` to block conversion when the post-reshape `Multiply` introduces non-trivial batch broadcasting in the weights path.
- Add unit tests covering (a) safe extra-multiply conversion and (b) the non-trivial batch-broadcast case that must remain `MatMul`.
- Adjust functional test instantiations to cover the extra-multiply scenario and add a dedicated instantiation for the non-trivial batch-broadcast case.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/plugins/intel_gpu/src/plugin/transformations/convert_matmul_to_fc.cpp` | Blocks FC conversion when the extra post-reshape multiply broadcasts weights into real batch dimensions. |
| `src/plugins/intel_gpu/tests/unit/transformations/convert_matmul_to_fc_test.cpp` | Adds regression unit tests for extra-multiply patterns (convert vs no-convert). |
| `src/plugins/intel_gpu/tests/functional/subgraph_tests/dynamic/matmul_weights_decompression.cpp` | Updates/extends smoke instantiations to cover extra-multiply and the non-trivial batch broadcast scenario. |
Contributor
Author
With this PR, the onepunch tests show no regression in wwb or performance for LLM models.
Contributor
@yuanxion Could this change affect existing models? If so, we need to check for performance regressions.
…ession dGPU func testcase Signed-off-by: yuan.xiong <yuan.xiong@intel.com>
Signed-off-by: yuan.xiong <yuan.xiong@intel.com>
Details
Fixes the dGPU functional testcase `smoke_MatMulCompressedWeights_extra_multiply_non_trivial_batch_broadcast_no_convert/MatmulWeightsDecompression.Inference`, which was split out of the originally failing testcase `smoke_MatMulCompressedWeights_extra_multiply/MatmulWeightsDecompression.Inference`.
Description of the issue
Symptom
The testcase failed on dGPU for compressed-weight MatMul patterns with group_size=2 and extra_multiply=1, especially for cases where the post-reshape multiply made the weights effectively batched again. The failure surfaced during GPU program build (implementation selection) after MatMul had been rewritten to FullyConnected.
Root cause
`ConvertMatMulToFullyConnected` allowed MatMul -> FullyConnected conversion for an extra_multiply compressed-weights pattern even when the extra multiply broadcast the weights back into non-trivial batch dimensions.
For example, the weights may first be reshaped into a shared 2D form such as [16, 32], but the extra multiply can then broadcast them to a batched shape such as [8, 16, 32]. The weights are then effectively per-batch again, so forcing the FullyConnected path is unsafe.
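The shape arithmetic behind this example can be sketched as follows. This is only an illustration, assuming the extra `Multiply` uses NumPy-style broadcasting and that all shapes are static. `Shape` and `broadcast_shapes` are hypothetical names, not OpenVINO APIs, and the `[8, 1, 1]` multiplier shape is an assumed example of a constant that would produce the `[8, 16, 32]` result.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a static tensor shape (not an OpenVINO type).
using Shape = std::vector<std::int64_t>;

// NumPy-style broadcast of two static shapes, aligned from the trailing
// dimension. Writes the result to `out` and returns true on success;
// returns false when the shapes are incompatible.
bool broadcast_shapes(const Shape& a, const Shape& b, Shape& out) {
    const std::size_t rank = std::max(a.size(), b.size());
    const std::size_t pad_a = rank - a.size();
    const std::size_t pad_b = rank - b.size();
    Shape result(rank, 1);
    for (std::size_t i = 0; i < rank; ++i) {
        // Missing leading dimensions behave like 1.
        const std::int64_t da = i < pad_a ? 1 : a[i - pad_a];
        const std::int64_t db = i < pad_b ? 1 : b[i - pad_b];
        if (da != db && da != 1 && db != 1) {
            return false;
        }
        result[i] = (da == 1) ? db : da;
    }
    out = result;
    return true;
}
```

Calling `broadcast_shapes({16, 32}, {8, 1, 1}, out)` leaves `out` as `{8, 16, 32}`: the shared 2D weights pick up a leading batch dimension of 8 purely from the multiplier.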
How to fix it
Detect whether the extra-multiply introduces non-trivial batch broadcast in the weights path. If yes, block MatMul -> FullyConnected conversion and keep the original MatMul; otherwise, keep valid compressed FullyConnected handling enabled for supported extra-multiply patterns.
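The decision described above might look roughly like the sketch below. It is not the code in `convert_matmul_to_fc.cpp`: `Shape`, `batch_volume`, and `can_convert_to_fc` are hypothetical names, and the sketch assumes static shapes in which every dimension ahead of the last two is a batch dimension.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a static tensor shape (not an OpenVINO type).
using Shape = std::vector<std::int64_t>;

// Product of every dimension ahead of the trailing two matrix dimensions.
// Shapes of rank <= 2 have no batch dimensions, so the result is 1.
std::int64_t batch_volume(const Shape& shape) {
    std::int64_t volume = 1;
    for (std::size_t i = 0; i + 2 < shape.size(); ++i) {
        volume *= shape[i];
    }
    return volume;
}

// Hypothetical guard: block the rewrite only when the extra multiply turns
// shared weights (batch volume 1) into batched weights (batch volume > 1).
bool can_convert_to_fc(const Shape& weights_before_multiply,
                       const Shape& weights_after_multiply) {
    const bool shared_before = batch_volume(weights_before_multiply) == 1;
    const bool batched_after = batch_volume(weights_after_multiply) > 1;
    return !(shared_before && batched_after);
}
```

Under these assumptions, `can_convert_to_fc({16, 32}, {1, 16, 32})` returns true because the added batch dimension is trivial, while `can_convert_to_fc({16, 32}, {8, 16, 32})` returns false and the original MatMul would be kept.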
Split the functional coverage by behavior:
- `smoke_MatMulCompressedWeights_extra_multiply` for supported convert cases
- `smoke_MatMulCompressedWeights_extra_multiply_non_trivial_batch_broadcast_no_convert` for blocked no-convert cases
The code and line that caused this issue
smoke_MatMulCompressedWeights_extra_multiply
openvino/src/plugins/intel_gpu/src/plugin/transformations/convert_matmul_to_fc.cpp
Lines 122 to 124 in 6c7a684
Reproduction step and snapshot
Problematic graph
N/A
Checklist
Tickets:
AI Assistance: