[GPU] fix dGPU func testcases in smoke_ScaledAttnDynamic4D_GPU and smoke_MatMulCompressedWeights_extra_multiply by yuanxion · Pull Request #35343 · openvinotoolkit/openvino

yuanxion · 2026-04-15T03:21:53Z

Details

Fixes two Intel dGPU functional test cases:

smoke_ScaledAttnDynamic4D_GPU/ScaledAttnLayerGPUTest.CompareWithRefs
smoke_MatMulCompressedWeights_extra_multiply/MatmulWeightsDecompression.Inference

Description of the issue

Symptom

smoke_ScaledAttnDynamic4D_GPU/ScaledAttnLayerGPUTest.CompareWithRefs failed on Intel dGPU when the graph carried a scalar or rank-1 placeholder input in the attention-mask slot. The SDPA OCL path treated that placeholder as a real runtime attention mask, which changed kernel configuration and input binding unexpectedly.
smoke_MatMulCompressedWeights_extra_multiply/MatmulWeightsDecompression.Inference failed on dGPU for the group_size=2 + extra_multiply=1 + param_weights=1 configuration. The runtime failure surfaced during GPU program build / implementation selection.

Root cause

The SDPA GPU path only checked whether an attention-mask input slot existed, but did not distinguish a real runtime mask tensor from a scalar / rank-1 placeholder. As a result, placeholder inputs were propagated through the real attention-mask path and affected JIT constants, kernel arguments, and execution logic.
ConvertMatMulToFullyConnected converted a MatMul with parameter-based compressed weights into FullyConnected even when the decompressed weights still had non-trivial batch dimensions. That conversion is unsafe for this pattern: after extra multiply and transpose, the weights remained effectively per-batch / 3D, but were still fed into the FC path.

How to fix it

Add a shared helper to identify whether the attention-mask input is a real runtime mask. Scalar and rank-1 placeholders are now excluded from the runtime attention-mask path, and the same logic is reused by the SDPA OCL implementations.
Detect parameter-based compressed weights explicitly in the matcher result and block MatMul -> FullyConnected conversion when the aligned weight shape still contains non-1 batch dimensions. In that case, keep the original MatMul path instead of forcing FC.

The code and line that caused this issue

smoke_ScaledAttnDynamic4D_GPU

openvino/src/plugins/intel_gpu/src/graph/impls/ocl_v2/sdpa/sdpa_gen_opt.cpp

Lines 126 to 127 in 6c7a684

    if (i == attn_mask_idx && desc->attn_mask_val.has_value())
        continue;
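The check described under "How to fix it" can be illustrated with a minimal, standalone sketch. It is not the PR's actual `has_runtime_attn_mask_input()` helper: the `Shape` alias and the functions `looks_like_runtime_attn_mask` and `inputs_to_bind` are hypothetical names, and the sketch assumes static shapes where an empty shape models a scalar.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an input shape; an empty vector models a scalar.
using Shape = std::vector<std::size_t>;

// Returns true only when the attention-mask slot holds something that looks
// like a real runtime mask. A missing slot, a scalar (rank 0), and a rank-1
// tensor are all treated as "no runtime mask".
bool looks_like_runtime_attn_mask(const std::vector<Shape>& input_shapes,
                                  std::size_t attn_mask_idx) {
    if (attn_mask_idx >= input_shapes.size())
        return false;  // slot absent; this also avoids an out-of-bounds read
    return input_shapes[attn_mask_idx].size() >= 2;
}

// Collects the indices of the inputs that would be bound as kernel arguments,
// skipping the mask slot when it only carries a placeholder.
std::vector<std::size_t> inputs_to_bind(const std::vector<Shape>& input_shapes,
                                        std::size_t attn_mask_idx) {
    std::vector<std::size_t> bound;
    for (std::size_t i = 0; i < input_shapes.size(); ++i) {
        if (i == attn_mask_idx &&
            !looks_like_runtime_attn_mask(input_shapes, attn_mask_idx))
            continue;  // placeholder mask: do not bind it
        bound.push_back(i);
    }
    return bound;
}
```

Using one predicate for both the JIT constants and the argument binding keeps the two decisions consistent, which is the point of sharing the helper across the SDPA OCL implementations.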

smoke_MatMulCompressedWeights_extra_multiply

openvino/src/plugins/intel_gpu/src/plugin/transformations/convert_matmul_to_fc.cpp

Lines 122 to 124 in 6c7a684

    
    } else if (!is_compressed_weight || !supports_immad) {
        return std::make_tuple(false, std::move(shape_a_aligned), std::move(shape_b_aligned));
    }
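The guard described under "How to fix it" can be sketched in isolation as follows. This is an illustration only, not the code in `convert_matmul_to_fc.cpp`: `Shape`, `has_non_trivial_batch_dims`, and `can_convert_to_fc` are hypothetical names, shapes are assumed to be fully static, and every other condition the real transformation checks is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a fully static, already aligned weight shape.
using Shape = std::vector<std::size_t>;

// True when any dimension in front of the last two (the matrix dimensions)
// differs from 1, meaning the weights are effectively per-batch.
bool has_non_trivial_batch_dims(const Shape& aligned_weight_shape) {
    assert(aligned_weight_shape.size() >= 2 && "MatMul operands need rank >= 2");
    for (std::size_t i = 0; i + 2 < aligned_weight_shape.size(); ++i) {
        if (aligned_weight_shape[i] != 1)
            return true;
    }
    return false;
}

// Models only the new guard: parameter-based compressed weights that still
// carry batch dimensions stay on the MatMul path.
bool can_convert_to_fc(bool is_param_compressed_weight,
                       const Shape& aligned_weight_shape) {
    if (is_param_compressed_weight &&
        has_non_trivial_batch_dims(aligned_weight_shape))
        return false;  // keep the original MatMul
    return true;
}
```

For example, an aligned weight shape of `{4, 16, 32}` is rejected for parameter-based compressed weights, while `{1, 16, 32}` and `{16, 32}` pass this particular guard.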

Reproduction step and snapshot

smoke_ScaledAttnDynamic4D_GPU
./ov_gpu_func_tests --device_suffix=1 --gtest_filter='smoke_ScaledAttnDynamic4D_GPU/ScaledAttnLayerGPUTest.CompareWithRefs/netPRC=f16_IS=[?.5.?.128]_[?.5.?.128]_[?.5.?.32]_[?.1.?.?]_TS=(2.5.100.128)_(2.5.1.128)_(2.5.387.128)_(2.5.100.128)_(2.5.1.128)_(2.5.387.128)_(2.5.100.32)_(2.5.1.32)_(2.5.387.32)_(1.1.100.100)_(1.1.1.1)_(2.1.387.387)_is_causal=0_has_attn=0_is_attn_const=1_has_scale=1_is_scale_const=1_with_transpose0_has_sink=0_'
smoke_MatMulCompressedWeights_extra_multiply
./ov_gpu_func_tests --device_suffix=1 --gtest_filter='smoke_MatMulCompressedWeights_extra_multiply/MatmulWeightsDecompression.Inference/data_shape=[]_[1.4.16]__weights_shape=[16,32]_group_size=2_weights_precision=u8_activations_precision=f32_transpose_weights=0_decompression_subtract=0_reshape_on_decompression=0_extra_multiply=1_per_tensor_zp=0_param_weights=1_dyn_quan_group_size=0'

Problematic graph

N/A

Checklist

Is it a proper fix? (not a workaround)
Did you include test case for this fix, if necessary?
Did you review existing test that can be extended to cover this scenario? Which test did you review?

Tickets:

CVS-182520

AI Assistance:

AI assistance used: yes
AI was used to reproduce the failing dGPU functional tests, inspect the dumped GPU graphs, find the root cause, and add test cases.

Copilot

Pull request overview

Fixes two Intel dGPU functional test failures by (1) treating scalar/rank-1 SDPA attention-mask inputs as placeholders (not real runtime masks) in OCL SDPA implementations, and (2) preventing unsafe MatMul→FullyConnected conversion for parameter-based compressed weights that still carry non-trivial batch dimensions.

Changes:

Add shared SDPA helper to detect whether the attention-mask input is a real runtime mask, and reuse it across SDPA OCL implementations.
Update MatMul→FC transformation to block conversion for parameter-based compressed weights with non-1 aligned batch dimensions.
Add/extend unit tests for SDPA placeholder-mask behavior and MatMul→FC “no-convert” scenario.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file

| File | Description |
| --- | --- |
| src/plugins/intel_gpu/tests/unit/transformations/convert_matmul_to_fc_test.cpp | Adds a regression unit test ensuring MatMul→FC does not occur for parameter-based compressed weights with per-batch dimensions. |
| src/plugins/intel_gpu/tests/unit/test_cases/sdpa_gpu_test.cpp | Adds a unit test asserting scalar placeholder mask behaves like “no runtime mask” in SDPA OCL path. |
| src/plugins/intel_gpu/src/plugin/transformations/convert_matmul_to_fc.cpp | Extends aligned-shape check to detect parameter-based compressed weights and block unsafe FC conversion. |
| src/plugins/intel_gpu/src/graph/impls/ocl_v2/sdpa/sdpa_utils.hpp | Introduces `has_runtime_attn_mask_input()` helper to filter out scalar/1D placeholder masks. |
| src/plugins/intel_gpu/src/graph/impls/ocl_v2/sdpa/sdpa_ref.cpp | Uses the helper to set JIT constants and to skip binding placeholder mask inputs. |
| src/plugins/intel_gpu/src/graph/impls/ocl_v2/sdpa/sdpa_gen_opt.cpp | Uses the helper for JIT config and argument binding in the optimized SDPA generator. |
| src/plugins/intel_gpu/src/graph/impls/ocl_v2/sdpa/sdpa_gen_micro.cpp | Uses the helper to control mask-related JIT constants/args for the micro-kernel generator. |


yuanxion · 2026-05-12T09:11:49Z

This PR is split into 2 PRs:
#35437 for smoke_ScaledAttnDynamic4D_GPU
#35442 for smoke_MatMulCompressedWeights_extra_multiply

This PR is no longer needed, so I am closing it.

yuanxion requested review from a team as code owners April 15, 2026 03:21

github-actions Bot added the category: GPU OpenVINO GPU plugin label Apr 15, 2026

yuanxion requested a review from Copilot April 15, 2026 09:23

Copilot started reviewing on behalf of yuanxion April 15, 2026 09:24

Copilot AI reviewed Apr 15, 2026

Comment thread src/plugins/intel_gpu/tests/unit/test_cases/sdpa_gpu_test.cpp Outdated

yuanxion added 8 commits April 21, 2026 09:04

Fix smoke_ScaledAttnDynamic4D_GPU/ScaledAttnLayerGPUTest.CompareWithR…

75642e1

…efs func test cases Signed-off-by: yuan.xiong <yuan.xiong@intel.com>

fix smoke_MatMulCompressedWeights_extra_multiply/MatmulWeightsDecompr…

fee4e9d

…ession.Inference func test cases Signed-off-by: yuan.xiong <yuan.xiong@intel.com>

revert unnecessary changes

2eaa8ad

Signed-off-by: yuan.xiong <yuan.xiong@intel.com>

small fix for code style

b8f746b

Signed-off-by: yuan.xiong <yuan.xiong@intel.com>

fix custom sdpa gpu test

d69803d

Signed-off-by: yuan.xiong <yuan.xiong@intel.com>

avoid out-of-bound access

376639c

Signed-off-by: yuan.xiong <yuan.xiong@intel.com>

split extra multiply compressed matmul coverage

d71510a

revert parameter-weight FC conversion guard

996a20e

yuanxion force-pushed the fix-ci-dgpu-tests-gha-scale-atten2 branch from a68d87a to 996a20e Compare April 21, 2026 01:09

yuanxion marked this pull request as draft April 21, 2026 05:12

yuanxion closed this May 12, 2026

yuanxion wants to merge 8 commits into openvinotoolkit:master from yuanxion:fix-ci-dgpu-tests-gha-scale-atten2


2 participants
