Skip to content

Translate MatMul to Conv2D 1x1 for LPBQ encodings#357

Open
qti-ashimaj wants to merge 7 commits into
mainfrom
dev/qti-ashimaj/matmul2conv
Open

Translate MatMul to Conv2D 1x1 for LPBQ encodings#357
qti-ashimaj wants to merge 7 commits into
mainfrom
dev/qti-ashimaj/matmul2conv

Conversation

@qti-ashimaj

@qti-ashimaj qti-ashimaj commented May 8, 2026

Copy link
Copy Markdown
Collaborator

Description

Onnx MatMul op is lowered down into QNN Conv2D op with 1x1 weights when LPBQ encodings are present.

Motivation and Context

HTP supports Conv2D with 1x1 weights for LPBQ encodings. The Matmul and FC op are either not fully supported or not performant on HTP. Hence, we are translating the Matmuls into Convolutions.

This PR depends on #307

@quic-calvnguy quic-calvnguy self-requested a review May 11, 2026 19:58
@qti-ashimaj qti-ashimaj force-pushed the dev/qti-ashimaj/matmul2conv branch 2 times, most recently from c051257 to 231385f Compare May 13, 2026 14:41
@qti-ashimaj qti-ashimaj marked this pull request as ready for review May 13, 2026 14:44

/**
* An ONNX MatMul can be translated to either a QNN MatMul or a QNN FullyConnected.
* An ONNX MatMul can be translated to a QNN MatMul, a QNN FullyConnected, or a QNN Conv2D.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these new op behaviors/translations probably need corresponding op unit tests

@qti-chuteng

Copy link
Copy Markdown
Collaborator

Critical

[C-1] onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:495-505

HandleUnsqueeze is a no-op for LPBQ encoding. The PR comment claims the LPBQ axis is remapped from 1 to 3 when unsqueezing [K,N] -> [1,1,K,N], but QnnQuantParamsWrapper::HandleUnsqueeze (in qnn_quant_params_wrapper.h:109-158) early-returns when !IsPerChannel(), and IsPerChannel() does not match the LPBQ BLOCKWISE_EXPANSION encoding. As a result, params_.blockwiseExpansion->axis is never rewritten — QNN sees the weight as 4D HWCN but with the LPBQ axis still pointing at H. This is a silent correctness bug.

Suggested fix: extend QnnQuantParamsWrapper::HandleUnsqueeze (and HandleTranspose) with an IsLPBQ() branch that runs the same axis-remap logic. Add a unit test that asserts axis == 3 after unsqueeze from [K,N] to [1,1,K,N].

[C-2] onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:96-100

The use_conv2d branch is effectively dead code on current main. QnnQuantParamsWrapper::Init(qnn_model_wrapper, io_def) (the path used by GetTensorInfo) cannot construct a BLOCKWISE_EXPANSION encoding from a QDQ structure — LPBQ qparams are only created via the explicit (per_channel_float_scales, per_block_int_scales, ..., axis, block_size, is_int4) constructor used inside fusion code. The BQ-to-LPBQ conversion that would let Init() emit LPBQ lives on dev/qti-ashimaj/bq2lpbq and has not been merged.

Suggested fix: disclose the dependency on the bq2lpbq work in the PR description, coordinate merge order, and add an integration test that proves the path is reachable. Once bq2lpbq lands, C-1's silent axis bug fires immediately.


Major

[M-1] onnxruntime/test/providers/qnn/matmul_test.cc

No unit tests for the new path. 168 lines of new logic with zero coverage. Add at least three HTP-backend tests (rank 2/3/4 activation × LPBQ rank-2 weight), each (a) using QnnGraphChecker to assert the fused op is QNN_OP_CONV_2D, (b) comparing output to CPU EP, (c) including a kill-test fence. Use the existing LPBQ tests around lpbqgemm_fusion as a template.

[M-2] onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:544-553

Conv2D is missing QNN_OP_CONV_2D_PARAM_DILATION and QNN_OP_CONV_2D_PARAM_GROUP. The standard conv_op_builder.cc always sets both explicitly (see L835-857, L980-994). Relying on QNN defaults is fragile across SDK versions and op-validation modes.

QnnParamWrapper dilation_param(node_unit.Index(), node_unit.Name(),
                                QNN_OP_CONV_2D_PARAM_DILATION, {2}, {1, 1});
param_tensor_names.push_back(dilation_param.GetParamTensorName());
qnn_model_wrapper.AddParamWrapper(std::move(dilation_param));

Qnn_Scalar_t group_scalar = QNN_SCALAR_INIT;
group_scalar.dataType = QNN_DATATYPE_UINT_32;
group_scalar.uint32Value = 1;
QnnParamWrapper group_param(node_unit.Index(), node_unit.Name(),
                             QNN_OP_CONV_2D_PARAM_GROUP, group_scalar);
param_tensor_names.push_back(group_param.GetParamTensorName());
qnn_model_wrapper.AddParamWrapper(std::move(group_param));

[M-3] onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:464-523

The implicit-bias workaround for QNN SDK 2.23/2.24/2.25 (in conv_op_builder.cc:537-553) is not applied. Builds against those SDK versions will hit a validation bug because no bias is supplied. Extract the existing AddZeroBiasInput block into a shared helper and call it at the end of ProcessInputsForQnnConv2D.

[M-4] onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:509-511

UnpackInitializerData(input_info_1.initializer_tensor, ...) is called without a defensive RETURN_IF_NOT(input_info_1.is_initializer, ...). Today this is safe because CheckInputs gates use_conv2d on is_initializer, but the coupling is implicit and a future refactor could expose a NULL pointer dereference. Add an explicit guard at the top of the function.


Minor

[N-1] onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:137-156

ProcessInput0 now triggers reshape on is_rank1 || shape_mismatch || (use_fully_connected && shape.size() > 2). Add a doc comment listing the three independent triggers, and assert !(use_fully_connected && target_shape != nullptr) since the two are mutually exclusive.

[N-2] onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:499

RETURN_IF_NOT(input_info_1.shape.size() == 2, ...) is redundant with the CheckInputs gate. Keep it as defence-in-depth but add a one-line comment so readers don't think use_conv2d supports other ranks.

[N-3] onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:464-470

logger is taken by reference but only forwarded to ProcessInput0. Add at least one VERBOSE log when use_conv2d activates, e.g.:

ORT_CXX_LOG(logger, ORT_LOGGING_LEVEL_VERBOSE,
            ("Lowering MatMul " + node_unit.Name() + " to QNN Conv2D 1x1 (LPBQ path)").c_str());

@qti-ashimaj qti-ashimaj force-pushed the dev/qti-ashimaj/matmul2conv branch from 1171b1b to f14eb2f Compare June 18, 2026 04:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants