Translate MatMul to Conv2D 1x1 for LPBQ encodings by qti-ashimaj · Pull Request #357 · onnxruntime/onnxruntime-qnn

qti-ashimaj · 2026-05-08T10:03:13Z

Description

Onnx MatMul op is lowered down into QNN Conv2D op with 1x1 weights when LPBQ encodings are present.

Motivation and Context

HTP supports Conv2D with 1x1 weights for LPBQ encodings. The Matmul and FC op are either not fully supported or not performant on HTP. Hence, we are translating the Matmuls into Convolutions.

This PR depends on #307

qti-kromero · 2026-05-20T20:18:56Z


 /**
- * An ONNX MatMul can be translated to either a QNN MatMul or a QNN FullyConnected.
+ * An ONNX MatMul can be translated to a QNN MatMul, a QNN FullyConnected, or a QNN Conv2D.


these new op behaviors/translations probably need corresponding op unit tests

qti-chuteng · 2026-05-21T15:53:41Z

Critical

[C-1] `onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:495-505`

HandleUnsqueeze is a no-op for LPBQ encoding. The PR comment claims the LPBQ axis is remapped from 1 to 3 when unsqueezing [K,N] -> [1,1,K,N], but QnnQuantParamsWrapper::HandleUnsqueeze (in qnn_quant_params_wrapper.h:109-158) early-returns when !IsPerChannel(), and IsPerChannel() does not match the LPBQ BLOCKWISE_EXPANSION encoding. As a result, params_.blockwiseExpansion->axis is never rewritten — QNN sees the weight as 4D HWCN but with the LPBQ axis still pointing at H. This is a silent correctness bug.

Suggested fix: extend QnnQuantParamsWrapper::HandleUnsqueeze (and HandleTranspose) with an IsLPBQ() branch that runs the same axis-remap logic. Add a unit test that asserts axis == 3 after unsqueeze from [K,N] to [1,1,K,N].

[C-2] `onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:96-100`

The use_conv2d branch is effectively dead code on current main. QnnQuantParamsWrapper::Init(qnn_model_wrapper, io_def) (the path used by GetTensorInfo) cannot construct a BLOCKWISE_EXPANSION encoding from a QDQ structure — LPBQ qparams are only created via the explicit (per_channel_float_scales, per_block_int_scales, ..., axis, block_size, is_int4) constructor used inside fusion code. The BQ-to-LPBQ conversion that would let Init() emit LPBQ lives on dev/qti-ashimaj/bq2lpbq and has not been merged.

Suggested fix: disclose the dependency on the bq2lpbq work in the PR description, coordinate merge order, and add an integration test that proves the path is reachable. Once bq2lpbq lands, C-1's silent axis bug fires immediately.

Major

[M-1] `onnxruntime/test/providers/qnn/matmul_test.cc`

No unit tests for the new path. 168 lines of new logic with zero coverage. Add at least three HTP-backend tests (rank 2/3/4 activation × LPBQ rank-2 weight), each (a) using QnnGraphChecker to assert the fused op is QNN_OP_CONV_2D, (b) comparing output to CPU EP, (c) including a kill-test fence. Use the existing LPBQ tests around lpbqgemm_fusion as a template.

[M-2] `onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:544-553`

Conv2D is missing QNN_OP_CONV_2D_PARAM_DILATION and QNN_OP_CONV_2D_PARAM_GROUP. The standard conv_op_builder.cc always sets both explicitly (see L835-857, L980-994). Relying on QNN defaults is fragile across SDK versions and op-validation modes.

QnnParamWrapper dilation_param(node_unit.Index(), node_unit.Name(),
                                QNN_OP_CONV_2D_PARAM_DILATION, {2}, {1, 1});
param_tensor_names.push_back(dilation_param.GetParamTensorName());
qnn_model_wrapper.AddParamWrapper(std::move(dilation_param));

Qnn_Scalar_t group_scalar = QNN_SCALAR_INIT;
group_scalar.dataType = QNN_DATATYPE_UINT_32;
group_scalar.uint32Value = 1;
QnnParamWrapper group_param(node_unit.Index(), node_unit.Name(),
                             QNN_OP_CONV_2D_PARAM_GROUP, group_scalar);
param_tensor_names.push_back(group_param.GetParamTensorName());
qnn_model_wrapper.AddParamWrapper(std::move(group_param));

[M-3] `onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:464-523`

The implicit-bias workaround for QNN SDK 2.23/2.24/2.25 (in conv_op_builder.cc:537-553) is not applied. Builds against those SDK versions will hit a validation bug because no bias is supplied. Extract the existing AddZeroBiasInput block into a shared helper and call it at the end of ProcessInputsForQnnConv2D.

[M-4] `onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:509-511`

UnpackInitializerData(input_info_1.initializer_tensor, ...) is called without a defensive RETURN_IF_NOT(input_info_1.is_initializer, ...). Today this is safe because CheckInputs gates use_conv2d on is_initializer, but the coupling is implicit and a future refactor could expose a NULL pointer dereference. Add an explicit guard at the top of the function.

Minor

[N-1] `onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:137-156`

ProcessInput0 now triggers reshape on is_rank1 || shape_mismatch || (use_fully_connected && shape.size() > 2). Add a doc comment listing the three independent triggers, and assert !(use_fully_connected && target_shape != nullptr) since the two are mutually exclusive.

[N-2] `onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:499`

RETURN_IF_NOT(input_info_1.shape.size() == 2, ...) is redundant with the CheckInputs gate. Keep it as defence-in-depth but add a one-line comment so readers don't think use_conv2d supports other ranks.

[N-3] `onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:464-470`

logger is taken by reference but only forwarded to ProcessInput0. Add at least one VERBOSE log when use_conv2d activates, e.g.:

ORT_CXX_LOG(logger, ORT_LOGGING_LEVEL_VERBOSE,
            ("Lowering MatMul " + node_unit.Name() + " to QNN Conv2D 1x1 (LPBQ path)").c_str());

quic-calvnguy self-requested a review May 11, 2026 19:58

qti-ashimaj force-pushed the dev/qti-ashimaj/matmul2conv branch 2 times, most recently from c051257 to 231385f Compare May 13, 2026 14:41

qti-ashimaj marked this pull request as ready for review May 13, 2026 14:44

qti-ashimaj requested review from qti-ashwshan, qti-chuteng, qti-jkilpatrick, qti-kromero, qti-yuduo, tirupath-qti and yath1 as code owners May 13, 2026 14:44

qti-kromero reviewed May 20, 2026

View reviewed changes

qti-ashimaj added 5 commits June 17, 2026 18:05

translate MatMul to Conv2D 1x1 for LPBQ

ff3a92e

minor fix

1d4c010

fix lint

5ba9515

add TCs for block quantized matmul op

60af44b

fix lint

f14eb2f

qti-ashimaj force-pushed the dev/qti-ashimaj/matmul2conv branch from 1171b1b to f14eb2f Compare June 18, 2026 04:38

qti-ashimaj added 2 commits June 18, 2026 11:22

add flag

9d4f8e0

remove fallback TC

d3fda2c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Translate MatMul to Conv2D 1x1 for LPBQ encodings#357

Translate MatMul to Conv2D 1x1 for LPBQ encodings#357
qti-ashimaj wants to merge 7 commits into
mainfrom
dev/qti-ashimaj/matmul2conv

qti-ashimaj commented May 8, 2026 •

edited

Loading

Uh oh!

qti-kromero May 20, 2026

Uh oh!

qti-chuteng commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

qti-ashimaj commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Motivation and Context

Uh oh!

qti-kromero May 20, 2026

Choose a reason for hiding this comment

Uh oh!

qti-chuteng commented May 21, 2026

Critical

[C-1] onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:495-505

[C-2] onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:96-100

Major

[M-1] onnxruntime/test/providers/qnn/matmul_test.cc

[M-2] onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:544-553

[M-3] onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:464-523

[M-4] onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:509-511

Minor

[N-1] onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:137-156

[N-2] onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:499

[N-3] onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:464-470

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

qti-ashimaj commented May 8, 2026 •

edited

Loading

[C-1] `onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:495-505`

[C-2] `onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:96-100`

[M-1] `onnxruntime/test/providers/qnn/matmul_test.cc`

[M-2] `onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:544-553`

[M-3] `onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:464-523`

[M-4] `onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:509-511`

[N-1] `onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:137-156`

[N-2] `onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:499`

[N-3] `onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc:464-470`