Description
The ONNX MatMul op is lowered to a QNN Conv2D op with 1x1 weights when LPBQ encodings are present.
Motivation and Context
HTP supports Conv2D with 1x1 weights for LPBQ encodings. The MatMul and FC ops are either not fully supported or not performant on HTP, so MatMuls are translated into convolutions.
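The equivalence this lowering relies on can be sketched in plain C++. This is a minimal illustration, not the PR's implementation: it assumes the activation is viewed as NHWC [1, H, W, K] and the weight as HWCN [1, 1, K, N], and the function names `MatMulRef` and `Conv1x1Nhwc` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference MatMul: a is [m, k], b is [k, n], both row-major. Returns [m, n].
std::vector<float> MatMulRef(const std::vector<float>& a, const std::vector<float>& b,
                             size_t m, size_t k, size_t n) {
  assert(a.size() == m * k && b.size() == k * n);
  std::vector<float> out(m * n, 0.0f);
  for (size_t i = 0; i < m; ++i)
    for (size_t p = 0; p < k; ++p)
      for (size_t j = 0; j < n; ++j)
        out[i * n + j] += a[i * k + p] * b[p * n + j];
  return out;
}

// 1x1 convolution with stride 1, no padding, dilation 1, group 1.
// input is NHWC [1, h, w, c_in]; weight is HWCN [1, 1, c_in, c_out].
// Returns NHWC [1, h, w, c_out].
std::vector<float> Conv1x1Nhwc(const std::vector<float>& input,
                               const std::vector<float>& weight,
                               size_t h, size_t w, size_t c_in, size_t c_out) {
  assert(input.size() == h * w * c_in && weight.size() == c_in * c_out);
  std::vector<float> out(h * w * c_out, 0.0f);
  for (size_t y = 0; y < h; ++y)
    for (size_t x = 0; x < w; ++x) {
      const size_t pixel = y * w + x;  // each pixel acts as one MatMul row
      for (size_t c = 0; c < c_in; ++c)
        for (size_t o = 0; o < c_out; ++o)
          out[pixel * c_out + o] += input[pixel * c_in + c] * weight[c * c_out + o];
    }
  return out;
}
```

With a [2, 3] activation viewed as [1, 1, 2, 3] and a [3, 2] weight viewed as [1, 1, 3, 2], both functions read the same buffers and produce the same four outputs, which is why the weight needs only an unsqueeze and no data movement.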
HandleUnsqueeze is a no-op for LPBQ encoding. The PR comment claims the LPBQ axis is remapped from 1 to 3 when unsqueezing [K,N] -> [1,1,K,N], but QnnQuantParamsWrapper::HandleUnsqueeze (in qnn_quant_params_wrapper.h:109-158) returns early when !IsPerChannel(), and IsPerChannel() does not match the LPBQ BLOCKWISE_EXPANSION encoding. As a result, params_.blockwiseExpansion->axis is never rewritten: QNN sees the weight as 4D HWCN, but the LPBQ axis still holds the stale value 1, which in HWCN indexes W rather than N. This is a silent correctness bug.
Suggested fix: extend QnnQuantParamsWrapper::HandleUnsqueeze (and HandleTranspose) with an IsLPBQ() branch that runs the same axis-remap logic. Add a unit test that asserts axis == 3 after unsqueeze from [K,N] to [1,1,K,N].
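The remap the suggested test should pin down is small enough to sketch in isolation. This is an illustration only: `RemapAxisAfterLeadingUnsqueeze` is a hypothetical free function, not part of QnnQuantParamsWrapper, and it covers only the case where unit dimensions are prepended, as in [K,N] -> [1,1,K,N].

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns the new index of `axis` after `num_leading_ones`
// unit dimensions are prepended to a tensor of rank `old_rank`.
// Every existing dimension shifts right by the number of prepended dimensions.
int32_t RemapAxisAfterLeadingUnsqueeze(int32_t axis, int32_t old_rank,
                                       int32_t num_leading_ones) {
  assert(old_rank > 0 && num_leading_ones >= 0);
  assert(axis >= 0 && axis < old_rank);
  return axis + num_leading_ones;
}
```

For the case in this comment, the rank-2 weight has axis 1 and two unit dimensions are prepended, so the helper returns 3. A unit test on the real wrapper would assert the same value on the blockwise-expansion axis after the unsqueeze.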
The use_conv2d branch is effectively dead code on current main. QnnQuantParamsWrapper::Init(qnn_model_wrapper, io_def) (the path used by GetTensorInfo) cannot construct a BLOCKWISE_EXPANSION encoding from a QDQ structure; LPBQ qparams are only created via the explicit (per_channel_float_scales, per_block_int_scales, ..., axis, block_size, is_int4) constructor used inside fusion code. The BQ-to-LPBQ conversion that would let Init() emit LPBQ lives on dev/qti-ashimaj/bq2lpbq and has not been merged.
Suggested fix: disclose the dependency on the bq2lpbq work in the PR description, coordinate merge order, and add an integration test that proves the path is reachable. Once bq2lpbq lands, the silent axis bug described in the HandleUnsqueeze comment above will trigger immediately.
No unit tests for the new path. 168 lines of new logic with zero coverage. Add at least three HTP-backend tests (rank 2/3/4 activation × LPBQ rank-2 weight), each (a) using QnnGraphChecker to assert the fused op is QNN_OP_CONV_2D, (b) comparing output to CPU EP, (c) including a kill-test fence. Use the existing LPBQ tests around lpbqgemm_fusion as a template.
Conv2D is missing QNN_OP_CONV_2D_PARAM_DILATION and QNN_OP_CONV_2D_PARAM_GROUP. The standard conv_op_builder.cc always sets both explicitly (see L835-857, L980-994). Relying on QNN defaults is fragile across SDK versions and op-validation modes.
The implicit-bias workaround for QNN SDK 2.23/2.24/2.25 (in conv_op_builder.cc:537-553) is not applied. Builds against those SDK versions will hit a validation bug because no bias is supplied. Extract the existing AddZeroBiasInput block into a shared helper and call it at the end of ProcessInputsForQnnConv2D.
UnpackInitializerData(input_info_1.initializer_tensor, ...) is called without a defensive RETURN_IF_NOT(input_info_1.is_initializer, ...). Today this is safe because CheckInputs gates use_conv2d on is_initializer, but the coupling is implicit and a future refactor could expose a NULL pointer dereference. Add an explicit guard at the top of the function.
ProcessInput0 now triggers reshape on is_rank1 || shape_mismatch || (use_fully_connected && shape.size() > 2). Add a doc comment listing the three independent triggers, and assert !(use_fully_connected && target_shape != nullptr) since the two are mutually exclusive.
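One way to make the three triggers and the mutual-exclusion assertion explicit is to pull the condition into a documented predicate. The sketch below is hypothetical: `NeedsInput0Reshape` and the `has_target_shape` flag (standing in for `target_shape != nullptr`) are illustrative names, and the real ProcessInput0 signature may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical predicate mirroring the condition quoted in the review comment.
// Input 0 is reshaped when ANY of these independent triggers holds:
//   1. is_rank1: input 0 has rank 1.
//   2. shape_mismatch: the shape_mismatch flag is set by the caller.
//   3. use_fully_connected with an input of rank greater than 2.
// use_fully_connected and a non-null target shape are mutually exclusive,
// so that combination is rejected up front.
bool NeedsInput0Reshape(bool is_rank1, bool shape_mismatch, bool use_fully_connected,
                        size_t rank, bool has_target_shape) {
  assert(!(use_fully_connected && has_target_shape));
  return is_rank1 || shape_mismatch || (use_fully_connected && rank > 2);
}
```

Keeping the condition in one named function gives the doc comment a natural home and lets each trigger be unit-tested on its own.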
RETURN_IF_NOT(input_info_1.shape.size() == 2, ...) is redundant with the CheckInputs gate. Keep it as defence-in-depth but add a one-line comment so readers don't think use_conv2d supports other ranks.
This PR depends on #307