[QNN EP] 2.3.0 RC2 Changes by qti-mbadnara · Pull Request #386 · onnxruntime/onnxruntime-qnn

qti-mbadnara · 2026-05-14T06:43:41Z

Cherry-picked the following from the mainline to rel-2.3.0 as part of ORT QNN EP 2.3.0 RC2

The ORT QNN EP 2.3.0 RC1 changes were added as part of this PR:

[QNN EP] 2.3.0 RC1 Changes #385

* Re-enable failed UT


* [QNN EP] Fuse Dynamic MatMulInteger pattern into Float QNN MatMul

  Adds DQMatMulIntegerFusion, a new IQnnNodeGroup that recognizes the ONNX dynamic-quantization MatMul pattern emitted by tooling like onnxruntime quantization (QDQ for activations, integer weights) and folds it into a single QNN float MatMul, avoiding the int8/uint8 MatMulInteger op, which QNN does not natively support.

  Pattern matched (ONNX, starting at MatMulInteger):

      x --> DynamicQuantizeLinear --> (a_q, a_scale, a_zp)
      a_q, B, a_zp, B_zp --> MatMulInteger --> Cast(FLOAT)
      a_scale, B_scale_init --> parallel Mul
      Cast.out, parallel_Mul.out --> requant Mul
      requant_Mul.out, bias_init --> Add (optional)

  Rewrite (QNN):

      x ---------------------------------+
                                         | (input[0] of MatMul)
                                         v
      B --> [Dequantize(B_scale,B_zp)] --> MatMul --> [Add(bias)] --> out
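To illustrate why this rewrite is numerically reasonable, here is a minimal pure-Python sketch (not the EP's C++ code) that evaluates both the original dynamic-quantization pattern and the fused float MatMul on a tiny example. The function names, input values, and the all-zero-input guard are assumptions made for illustration; the quantization formula follows the ONNX DynamicQuantizeLinear definition for uint8. The two results should differ only by the activation rounding error that the fused form no longer incurs.

```python
def dynamic_quantize_linear(values):
    """Quantize a flat list of floats to uint8 (ONNX DynamicQuantizeLinear semantics)."""
    lo = min(0.0, min(values))
    hi = max(0.0, max(values))
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        scale = 1.0  # all-zero input; this guard is an assumption of the sketch
    zero_point = max(0, min(255, round(-lo / scale)))
    quantized = [max(0, min(255, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point


def matmul(a, b):
    """Plain matrix product of two nested lists (a: MxK, b: KxN)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def original_pattern(x, b_q, b_scale, b_zp):
    """DynamicQuantizeLinear -> MatMulInteger -> Cast -> Mul(a_scale * B_scale)."""
    rows, cols = len(x), len(x[0])
    a_q, a_scale, a_zp = dynamic_quantize_linear([v for row in x for v in row])
    a_int = [[a_q[i * cols + k] - a_zp for k in range(cols)] for i in range(rows)]
    b_int = [[q - b_zp for q in row] for row in b_q]
    acc = matmul(a_int, b_int)  # integer accumulation, as MatMulInteger would do
    return [[float(v) * (a_scale * b_scale) for v in row] for row in acc]


def fused_pattern(x, b_q, b_scale, b_zp):
    """Float MatMul on the raw activation with a dequantized weight."""
    b_float = [[(q - b_zp) * b_scale for q in row] for row in b_q]
    return matmul(x, b_float)


# Hypothetical inputs: a 2x3 activation and a 3x2 int8 weight.
x = [[0.5, -1.25, 2.0], [1.0, 0.0, -0.75]]
b_q = [[12, -7], [3, 25], [-40, 9]]
B_SCALE, B_ZP = 0.02, 0

reference = original_pattern(x, b_q, B_SCALE, B_ZP)
fused = fused_pattern(x, b_q, B_SCALE, B_ZP)
max_diff = max(abs(r - f)
               for ref_row, fused_row in zip(reference, fused)
               for r, f in zip(ref_row, fused_row))
print(max_diff)  # small: bounded by the activation quantization step
```

Because the fused path skips activation quantization entirely, it is not bit-identical to the original pattern; the difference is bounded by half an activation quantization step times the column-wise sum of absolute weight values.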

Extend the MatMulNBits op builder from GPU-only to HTP. HTP support is restricted to 2-bit/4-bit weights and block sizes 32/64. Unlike GPU, which uses BlockEncoding, HTP requires BwFloatBlockEncoding with the bitwidth set and quint8 for the tensor. QnnQuantParamsWrapper is therefore extended to accept encodings of type QNN_QUANTIZATION_ENCODING_BW_FLOAT_BLOCK. Per HTP request, MatMulNBits is transformed into Conv2d with the necessary Reshape and Cast ops around it, because the HTP implementation for MatMul could not be completed in time.

Test: unit tests, with the hardcoded utility functions extended to cover 2/4/8-bit.

TODO: Re-enable test cases once QAIRT is upleveled to v2.47.
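As a rough illustration of the constraints described here, the following pure-Python sketch checks the stated HTP restriction (2/4 bits, block size 32/64) and shows what block-wise dequantization means: every run of `block_size` weights shares one scale. The function names, the already-unpacked weight layout, and the zero point of 8 are assumptions for illustration only; they are not taken from the op builder.

```python
SUPPORTED_BITS = (2, 4)            # restriction stated in the change description
SUPPORTED_BLOCK_SIZES = (32, 64)   # restriction stated in the change description


def is_supported_on_htp(bits, block_size):
    """Hypothetical guard mirroring the stated HTP restriction."""
    return bits in SUPPORTED_BITS and block_size in SUPPORTED_BLOCK_SIZES


def dequantize_blockwise(q_row, scales, zero_point, block_size):
    """Dequantize one weight row; each block of `block_size` values shares a scale."""
    if len(q_row) != len(scales) * block_size:
        raise ValueError("row length must equal num_blocks * block_size")
    return [(q - zero_point) * scales[i // block_size] for i, q in enumerate(q_row)]


# 4-bit example with already-unpacked values in [0, 15] and two blocks of 32.
q_row = [8 + (i % 3) for i in range(64)]   # 8, 9, 10, 8, 9, 10, ...
scales = [0.5, 0.25]                       # one scale per block
weights = dequantize_blockwise(q_row, scales, zero_point=8, block_size=32)

print(is_supported_on_htp(4, 32), is_supported_on_htp(8, 32))
print(weights[:3], weights[32:35])
```

The sketch deliberately ignores how the n-bit values are packed into bytes, since that layout is not described in this PR.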

* [QNN EP] Fix ReshapeGemmFusion rank-5 input regression

  QNN HTP FullyConnected rejects input tensors with rank > 4. PR #232 added ReshapeGemmFusionGroup, which bypasses the input Reshape and passes the original (pre-reshape) tensor directly to QNN FC. This makes MatMul receive a rank-5 input and causes it to fall back to CPU with error 3110 "incorrect Rank 5".

  Fix: add a rank guard in CheckShape so the fusion is skipped when the pre-reshape tensor has rank > 4. The standalone GemmOpBuilder then handles the Gemm with the already-flattened rank-2 input as before.
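The rank guard can be pictured with a short Python sketch. This is only a model of the decision described in the fix, not the actual CheckShape code; the function names and example shapes are hypothetical.

```python
MAX_FC_INPUT_RANK = 4  # per the description: HTP FullyConnected rejects rank > 4


def can_fuse_reshape_gemm(pre_reshape_shape):
    """Rank guard: bypass the Reshape only if QNN FC can take the original tensor."""
    return len(pre_reshape_shape) <= MAX_FC_INPUT_RANK


def gemm_input_shape(pre_reshape_shape, reshaped_shape):
    """Shape the Gemm lowering would receive, with or without the fusion."""
    if can_fuse_reshape_gemm(pre_reshape_shape):
        return list(pre_reshape_shape)   # fusion feeds the pre-reshape tensor
    return list(reshaped_shape)          # fusion skipped: flattened rank-2 input


print(gemm_input_shape([2, 3, 4, 8], [24, 8]))
print(gemm_input_shape([1, 2, 3, 4, 8], [24, 8]))
```

With the guard in place, the rank-5 case takes the second branch, so the Gemm sees the same rank-2 tensor it saw before the fusion was introduced.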

When beta=0.0, Y = alpha*(A@B) + 0*C simplifies to alpha*(A@B), so the bias input can be dropped entirely. Map these Gemm nodes to QNN FullyConnected without bias instead of falling back to CPU. Add CPU and HTP QDQ unit tests covering this case.
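The simplification can be checked with a small pure-Python sketch of the Gemm formula. The `gemm` helper and the sample matrices are illustrative assumptions; for brevity, C is given in the full output shape rather than broadcast.

```python
def gemm(a, b, c=None, alpha=1.0, beta=1.0):
    """Y = alpha * (A @ B) + beta * C for nested lists; C is optional, shape MxN."""
    m, k, n = len(a), len(b), len(b[0])
    y = [[alpha * sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
         for i in range(m)]
    if c is not None:
        y = [[y[i][j] + beta * c[i][j] for j in range(n)] for i in range(m)]
    return y


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[0.5, -1.0], [2.0, 0.25]]
c = [[10.0, 20.0], [30.0, 40.0]]

with_bias = gemm(a, b, c, alpha=2.0, beta=0.0)   # bias present but scaled by 0
without_bias = gemm(a, b, alpha=2.0)             # bias input dropped entirely
print(with_bias == without_bias)  # prints True
```

The equivalence holds for finite C; a C containing inf or NaN would propagate NaN through `0 * C` in the unsimplified form.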

* [QNN EP] Fuse DynamicQuantizeLinear + ConvInteger pattern into float QNN Conv2d

  Introduces DQConvIntegerFusion, a new IQnnNodeGroup fusion that rewrites the dynamic-quantize ConvInteger subgraph into a floating-point QNN Conv2d node, as QNN HTP doesn't support Dynamic Quantization.

  Pattern matched (ONNX):

      x --> DynamicQuantizeLinear --> (a_q, a_scale, a_zp)
      a_q, B_int8, a_zp, B_zp_int8 --> ConvInteger --> Cast(FLOAT)
      a_scale, B_scale_init --> Mul
      Cast.out, parallel_Mul.out --> Mul
      requant_Mul.out, bias_init --> Add [optional]

  Rewrite (QNN):

      x --> Transpose(NCHW->NHWC) -------+
                                         | (activation input)
                                         v
      B --> [Dequantize(B_scale,B_zp)] --> Conv2d --> Transpose(NHWC->NCHW) --> [Add(bias)] --> out

  * Per-channel B_scale: HTP Dequantize does not accept per-channel quant inputs, so the int8 weight is pre-dequantized to float32 offline and emitted as a STATIC float tensor without a Dequantize op.

  - Remove IsNpuBackend guard so DQConvIntegerFusion works on all QNN backends.
  - Drop unused #include "core/providers/qnn/builder/qnn_def.h".
  - Reject sibling absorption when any sibling ConvInteger is not structurally fusible (IsConvIntegerStructurallyFusible + DQL consumer walk in TryFusion).
  - Reject ConvInteger with non-static rank-4 output shape early in TryFusion to avoid claiming the DQL and failing later in CreateOrValidateOnQnn.
  - Replace silent return-on-missing-HTP-JSON with GTEST_SKIP in fusion tests.
  - Promote kFusionType to DQConvIntegerFusion::kType to remove the literal duplication.
  - Add depthwise coverage (per-tensor and per-channel B_scale) for the QNN_OP_DEPTH_WISE_CONV_2D path.
  - Add negative sibling-rejection test (one sibling has runtime B_zp; assert neither sibling fuses).
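The per-channel case, where the int8 weight is dequantized offline instead of through a Dequantize op, amounts to applying one scale and zero point per output channel. Below is a minimal Python sketch of that arithmetic; the helper name, the flattened per-channel layout, and the sample values are assumptions for illustration and do not reflect the fusion's C++ implementation.

```python
def dequantize_per_channel(w_q, scales, zero_points):
    """Offline dequantization: w_q holds one flattened int8 kernel per output channel."""
    if not (len(w_q) == len(scales) == len(zero_points)):
        raise ValueError("need one scale and one zero point per output channel")
    return [[(q - zp) * scale for q in channel]
            for channel, scale, zp in zip(w_q, scales, zero_points)]


# Two output channels, three weights each (kernel dimensions flattened).
w_q = [[4, -2, 0], [8, 1, -3]]
scales = [0.5, 0.25]      # per-channel B_scale
zero_points = [0, 1]      # per-channel B_zp

w_float = dequantize_per_channel(w_q, scales, zero_points)
print(w_float)
```

The resulting float tensor is what would be handed to the convolution as a static weight, so no per-channel Dequantize is needed at runtime.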

[QNN EP] Ensure ORT Core is 1.24.4 (#369)

238c109

qti-mbadnara force-pushed the dev/qti-mbadnara/rel-2.3.0-rc2-changes branch from b0b1a4c to def424d Compare May 14, 2026 18:10

qti-mbadnara changed the base branch from main to rel-2.3.0 May 14, 2026 18:11

qti-mbadnara force-pushed the dev/qti-mbadnara/rel-2.3.0-rc2-changes branch from 29b0cb3 to 238c109 Compare May 20, 2026 22:11

qti-mbadnara and others added 12 commits May 20, 2026 15:12

Merge branch 'rel-2.3.0' into dev/qti-mbadnara/rel-2.3.0-rc2-changes

0765a40

Re-enable failed UT for Linux aarch64 running on qcs6490 (#361)

7b864c0

* Re-enable failed UT

Merge branch 'dev/qti-mbadnara/rel-2.3.0-rc2-changes' of https://gith…

6bf0a2b

…ub.com/onnxruntime/onnxruntime-qnn into dev/qti-mbadnara/rel-2.3.0-rc2-changes

[QNN EP] Prepare artifacts for signing (#355)

32da198

[QNN EP] Uplevel QAIRT to 2.46.0 (#370)

0fa7711

[QNN EP] Minor CI Fixes (#395)

6b0a09c

[QNN EP] Add support to build ARM64 (ARM64x) Zip (#397)

9c2a8f5

qti-mbadnara closed this May 20, 2026

qti-mbadnara deleted the dev/qti-mbadnara/rel-2.3.0-rc2-changes branch May 20, 2026 22:24

qti-mbadnara wants to merge 13 commits into rel-2.3.0 from dev/qti-mbadnara/rel-2.3.0-rc2-changes

5 participants
