[QNN EP] Make graph partitioner aware of multi-NodeUnit fusions by qti-ashwshan · Pull Request #464 · onnxruntime/onnxruntime-qnn

qti-ashwshan · 2026-05-28T23:55:11Z

Description

Make the QNN EP graph partitioner aware of multi-NodeUnit fusions (IQnnNodeGroup).

The partitioner today is OrtNodeUnit-aware but blind to IQnnNodeGroup. Its BFS schedules per-NodeUnit, so members of a multi-NodeUnit fusion (e.g. LPBQ MatMul/Gemm, LayerNorm, Gelu, ChannelShuffle, SpaceToDepth, ReshapeGemm) can be scheduled into different partitions when an unsupported op falls between them. At Compile time, QnnModel::ComposeGraph re-runs GetQnnNodeGroups on each partition subgraph; with members missing from the subgraph, TryFusion returns nullptr and the fusion silently drops. Depending on the fusion this either degrades to a slower per-op path or aborts compile entirely (e.g. when the per-op path can't handle the encoding the fusion was producing).

This PR rewrites the partitioner to schedule on IQnnNodeGroup as the atomic unit. Every member of a multi-NodeUnit fusion is guaranteed to land in the same partition, or none.
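
The guarantee above can be stated as a small invariant check. The sketch below is illustrative only: `FusionPlacedAtomically`, the integer node ids, and the `partition_of` vector (where -1 means the node is not assigned to QNN) are assumptions made for this example and are not taken from the PR.

```cpp
#include <vector>

// Hypothetical helper: returns true when every member of a fusion sits in the
// same partition, or when none of them is assigned to QNN at all.
// partition_of[n] is the partition index of node n, or -1 if unassigned.
bool FusionPlacedAtomically(const std::vector<int>& members,
                            const std::vector<int>& partition_of) {
  if (members.empty()) return true;
  const int first = partition_of[members.front()];
  for (int n : members) {
    if (partition_of[n] != first) return false;  // split across partitions
  }
  return true;
}
```

A check of this shape could serve as a debug assertion after partitioning; whether the PR includes one is not stated.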

Approach

Add QnnNodeGroupInfo describing each IQnnNodeGroup (members, target, supported flag, external in-degree). Every OrtNodeUnit ends up in exactly one group — real fusion or 1-member QnnNodeUnitWrapper — so BFS becomes branch-free.
Compute external_in_degree per group via a per-edge producer walk that excludes intra-group edges and initializers.
Replace the NodeUnit-level BFS in CreateSupportedPartitionNodeGroups with a group-level BFS: when admitted, all member OrtNodes are emitted atomically into the current partition.
Add a cycle-detection + demotion loop in GetSupportedNodes for the rare case where a multi-NodeUnit group cannot be scheduled atomically without forming a cyclic dependency through an unsupported op. Such groups get demoted to 1-member groups and re-checked via the standalone op builder, then either run independently on QNN or fall back to CPU.
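
The first three steps above can be sketched as a group-level scheduler. This is a simplified illustration, not the code in the PR: `GroupInfo` is a hypothetical stand-in for `QnnNodeGroupInfo`, nodes are plain integer ids, initializers are simply not modelled as edges, and `ComputeExternalInDegree` and `PartitionByGroup` are made-up names. The cycle-detection and demotion step is left out, so a group on a cycle never becomes ready here and is silently skipped.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical, simplified stand-in for the PR's QnnNodeGroupInfo.
struct GroupInfo {
  std::vector<int> members;    // node ids that belong to this group
  bool supported = false;      // whether QNN can run the whole group
  int external_in_degree = 0;  // edges entering from other groups
};

struct Edge { int from; int to; };

// Counts, per group, the edges whose producer lives in a different group.
// Intra-group edges are skipped.
void ComputeExternalInDegree(std::vector<GroupInfo>& groups,
                             const std::vector<int>& group_of,
                             const std::vector<Edge>& edges) {
  for (GroupInfo& g : groups) g.external_in_degree = 0;
  for (const Edge& e : edges) {
    if (group_of[e.from] != group_of[e.to]) ++groups[group_of[e.to]].external_in_degree;
  }
}

// Group-level BFS: a supported group is admitted only when all of its external
// producers are done, and then all of its members are emitted together.
std::vector<std::vector<int>> PartitionByGroup(std::vector<GroupInfo> groups,
                                               const std::vector<int>& group_of,
                                               const std::vector<Edge>& edges) {
  ComputeExternalInDegree(groups, group_of, edges);

  // Cross-group successor lists, one entry per cross-group edge.
  std::vector<std::vector<int>> successors(groups.size());
  for (const Edge& e : edges) {
    if (group_of[e.from] != group_of[e.to]) {
      successors[group_of[e.from]].push_back(group_of[e.to]);
    }
  }

  std::queue<int> ready_supported, ready_unsupported;
  auto enqueue = [&](int g) {
    (groups[g].supported ? ready_supported : ready_unsupported).push(g);
  };
  auto release = [&](int g) {
    for (int s : successors[g]) {
      if (--groups[s].external_in_degree == 0) enqueue(s);
    }
  };
  for (std::size_t g = 0; g < groups.size(); ++g) {
    if (groups[g].external_in_degree == 0) enqueue(static_cast<int>(g));
  }

  std::vector<std::vector<int>> partitions;
  std::vector<int> current;
  while (!ready_supported.empty() || !ready_unsupported.empty()) {
    // Grow the current partition as far as supported groups allow.
    while (!ready_supported.empty()) {
      const int g = ready_supported.front();
      ready_supported.pop();
      current.insert(current.end(), groups[g].members.begin(), groups[g].members.end());
      release(g);
    }
    if (!current.empty()) {
      partitions.push_back(current);
      current.clear();
    }
    // Step over the unsupported groups that are ready right now.
    for (std::size_t n = ready_unsupported.size(); n > 0; --n) {
      const int g = ready_unsupported.front();
      ready_unsupported.pop();
      release(g);
    }
  }
  return partitions;
}
```

For a graph where an unsupported op U feeds the second member of a supported two-member group {A, B}, this sketch admits the group only after U has been processed, so A and B are emitted into the same partition.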

Files

qnn_ep_utils.h — QnnNodeGroupInfo struct, ComputeGroupExternalInDegree, extended CreateSupportedPartitionNodeGroups signature.
qnn_ep_utils.cc — group-level BFS, edge-walker helper.
qnn_execution_provider.h/.cc — extended private GetSupportedNodes to build the group set, cycle detection + demotion loop, updated single call site in GetCapabilityImpl.
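
The cycle detection and demotion loop listed for GetSupportedNodes might look roughly like the sketch below. It is an illustration under stated assumptions: groups are represented only by a `group_of` vector that maps node id to group id, `GroupIsOnCycle` and `DemoteCyclicGroups` are hypothetical names, and the re-check of demoted members through the standalone op builder is not modelled.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Edge { int from; int to; };

// True if some path leaves group g and comes back into it, which means the
// group cannot be scheduled as one atomic unit.
bool GroupIsOnCycle(int g, const std::vector<int>& group_of,
                    const std::vector<Edge>& edges, int num_groups) {
  // Group-level adjacency built from cross-group edges only.
  std::vector<std::vector<int>> next(num_groups);
  for (const Edge& e : edges) {
    if (group_of[e.from] != group_of[e.to]) {
      next[group_of[e.from]].push_back(group_of[e.to]);
    }
  }
  std::vector<bool> visited(num_groups, false);
  std::vector<int> stack(next[g].begin(), next[g].end());
  while (!stack.empty()) {
    const int cur = stack.back();
    stack.pop_back();
    if (cur == g) return true;  // walked back into the starting group
    if (visited[cur]) continue;
    visited[cur] = true;
    stack.insert(stack.end(), next[cur].begin(), next[cur].end());
  }
  return false;
}

// Splits every multi-member group that sits on a cycle into 1-member groups.
// Repeats until no group changes; returns how many groups were demoted.
int DemoteCyclicGroups(std::vector<int>& group_of, const std::vector<Edge>& edges) {
  int num_groups = 0;
  for (int g : group_of) num_groups = std::max(num_groups, g + 1);

  int demoted = 0;
  bool changed = true;
  while (changed) {
    changed = false;
    for (int g = 0; g < num_groups; ++g) {
      std::vector<int> members;
      for (std::size_t n = 0; n < group_of.size(); ++n) {
        if (group_of[n] == g) members.push_back(static_cast<int>(n));
      }
      if (members.size() < 2 || !GroupIsOnCycle(g, group_of, edges, num_groups)) continue;
      // Keep the first member in g; give every other member a fresh group id.
      for (std::size_t i = 1; i < members.size(); ++i) {
        group_of[members[i]] = num_groups++;
      }
      ++demoted;
      changed = true;
    }
  }
  return demoted;
}
```

With nodes A, U, B connected as A -> U -> B and A and B fused into one group, the group depends on U while U depends on the group, so the sketch demotes it into two 1-member groups.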

Compatibility

No public API change. GetSupportedNodes and CreateSupportedPartitionNodeGroups are internal with single in-tree callers.
No IQnnNodeGroup interface change. Authors of new fusions get atomic partitioning automatically — no extra plumbing.
No op builder change. All existing IsOpSupported / AddToModelBuilder paths are untouched.
The GetCapabilityImpl ABI to ORT core is unchanged; only the function body changes.
EPContext path is unaffected (doesn't go through the partitioner).

Motivation and Context

Problem

A model can have a perfectly valid multi-NodeUnit fusion pattern that QNN can compile to a single optimized op, but a single unsupported op anywhere upstream of the fusion's target node is enough for BFS to split the fusion's members across partitions. The fusion then silently fails to fire at Compile time. Symptoms vary:

For fusions with a usable per-op fallback (Gelu, LayerNorm with constant gamma, ChannelShuffle, etc.): performance regression. Output is correct but the fused QNN op never gets built.
For fusions where the per-op fallback can't handle the encoding produced by the fusion (LPBQ MatMul/Gemm with per-channel block-quantized weights, etc.): outright Compile failure. The standalone DQ/Q/MatMul builders reject the per-channel block-quantized weight tensor that only the LPBQ fusion knows how to consume.

This is an entire class of bug, not a single-fusion or single-model issue. Any future multi-NodeUnit fusion is silently exposed to the same hazard until this fix lands.

minfhong-qti · 2026-05-29T02:00:03Z

Do you have an example that would fail with the current partitioner? Since this change greatly complicates the partition logic, I would prefer to look for alternative solutions before modifying it. Thanks!

qti-ashwshan · 2026-06-19T20:32:10Z

Do you have an example that would fail with the current partitioner? Since this change greatly complicates the partition logic, I would prefer to look for alternative solutions before modifying it. Thanks!

This will help - microsoft/onnxruntime#26325

qti-ashwshan added 2 commits April 28, 2026 09:42

[WIP] Node Group Aware Graph Partitioner

b308788

- minor fixes

616cc46

qti-ashwshan self-assigned this Jun 1, 2026

Merge branch 'main' of github.com:onnxruntime/onnxruntime-qnn into de…

ffe5a7d

…v/ashwshan/node-group-aware-partitioner

qti-ashwshan marked this pull request as ready for review June 17, 2026 00:52

qti-ashwshan requested review from qti-chuteng, qti-jkilpatrick, qti-kromero, qti-yuduo, tirupath-qti and yath1 as code owners June 17, 2026 00:52

qti-ashwshan closed this Jun 17, 2026

qti-ashwshan reopened this Jun 18, 2026

qti-ashwshan added 3 commits June 17, 2026 21:47

- minor fixes

b00ff57

Merge branch 'main' into dev/ashwshan/node-group-aware-partitioner

bc07429

Merge branch 'main' into dev/ashwshan/node-group-aware-partitioner

0244f2e

qti-ashwshan wants to merge 6 commits into main from dev/ashwshan/node-group-aware-partitioner
