[NPUW][MoE Scaling]Support MoE expert layout variants and add FoldShapeComputeChain pass by intelgaoxiong · Pull Request #36184 · openvinotoolkit/openvino

intelgaoxiong · 2026-06-02T08:07:14Z

Details:

MoE models exported with different opset versions produce expert output tensors with the singleton dimension in different positions:

Layout A: [num_experts, 1, token_num, hidden]
Layout B: [num_experts, token_num, 1, hidden]

Previously only Layout A was handled. This PR makes NPUW MoE inference layout-agnostic and adds a constant-folding pass to unblock pattern matching on static-shape graphs.
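To make the two layouts concrete, here is a minimal sketch of layout-agnostic token-count detection. It assumes a 4-D shape whose middle dims hold `1` and `token_num` in either order. `find_token_count` is a hypothetical name used only for illustration; it is not the helper introduced by this PR, and the real code may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Shape = std::vector<std::size_t>;

// Hypothetical helper (illustration only): finds the token count in a 4-D
// shape whose middle dims are {1, token_num} in either order.
// Returns false when the rank is not 4 or neither middle dim equals 1.
bool find_token_count(const Shape& shape, std::size_t& token_num) {
    if (shape.size() != 4) {
        return false;
    }
    if (shape[1] == 1) {  // Layout A; also covers token_num == 1 (decoding)
        token_num = shape[2];
        return true;
    }
    if (shape[2] == 1) {  // Layout B
        token_num = shape[1];
        return true;
    }
    return false;
}
```

With this shape of check, `[8, 1, 16, 64]` and `[8, 16, 1, 64]` both report 16 tokens, and a decoding step where both middle dims are 1 reports a single token.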

Validation job: https://cje-ir-prod01.devtools.intel.com/sai-npu-experience/job/Staging/job/ding/job/Validate/29/Validation_20report/

Changes

New: `FoldShapeComputeChain` pass (`fold_const.hpp/cpp`)

Four MatcherPass classes (FoldShapeOf, FoldGatherOfConst, FoldUnsqueezeOfConst, FoldConcatOfConsts) plus a ModelPass wrapper that runs the full pipeline in one call, which makes partitioning easier.
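As a rough illustration of what folding such a chain amounts to on a static-shape graph, the toy below evaluates ShapeOf, Gather, Unsqueeze, and Concat on plain vectors. It deliberately avoids the OpenVINO API: `Const`, the `fold_*` functions, and the example shape are all assumptions made for this sketch and do not mirror the actual pass implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a constant node: values plus a flag for rank-0 scalars.
struct Const {
    std::vector<std::int64_t> data;
    bool is_scalar;
};

// ShapeOf on a fully static shape is just that shape as a 1-D constant.
Const fold_shape_of(const std::vector<std::int64_t>& static_shape) {
    return Const{static_shape, false};
}

// Gather(constant, scalar index, axis 0) picks one element as a scalar.
// Negative indices count from the end.
Const fold_gather(const Const& src, std::int64_t index) {
    const std::int64_t n = static_cast<std::int64_t>(src.data.size());
    const std::int64_t i = index < 0 ? index + n : index;
    return Const{{src.data.at(static_cast<std::size_t>(i))}, true};
}

// Unsqueeze(scalar, axis 0) turns the scalar into a 1-element vector.
Const fold_unsqueeze(const Const& src) {
    return Const{src.data, false};
}

// Concat of 1-D constants along axis 0 appends their values in order.
Const fold_concat(const std::vector<Const>& parts) {
    Const out{{}, false};
    for (const Const& p : parts) {
        out.data.insert(out.data.end(), p.data.begin(), p.data.end());
    }
    return out;
}

// The whole chain collapses to a single constant, e.g. a Reshape target.
Const fold_chain(const std::vector<std::int64_t>& static_shape) {
    const Const dim = fold_unsqueeze(fold_gather(fold_shape_of(static_shape), 1));
    const Const tail{{-1}, false};
    return fold_concat({dim, tail});
}
```

For a static input shape of `[1, 8, 64]`, the chain reduces to the constant `[8, -1]`, so a pattern matcher downstream sees a plain constant instead of four shape-compute nodes.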

`moe.cpp`

GPTOSSRouter: removes ShapeOf/topk_convert from the formal pattern (both are resolved before matching) and uses any_input() for all Slice shape inputs.
GPTOSSExpert: decoding/prefill detection scans middle dims instead of assuming a fixed rank-2 token dimension, accepting both layout variants.

`moe_transformation.cpp`

Replaces update_reshape_constant_dimension (fixed negative index) with update_reshape_dimensions (range-based scan over middle dims), correctly handling both 3-D and 4-D reshape patterns for both layout variants.
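A range-based scan of this kind might look like the sketch below. It assumes the reshape target is available as a vector of dims and that the token dim is recognized by its value; `update_middle_dims` and its parameters are hypothetical names, not the PR's `update_reshape_dimensions` signature.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: patch every middle dim (indices 1..n-2) that equals
// old_tokens to new_tokens. The first and last dims are never touched.
// Returns the number of dims that were patched.
std::size_t update_middle_dims(std::vector<std::int64_t>& dims,
                               std::int64_t old_tokens,
                               std::int64_t new_tokens) {
    std::size_t patched = 0;
    for (std::size_t i = 1; i + 1 < dims.size(); ++i) {
        if (dims[i] == old_tokens) {
            dims[i] = new_tokens;
            ++patched;
        }
    }
    return patched;
}
```

Because the scan matches by value rather than by a fixed negative index, the same loop handles `[N, 1, T, H]`, `[N, T, 1, H]`, and a 3-D `[N, T, H]` target without a per-layout branch.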

`moe_infer_utils.cpp` / `moe_resources.cpp`

Extracts get_router_token_count(router_shape) helper to unify the two-layout token-dim detection in parse_selected_experts_from_router and gather_router_scores.
Accumulator buffer shape derivation now explicitly excludes the last dimension (hidden dim) from chunk-size substitution, preventing silent shape corruption when chunk_size == embed_dim.
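The `chunk_size == embed_dim` hazard can be seen in a small contrast between an unguarded substitution and one that skips the last dim. This is only a sketch: both function names, the replacement value `token_num`, and the decision to scan every dim except the last are assumptions, since the description states only that the hidden dim is excluded.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Shape = std::vector<std::size_t>;

// Unguarded variant: rewrites every dim equal to chunk_size, hidden dim included.
Shape substitute_all(Shape shape, std::size_t chunk_size, std::size_t token_num) {
    for (std::size_t& d : shape) {
        if (d == chunk_size) {
            d = token_num;
        }
    }
    return shape;
}

// Guarded variant: the last dim (hidden) is never rewritten.
Shape substitute_except_last(Shape shape, std::size_t chunk_size, std::size_t token_num) {
    for (std::size_t i = 0; i + 1 < shape.size(); ++i) {
        if (shape[i] == chunk_size) {
            shape[i] = token_num;
        }
    }
    return shape;
}
```

Given a template of `[4, 1, 64, 64]` with `chunk_size = 64` and a target of 256 tokens, the unguarded variant yields `[4, 1, 256, 256]`, which silently changes the hidden size, while the guarded variant yields `[4, 1, 256, 64]`.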

Tests

fold_const_test.cpp: three GTest cases verify FoldShapeComputeChain on a graph mirroring the actual router subgraph.

Tickets:

AI Assistance:

AI assistance used: no / yes
If yes, summarize how AI was used and what human validation was performed (build/tests/manual checks).

…ze to constant.

Support Qwen/GPT-OSS MoE layout differences throughout inference pipeline

GPT-OSS and Qwen use different 4-D tensor layouts for MoE expert output:

GPT-OSS: [N, 1, T, H] (singleton at dim 1)
Qwen: [N, T, 1, H] (singleton at dim 2)

Both have identical flat memory strides; only shape metadata differs.

Changes:

- moe_transformation.cpp: fix_token_count_for_expert_iterative now scans middle dims (1..n-2) by value instead of hardcoding second-to-last index, so both layouts are correctly patched for chunked prefill.
- moe_transformation.cpp: detect_and_transform_moe_downstream accepts both [N,1,H,W] and [N,H,1,W] parameter shapes for the ReduceSum pattern.
- moe_infer_utils.cpp: parse_selected_experts_from_router and gather_router_scores detect layout by checking which dim equals 1.
- moe_resources.cpp: expert_output_accumulator shape is derived from the compiled model output shape template instead of hardcoded [K,1,T,H].

Solved layout issue for GPT-OSS.
Fixed link error.
Runs the full shape-compute-chain folding pipeline in a single pass.
Add FoldConstTest.
Refine code.

Signed-off-by: intelgaoxiong <xiong.gao@intel.com>
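The claim that the two layouts share the same flat memory layout can be checked with a short row-major offset computation. The sketch below assumes dense row-major storage; `row_major_strides`, `flat_offset`, and `layouts_share_offsets` are illustrative names, not functions from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Shape = std::vector<std::size_t>;

// Row-major strides in elements: the last dim has stride 1.
std::vector<std::size_t> row_major_strides(const Shape& shape) {
    std::vector<std::size_t> strides(shape.size(), 1);
    for (std::size_t i = shape.size(); i-- > 1;) {
        strides[i - 1] = strides[i] * shape[i];
    }
    return strides;
}

// Flat element offset of a multi-dimensional index.
std::size_t flat_offset(const Shape& shape, const std::vector<std::size_t>& index) {
    const std::vector<std::size_t> strides = row_major_strides(shape);
    std::size_t offset = 0;
    for (std::size_t i = 0; i < shape.size(); ++i) {
        offset += index[i] * strides[i];
    }
    return offset;
}

// True when element (n, t, h) lands on the same flat offset in
// [N, 1, T, H] and [N, T, 1, H] for every valid n, t, h.
bool layouts_share_offsets(std::size_t N, std::size_t T, std::size_t H) {
    const Shape a{N, 1, T, H};
    const Shape b{N, T, 1, H};
    for (std::size_t n = 0; n < N; ++n) {
        for (std::size_t t = 0; t < T; ++t) {
            for (std::size_t h = 0; h < H; ++h) {
                if (flat_offset(a, {n, 0, t, h}) != flat_offset(b, {n, t, 0, h})) {
                    return false;
                }
            }
        }
    }
    return true;
}
```

The per-dim stride vectors differ only at the singleton dim, whose index is always 0, so every element resolves to `n*T*H + t*H + h` in both layouts. This is why a buffer written under one layout can be read under the other without copying.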

github-actions bot added the labels category: build (OpenVINO cmake script / infra), category: NPU (OpenVINO NPU plugin), and category: NPUW (NPUW plugin) on Jun 2, 2026

intelgaoxiong force-pushed the xiong/moe_new branch from a7baa19 to 919ab95 Compare June 3, 2026 01:47

intelgaoxiong force-pushed the xiong/moe_new branch from dc2cc23 to 65f67e9 Compare June 3, 2026 05:20

intelgaoxiong marked this pull request as ready for review June 3, 2026 05:22

intelgaoxiong requested review from a team as code owners June 3, 2026 05:22

intelgaoxiong changed the title ~~Xiong/moe new~~ [NPUW]Support MoE expert layout variants and add FoldShapeComputeChain pass Jun 3, 2026

intelgaoxiong changed the title ~~[NPUW]Support MoE expert layout variants and add FoldShapeComputeChain pass~~ [NPUW][MoE Scaling]Support MoE expert layout variants and add FoldShapeComputeChain pass Jun 3, 2026

intelgaoxiong requested review from dmatveev and dylanneve1 June 3, 2026 11:45

intelgaoxiong wants to merge 1 commit into openvinotoolkit:master from intelgaoxiong:xiong/moe_new
