[qwen] Add Qwen3.5-MoE builder (port of onnxruntime-genai #2146) by Copilot · Pull Request #349 · xadupre/mbext

Copilot · 2026-05-24T12:21:08Z

Ports the Qwen3.5-MoE (Qwen3_5MoeForConditionalGeneration) builder from microsoft/onnxruntime-genai#2146 and adds fast unit tests.

Changes

modelbuilder/builders/qwen.py
- New Qwen35MoeTextModel(Qwen35TextModel): a bias-free top-k router; packed routed experts whose HF [gate|up] weights are repacked into the interleaved layout that ORT expects for swiglu_fusion=1; and a SiLU shared expert with sigmoid gating. Supports both MoE and QMoE (symmetric blockwise quantization, no zero_points). Strips stale /mlp/ entries from the int4 customized_weight_config.
- Moved self.model_type assignment from make_genai_config into Qwen35TextModel.__init__ so subclasses can override it. Qwen35MoeTextModel sets it to "Qwen3_5_MoeForConditionalGeneration" so the base-class snake-case strip yields qwen3_5_moe (matching the C++ key registered upstream).
modelbuilder/builder.py
- Dispatches Qwen3_5MoeForConditionalGeneration → Qwen35MoeTextModel.
tests/fast/test_random_qwen3_5_moe.py
- test_qwen3_5_moe_fp32_cpu_full_attention_build — builds a tiny random-weight MoE (4 experts, 2 full_attention layers, hidden=128, moe_intermediate=64); asserts an MoE/QMoE node is emitted and genai_config.json reports model.type == "qwen3_5_moe".
- test_qwen3_5_moe_fp32_cpu_discrepancy_full_attention — runs ONNX Runtime CPU prefill and the HF PyTorch forward on the same inputs_embeds + 3-D mRoPE position_ids, then asserts np.testing.assert_allclose(pt_logits, ort_logits, atol=1e-3, rtol=1e-3) and that the greedy first-token argmax of the last-row logits agrees between PyTorch and ONNX Runtime. Observed on the tiny random-weight model: avg |PT−ORT| ≈ 8×10⁻⁸, max ≈ 6×10⁻⁷, first-token argmax matches.
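The [gate|up] → interleaved repack mentioned for modelbuilder/builders/qwen.py can be illustrated with a minimal NumPy sketch. It is not the code from this PR: the function name interleave_gate_up, the (num_experts, 2 * intermediate, hidden) weight shape, and the gate-first row order (g0, u0, g1, u1, ...) are all assumptions made for illustration.

```python
import numpy as np


def interleave_gate_up(packed: np.ndarray) -> np.ndarray:
    """Repack [gate | up] rows into g0, u0, g1, u1, ... along axis 1.

    Assumed input layout (hypothetical): (num_experts, 2 * inter, hidden),
    where the first `inter` rows are the gate projection and the last
    `inter` rows are the up projection.
    """
    num_experts, two_inter, hidden = packed.shape
    if two_inter % 2 != 0:
        raise ValueError("expected an even number of gate+up rows")
    inter = two_inter // 2
    gate = packed[:, :inter, :]
    up = packed[:, inter:, :]
    # Stack to (E, inter, 2, H), then flatten each (gate_i, up_i) pair
    # into two consecutive rows.
    return np.stack([gate, up], axis=2).reshape(num_experts, two_inter, hidden)


# Tiny example: 1 expert, inter=2, hidden=2.
packed = np.arange(8, dtype=np.float32).reshape(1, 4, 2)
interleaved = interleave_gate_up(packed)
```

After the repack, the even rows hold the gate projection and the odd rows hold the up projection, so `interleaved[:, 0::2]` recovers gate and `interleaved[:, 1::2]` recovers up.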

Notes / deviations from upstream PR

The C++ model_type.h / model.cpp changes are not ported — this repo is Python-only.
Upstream also retitles the text-only Qwen3.5 model type to Qwen3_5_textForCausalLM; left unchanged here since mbext already has a dedicated Qwen35CausalLMModel subclass and the existing genai-config string is preserved.
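To make the model-type strings discussed in this PR concrete, here is a small sketch of how a genai-config key could be derived from an architecture-style name. It assumes the base class lowercases everything before the first "For"; that rule and the helper name genai_model_type are assumptions for illustration, not something this PR text confirms.

```python
def genai_model_type(model_type: str) -> str:
    """Lowercase the part of the name that precedes the first 'For'.

    Hypothetical helper: the real base-class logic may differ.
    """
    cut = model_type.find("For")
    prefix = model_type if cut == -1 else model_type[:cut]
    return prefix.lower()


# The string set by Qwen35MoeTextModel, per the PR description.
moe_key = genai_model_type("Qwen3_5_MoeForConditionalGeneration")
# The raw HF architecture name, which lacks the extra underscore.
raw_key = genai_model_type("Qwen3_5MoeForConditionalGeneration")
```

Under this assumption the extra underscore in "Qwen3_5_MoeForConditionalGeneration" is what turns the key into qwen3_5_moe rather than qwen3_5moe.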

xadupre · 2026-05-24T12:34:19Z

+
+    @requires_transformers("5")
+    @hide_stdout()
+    def test_qwen3_5_moe_fp32_cpu_full_attention_build(self):


@copilot you need to check for discrepancies with onnxruntime and first token difference with onnxruntime-genai

Added in 377b8b6: new test_qwen3_5_moe_fp32_cpu_discrepancy_full_attention runs ORT CPU prefill and HF PyTorch forward on the same inputs_embeds + 3-D mRoPE position_ids, then asserts np.testing.assert_allclose(pt_logits, ort_logits, atol=1e-3, rtol=1e-3) and that the greedy first-token argmax of the last-row logits agrees between PT and ORT. Measured on the tiny random-weight model: avg |PT−ORT| ≈ 8e-8, max ≈ 6e-7, first-token argmax matches.
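The shape of that check can be sketched as follows. This is not the test from the PR: the helper name compare_logits, the (batch, seq, vocab) logits layout, and the synthetic arrays standing in for the PyTorch and ONNX Runtime outputs are assumptions for illustration; the real test obtains both tensors by running the models.

```python
import numpy as np


def compare_logits(pt_logits, ort_logits, atol=1e-3, rtol=1e-3):
    """Check closeness and first-token agreement of two logits tensors.

    Assumes both inputs have shape (batch, seq, vocab) (hypothetical).
    """
    pt = np.asarray(pt_logits, dtype=np.float32)
    ort = np.asarray(ort_logits, dtype=np.float32)
    # Element-wise tolerance check, same tolerances as quoted above.
    np.testing.assert_allclose(pt, ort, atol=atol, rtol=rtol)
    abs_diff = np.abs(pt - ort)
    # Greedy first token: argmax over the vocabulary of the last position.
    pt_token = int(np.argmax(pt[0, -1]))
    ort_token = int(np.argmax(ort[0, -1]))
    assert pt_token == ort_token, f"first token differs: {pt_token} vs {ort_token}"
    return {
        "avg_abs_diff": float(abs_diff.mean()),
        "max_abs_diff": float(abs_diff.max()),
        "first_token": pt_token,
    }


# Synthetic stand-ins for the two backends: a tiny perturbation of one tensor.
pt_logits = np.linspace(-1.0, 1.0, 24, dtype=np.float32).reshape(1, 3, 8)
ort_logits = (pt_logits.astype(np.float64) + 5e-7).astype(np.float32)
stats = compare_logits(pt_logits, ort_logits)
```

Returning the average and maximum absolute differences alongside the token makes it easy to log figures like the ones reported above without loosening the assertion.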

github-actions · 2026-05-24T12:51:41Z

📂 Impacted Files

3 file(s) changed in this pull request:

🔵 modelbuilder/builder.py (+4 / -0)
🔵 modelbuilder/builders/qwen.py (+209 / -4)
🟢 tests/fast/test_random_qwen3_5_moe.py (+271 / -0)

codecov · 2026-05-24T12:54:56Z

Codecov Report

❌ Patch coverage is 83.50515% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.96%. Comparing base (b586dd8) to head (377b8b6).

Files with missing lines	Patch %	Lines
modelbuilder/builders/qwen.py	82.97%	16 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #349      +/-   ##
==========================================
+ Coverage   79.92%   79.96%   +0.03%     
==========================================
  Files          25       25              
  Lines        7741     7835      +94     
==========================================
+ Hits         6187     6265      +78     
- Misses       1554     1570      +16

Flag	Coverage Δ
fast-tests	`79.96% <83.50%> (+0.03%)`	⬆️
fast-tests-ort-nightly	`79.91% <83.50%> (+0.03%)`	⬆️

Initial plan

85efdb8

Copilot AI assigned Copilot and xadupre May 24, 2026

Copilot started work on behalf of xadupre May 24, 2026 12:21

Copilot AI linked an issue May 24, 2026 that may be closed by this pull request

import changes from https://github.com/microsoft/onnxruntime-genai/pull/2146, add fast unit test #348

Closed

Import Qwen3.5-MoE support from onnxruntime-genai PR #2146 + fast unit test

a24a424

Agent-Logs-Url: https://github.com/xadupre/mbext/sessions/0d946b1a-b3d9-4d9a-bba8-57a66a71d021 Co-authored-by: xadupre <22452781+xadupre@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Import changes from PR 2146 and add fast unit test~~ Add Qwen3.5-MoE builder (port of onnxruntime-genai #2146) May 24, 2026

Copilot finished work on behalf of xadupre May 24, 2026 12:32

Copilot AI requested a review from xadupre May 24, 2026 12:32

xadupre reviewed May 24, 2026

Copilot started work on behalf of xadupre May 24, 2026 12:34

Add ORT-vs-PT discrepancy + first-token check for Qwen3.5-MoE

377b8b6

Agent-Logs-Url: https://github.com/xadupre/mbext/sessions/b964cd1f-c911-4ab1-b273-972fb0386e8f Co-authored-by: xadupre <22452781+xadupre@users.noreply.github.com>

Copilot finished work on behalf of xadupre May 24, 2026 12:43

Copilot AI requested a review from xadupre May 24, 2026 12:43

xadupre approved these changes May 24, 2026

xadupre marked this pull request as ready for review May 24, 2026 12:51

github-actions Bot changed the title ~~Add Qwen3.5-MoE builder (port of onnxruntime-genai #2146)~~ [qwen] Add Qwen3.5-MoE builder (port of onnxruntime-genai #2146) May 24, 2026

xadupre merged commit 97d302d into main May 24, 2026
9 checks passed

xadupre deleted the copilot/import-changes-and-add-fast-unit-test branch May 24, 2026 12:59

xadupre merged 3 commits into main from copilot/import-changes-and-add-fast-unit-test
