Add Qwen3.5-35B-A3B MoE VLM ONNX export recipe by tanzeel-amd · Pull Request #405 · microsoft/olive-recipes

tanzeel-amd · 2026-05-08T10:34:50Z

Olive recipe for exporting Qwen/Qwen3.5-35B-A3B (256 experts, 8 routed + 1 shared)
Three sub-model pipeline: text decoder (INT4 QMoE), embedding (FP32), vision (FP32)
Custom ONNX-export-friendly MoE model class (codes/modeling_qwen3_5_moe.py)
Inference script with text, image, interactive, and benchmark modes
Requires ORT GenAI built with qwen3_5_moe support
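
As a rough illustration of what "256 experts, 8 routed + 1 shared" means per token, the sketch below shows generic top-k MoE routing with an always-on shared expert. It is not taken from `codes/modeling_qwen3_5_moe.py`: the function names, the renormalisation of the selected weights, and the ungated shared expert are all assumptions made for illustration.

```python
import math

def route_token(router_logits, top_k=8):
    """Return (expert_index, weight) pairs for the top_k experts of one token."""
    # Softmax over all experts, stabilised by subtracting the maximum logit.
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the top_k most probable experts.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Assumption: renormalise so the selected weights sum to 1.
    kept = sum(probs[i] for i in chosen)
    return [(i, probs[i] / kept) for i in chosen]

def moe_forward(x, router_logits, experts, shared_expert, top_k=8):
    """Combine the routed expert outputs with the shared expert output."""
    out = shared_expert(x)  # the shared expert runs for every token
    for idx, weight in route_token(router_logits, top_k):
        out += weight * experts[idx](x)
    return out

# Toy usage: 256 scalar "experts", each scaling its input by a different factor.
experts = [lambda v, s=i: v * (s / 256.0) for i in range(256)]
shared = lambda v: v
logits = [math.sin(i) for i in range(256)]
routes = route_token(logits)
y = moe_forward(2.0, logits, experts, shared)
```

Only 8 of the 256 routed experts contribute to each token, so per-token compute is much smaller than the total parameter count suggests.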

Copilot

Pull request overview

Adds a new Olive recipe to export and run Qwen/Qwen3.5-35B-A3B as a three-submodel ONNX Runtime GenAI pipeline (vision encoder + embedding fusion + INT4 text decoder), including a custom ONNX-export-friendly MoE model shell and an inference/benchmark script.

Changes:

Introduces a custom Qwen3_5MoeModel implementation used for ONNX export of the vision and embedding submodels.
Adds Olive JSON pipelines for exporting/optimizing vision.onnx, embedding.onnx, and building text.onnx via ModelBuilder (INT4).
Adds end-to-end optimize.py config generation and inference.py runner with interactive + benchmark + optional PyTorch comparison.
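
The "embedding fusion" step named in the overview is not shown in this conversation. One common reading, for vision-language models of this family, is that embeddings at image-placeholder token positions are replaced by vision-encoder features before the text decoder runs. The sketch below illustrates that idea with plain lists; `fuse_embeddings` and the placeholder ID `99` are hypothetical and not taken from the PR.

```python
def fuse_embeddings(input_ids, text_embeds, image_features, image_token_id):
    """Replace embeddings at image-placeholder positions with vision features."""
    positions = [i for i, tok in enumerate(input_ids) if tok == image_token_id]
    if len(positions) != len(image_features):
        raise ValueError(
            f"{len(positions)} placeholder tokens but {len(image_features)} image features"
        )
    # Copy so the caller's text embeddings are left untouched.
    fused = [list(row) for row in text_embeds]
    for pos, feat in zip(positions, image_features):
        fused[pos] = list(feat)
    return fused

# Toy usage: 4 tokens, hidden size 2, two image placeholders (hypothetical ID 99).
ids = [1, 99, 99, 2]
text = [[0.1, 0.1], [0.0, 0.0], [0.0, 0.0], [0.2, 0.2]]
vision = [[5.0, 5.0], [6.0, 6.0]]
fused = fuse_embeddings(ids, text, vision, image_token_id=99)
```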

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file

| File | Description |
| --- | --- |
| `Qwen-Qwen3.5-35B-A3B/LICENSE` | Adds upstream Apache-2.0 license text for the recipe content. |
| `Qwen-Qwen3.5-35B-A3B/builtin/user_script.py` | Provides Olive model loaders + dummy inputs for exporting embedding/vision via a custom model shell. |
| `Qwen-Qwen3.5-35B-A3B/builtin/optimize.py` | Orchestrates Olive runs and patches `genai_config.json` + writes `processor_config.json` + tokenizer fixups. |
| `Qwen-Qwen3.5-35B-A3B/builtin/inference.py` | Adds ORT GenAI inference script with interactive mode and benchmarking (optionally vs PyTorch). |
| `Qwen-Qwen3.5-35B-A3B/builtin/cpu_and_mobile/text.json` | Olive pipeline to build INT4 text decoder via ModelBuilder. |
| `Qwen-Qwen3.5-35B-A3B/builtin/cpu_and_mobile/embedding.json` | Olive pipeline to export embedding fusion model and apply graph surgeries/optimizations. |
| `Qwen-Qwen3.5-35B-A3B/builtin/cpu_and_mobile/vision.json` | Olive pipeline to export vision encoder, apply PackedAttention surgery, and optimization passes. |
| `Qwen-Qwen3.5-35B-A3B/builtin/codes/modeling_qwen3_5_moe.py` | Custom ONNX-export-friendly model implementation (vision + embedding shell + MoE text components). |
| `Qwen-Qwen3.5-35B-A3B/builtin/codes/__init__.py` | Initializes the `codes` module for imports. |
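
The `optimize.py` row mentions patching `genai_config.json`, and a later commit message in this PR says token IDs are read from the HuggingFace configs and that a `--context-length` argument was added. A minimal sketch of that kind of patching follows. The `model.*` key names in the GenAI config, the helper names, and the example token IDs are assumptions for illustration; the PR's actual script is not shown here.

```python
import argparse
import copy

def patch_genai_config(genai_cfg, generation_cfg, context_length=None):
    """Return a copy of genai_cfg with token IDs and context length overridden."""
    patched = copy.deepcopy(genai_cfg)
    model = patched.setdefault("model", {})
    # Copy token IDs from the HuggingFace generation config when present,
    # instead of hardcoding them (assumed key names).
    for key in ("bos_token_id", "eos_token_id", "pad_token_id"):
        if key in generation_cfg:
            model[key] = generation_cfg[key]
    if context_length is not None:
        model["context_length"] = context_length
    return patched

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Patch a GenAI config (sketch).")
    parser.add_argument("--context-length", type=int, default=None,
                        help="Override the context length written to the config.")
    return parser.parse_args(argv)

# Toy usage with made-up values; real configs would be loaded with json.load().
args = parse_args(["--context-length", "4096"])
example = patch_genai_config(
    {"model": {"context_length": 1024, "eos_token_id": 0}},
    {"eos_token_id": [7, 8], "pad_token_id": 7},
    args.context_length,
)
```

Working on a deep copy keeps the original dictionary intact, which makes it easy to diff the patched config against what the export step produced.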

tanzeel-amd · 2026-05-08T10:43:52Z

@microsoft-github-policy-service agree company="AMD"

VishalX · 2026-05-11T10:06:50Z

@xieofxie / @devang-ml pls review

xieofxie · 2026-05-12T01:43:32Z

please wait for microsoft/onnxruntime-genai#2146

VishalX · 2026-05-24T05:59:47Z

> please wait for microsoft/onnxruntime-genai#2146

OGA PR merged.

devang-ml · 2026-05-26T15:00:12Z

Please add README and info.yml

- Olive recipe for exporting Qwen/Qwen3.5-35B-A3B (256 experts, 8 routed + 1 shared)
- Three sub-model pipeline: text decoder (INT4 QMoE), embedding (FP32), vision (FP32)
- Custom ONNX-export-friendly MoE model class (codes/modeling_qwen3_5_moe.py)
- Inference script with text, image, interactive, and benchmark modes
- Requires ORT GenAI built with qwen3_5_moe support (see DEBUG_STATUS.md)

…ript.py
- optimize.py: Read token IDs, vision params, and preprocessor settings from HuggingFace model/generation configs instead of hardcoding them. Model name is read from the Olive text.json config. Added --context-length CLI arg.
- user_script.py: Use safetensors safe_open with prefix filtering to load only vision/embedding weights (~4 GB) instead of the full 35B checkpoint (~67 GB).

Co-authored-by: Cursor <cursoragent@cursor.com>
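
The prefix-filtered loading described in this commit message can be sketched as follows. `safe_open`, `keys()`, and `get_tensor()` are part of the `safetensors` library; the helper names and the two example prefixes are hypothetical, since the checkpoint's real key names are not shown in this PR conversation.

```python
def select_keys(keys, prefixes):
    """Keep only tensor names that start with one of the given prefixes."""
    return [k for k in keys if k.startswith(tuple(prefixes))]

def load_filtered(shard_path, prefixes):
    """Load only the matching tensors from one .safetensors shard."""
    # Imported lazily so the key-filtering logic can be used without safetensors.
    from safetensors import safe_open

    tensors = {}
    with safe_open(shard_path, framework="pt") as f:
        # Only the selected tensors are read; the rest of the shard is skipped.
        for key in select_keys(f.keys(), prefixes):
            tensors[key] = f.get_tensor(key)
    return tensors

# Toy usage on made-up key names (hypothetical prefixes).
all_keys = [
    "model.visual.blocks.0.attn.qkv.weight",
    "model.language_model.embed_tokens.weight",
    "model.language_model.layers.0.mlp.experts.0.up_proj.weight",
]
wanted = select_keys(all_keys, ["model.visual.", "model.language_model.embed_tokens."])
```

Because tensors are fetched by name, the expert weights that make up most of the checkpoint never need to be materialised for the vision and embedding exports.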

Co-authored-by: Cursor <cursoragent@cursor.com>

tanzeel-amd · 2026-05-26T18:22:35Z

@devang-ml Added README.md and info.yml. Please review.

Copilot AI review requested due to automatic review settings May 8, 2026 10:34

Copilot started reviewing on behalf of tanzeel-amd May 8, 2026 10:35

Copilot AI reviewed May 8, 2026

Comment thread Qwen-Qwen3.5-35B-A3B/builtin/user_script.py Outdated

Comment thread Qwen-Qwen3.5-35B-A3B/builtin/optimize.py

Comment thread Qwen-Qwen3.5-35B-A3B/builtin/codes/modeling_qwen3_5_moe.py Outdated

VishalX mentioned this pull request May 8, 2026

Add Qwen3.5-MoE (35B-A3B) model support microsoft/onnxruntime-genai#2146

Merged

tanzeel-amd force-pushed the turrahma/qwen3.5-moe-35B-A3B branch 3 times, most recently from 883738b to 274a85d Compare May 22, 2026 11:18

xieofxie previously approved these changes May 25, 2026

Ur Rahman and others added 10 commits May 26, 2026 11:16

Add LICENSE

4e48126

Update Licence

4cccc6d

Update Licence

ce32fce

Update Licence

a5960f9

Update Licence

e5b8c3c

Add MIT license for new files

e80041d

Fix SPDX license identifier to Apache-2.0 to match upstream header

5c48ab4

Co-authored-by: Cursor <cursoragent@cursor.com>

Add README.md and info.yml for Qwen3.5-35B-A3B recipe

03b5b4b

Co-authored-by: Cursor <cursoragent@cursor.com>

tanzeel-amd dismissed xieofxie’s stale review via 03b5b4b May 26, 2026 18:16

tanzeel-amd force-pushed the turrahma/qwen3.5-moe-35B-A3B branch from 274a85d to 03b5b4b Compare May 26, 2026 18:16

tanzeel-amd requested a review from xieofxie May 26, 2026 18:22

tanzeel-amd wants to merge 10 commits into microsoft:main from tanzeel-amd:turrahma/qwen3.5-moe-35B-A3B

5 participants
