Add Qwen3 AutoParallel model and examples by AlbedoWang · Pull Request #482 · meta-pytorch/autoparallel

AlbedoWang · 2026-06-07T17:47:45Z

Summary

Adds a Qwen3 reference model (dense + MoE) under autoparallel/_testing/models/, runnable examples, and unit tests. The MoE variant wraps expert dispatch in local_map so the solver treats the expert computation as a single sharded node.

What's added

autoparallel/_testing/models/qwen3.py — Qwen3 Transformer (QK-norm attention, RoPE via a precomputed cos/sin buffer, optional weight tying, MoE block with local_map-wrapped experts), Qwen3ModelArgs, debug configs, and qwen3_args_from_torchtitan_config for parity with torchtitan.
examples/example_qwen3.py, example_sanity_check_qwen3.py, example_sanity_check_qwen3_moe.py, example_torchtitan_qwen3_dense.py — end-to-end AutoParallel runs (trace → optimize → apply → forward/backward).
tests/test_qwen3.py, tests/test_dsv3_torchtitan_config.py.
dsv3.py: one-line guard (getattr(..., "use_grouped_mm", True)) so the shared MoE config plumbing tolerates configs without that field.
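
The `getattr` guard in the last item can be illustrated with a minimal sketch. The helper name `resolve_use_grouped_mm` and the `SimpleNamespace` configs are hypothetical stand-ins, not the actual code in `dsv3.py`; only the `getattr(..., "use_grouped_mm", True)` pattern comes from the PR description.

```python
from types import SimpleNamespace


def resolve_use_grouped_mm(experts_cfg, default=True):
    """Return experts_cfg.use_grouped_mm, or `default` when the field is absent."""
    return getattr(experts_cfg, "use_grouped_mm", default)


# Hypothetical configs: one defines the field, the other omits it.
with_field = SimpleNamespace(num_experts=8, use_grouped_mm=False)
without_field = SimpleNamespace(num_experts=8)

print(resolve_use_grouped_mm(with_field))     # False
print(resolve_use_grouped_mm(without_field))  # True
```

An explicit value in the config still wins; the default only applies when the attribute does not exist at all.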

How to Test

Unit tests: python -m pytest tests/ (uses fake PG, no GPU needed)
Example scripts: python examples/example_qwen3.py (uses fake PG, single GPU for meta-device ops)
MAST validation: launch a MAST job with the Qwen3 config to verify real distributed training works

Success Criteria

example_qwen3.py runs successfully end-to-end (tracing, optimization, apply sharding, forward + backward)
The MoE variant wraps expert dispatch in local_map correctly, and the solver handles the resulting local_map node (covered by test_qwen3_moe_auto_parallel_smoke)
All unit tests pass (python -m pytest tests/) — qwen3/dsv3 suites: 7 passed, 4 skipped (the skips depend on a torchtitan sibling checkout)
MAST job launches and runs successfully with the Qwen3 model
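
The "skipped without a sibling checkout" behaviour mentioned above could look roughly like the sketch below. It uses `unittest.skipUnless` from the standard library so that it runs without extra dependencies; the PR's tests run under pytest, where `pytest.mark.skipif` plays the same role. The class and test names are hypothetical.

```python
import importlib.util
import io
import unittest

# True only when a `torchtitan` package is importable from sys.path.
HAS_TORCHTITAN = importlib.util.find_spec("torchtitan") is not None


class ParityTests(unittest.TestCase):
    @unittest.skipUnless(HAS_TORCHTITAN, "requires a torchtitan checkout on sys.path")
    def test_parity_with_torchtitan(self):
        # A real parity test would import torchtitan here and compare configs.
        self.assertTrue(HAS_TORCHTITAN)

    def test_without_torchtitan(self):
        # Tests with no torchtitan dependency always run.
        self.assertEqual(1 + 1, 2)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParityTests)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print("skipped:", len(result.skipped), "ok:", result.wasSuccessful())
```

A skipped test does not count as a failure, so the run still reports success when the checkout is absent.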

Test coverage (`test_qwen3.py`)

Forward shape, QK-norm effect, weight-tying survival through init_weights, dense/MoE shape parity with torchtitan debug args, torchtitan config parsing (skips without sibling checkout), RoPE cos/sin parity, dense AutoParallel pipeline smoke, and MoE auto_parallel smoke (param count/shapes + forward/backward).
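
To make the "QK-norm effect" concrete, here is a dependency-free sketch of the general idea: RMS-normalizing the query and key head vectors before the scaled dot product bounds the attention logit. This is an assumption about what the test checks, not a copy of it; the real model operates on tensors and, like most QK-norm implementations, likely has a learnable norm weight, which is fixed to ones here.

```python
import math


def rms_norm(vec, eps=1e-6):
    """RMS-normalize a vector: x / sqrt(mean(x^2) + eps), with unit weight."""
    mean_sq = sum(v * v for v in vec) / len(vec)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms for v in vec]


def attention_logit(q, k, qk_norm):
    """Scaled dot product between one query and one key head vector."""
    if qk_norm:
        q, k = rms_norm(q), rms_norm(k)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


head_dim = 8
q = [100.0 * (i + 1) for i in range(head_dim)]  # deliberately large activations
k = [50.0 * (i + 1) for i in range(head_dim)]

raw = attention_logit(q, k, qk_norm=False)
normed = attention_logit(q, k, qk_norm=True)
print(raw, normed)
```

With unit weights each normalized vector has length at most `sqrt(head_dim)`, so the normalized logit cannot exceed `sqrt(head_dim)` in magnitude regardless of the input scale.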

Copilot

Pull request overview

Adds a Qwen3 reference implementation (dense + MoE) into the repo’s _testing/models suite to exercise AutoParallel’s tracing/solver on modern transformer + MoE patterns (including a local_map-wrapped expert region), along with runnable examples and unit tests. Also includes a small compatibility tweak in the existing DeepSeek-V3 model config plumbing for TorchTitan configs that omit use_grouped_mm.

Changes:

Add autoparallel/_testing/models/qwen3.py: Qwen3 Transformer with QK-norm attention, RoPE via precomputed cos/sin cache, optional weight tying, and MoE expert dispatch wrapped in local_map.
Add end-to-end example scripts for fake-PG smoke runs and real distributed sanity checks (dense + MoE), plus a TorchTitan dense Qwen3 example.
Add unit tests covering Qwen3 basics, TorchTitan parity checks (skipped when sibling checkout is absent), and AutoParallel smoke pipelines; add a DSv3 TorchTitan config compatibility test; add a DSv3 config guard for missing use_grouped_mm.
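
For readers unfamiliar with the "precomputed cos/sin cache" approach to RoPE, the following is a minimal pure-Python sketch. The base `theta=10000.0` and the rotate-half pairing are assumptions made for illustration; the layout and constants actually used in `qwen3.py` are not shown on this page.

```python
import math


def precompute_rope_cache(seq_len, head_dim, theta=10000.0):
    """Return (cos, sin) tables, each of shape [seq_len][head_dim // 2]."""
    half = head_dim // 2
    inv_freq = [theta ** (-2.0 * i / head_dim) for i in range(half)]
    cos = [[math.cos(pos * f) for f in inv_freq] for pos in range(seq_len)]
    sin = [[math.sin(pos * f) for f in inv_freq] for pos in range(seq_len)]
    return cos, sin


def apply_rope(vec, cos_row, sin_row):
    """Rotate each pair (vec[i], vec[i + half]) by the cached angle."""
    half = len(vec) // 2
    x1, x2 = vec[:half], vec[half:]
    out1 = [a * c - b * s for a, b, c, s in zip(x1, x2, cos_row, sin_row)]
    out2 = [b * c + a * s for a, b, c, s in zip(x1, x2, cos_row, sin_row)]
    return out1 + out2


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


cos, sin = precompute_rope_cache(seq_len=8, head_dim=4)
q = [0.3, -1.2, 0.7, 2.0]
k = [1.1, 0.4, -0.6, 0.9]

# Scores depend only on the relative offset between positions (here -2).
score_a = dot(apply_rope(q, cos[3], sin[3]), apply_rope(k, cos[1], sin[1]))
score_b = dot(apply_rope(q, cos[5], sin[5]), apply_rope(k, cos[3], sin[3]))
print(score_a, score_b)
```

Because the tables depend only on position and head dimension, they can be built once and stored as a buffer, which is what makes a cos/sin parity check against another implementation straightforward.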

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `autoparallel/_testing/models/qwen3.py` | New Qwen3 reference model (dense + MoE) designed to be traceable/optimizable by AutoParallel, including `local_map` expert region. |
| `autoparallel/_testing/models/dsv3.py` | Makes MoE config parsing tolerant of TorchTitan configs that don’t define `experts.use_grouped_mm`. |
| `examples/example_qwen3.py` | Fake-PG example that traces/optimizes/applies Qwen3 (dense or MoE) and optionally runs fwd/bwd. |
| `examples/example_sanity_check_qwen3.py` | Real-GPU distributed sanity training loop for Qwen3 8B under AutoParallel. |
| `examples/example_sanity_check_qwen3_moe.py` | Real-GPU distributed sanity training loop for Qwen3 MoE under AutoParallel (EP mesh) with chunked vocab-parallel loss. |
| `examples/example_torchtitan_qwen3_dense.py` | Runs TorchTitan’s dense Qwen3 through AutoParallel placement on real GPUs. |
| `tests/test_qwen3.py` | Unit tests for Qwen3 forward shape, RoPE parity, debug args parity, and AutoParallel smoke tests (dense + MoE). |
| `tests/test_dsv3_torchtitan_config.py` | Verifies DSv3 accepts a TorchTitan grouped-experts config (skips without sibling TorchTitan checkout). |
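
The "vocab-parallel loss" mentioned for the MoE sanity check can be sketched without any distributed runtime. The example below is an assumption about the general technique, not the code in `example_sanity_check_qwen3_moe.py`: list slices stand in for the vocabulary shard each rank would hold, and plain `max`/`sum` stand in for all-reduce collectives. Chunking over tokens is omitted.

```python
import math


def cross_entropy_full(logits, target):
    """Reference loss: log-sum-exp(logits) - logits[target]."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits)) - logits[target]


def cross_entropy_vocab_sharded(shards, target):
    """Same loss, computed from per-shard statistics only."""
    # Stands in for all-reduce(MAX) over per-rank maxima.
    global_max = max(max(shard) for shard in shards)
    # Stands in for all-reduce(SUM) over per-rank partial sums.
    sum_exp = sum(math.exp(x - global_max) for shard in shards for x in shard)
    # Only the shard that owns the target index contributes its logit.
    target_logit, offset = 0.0, 0
    for shard in shards:
        if offset <= target < offset + len(shard):
            target_logit = shard[target - offset]
        offset += len(shard)
    return global_max + math.log(sum_exp) - target_logit


logits = [2.0, -1.0, 0.5, 3.0, 1.5, -0.5, 0.0, 4.0]
shards = [logits[0:4], logits[4:8]]  # two hypothetical vocab shards
target = 5

full = cross_entropy_full(logits, target)
sharded = cross_entropy_vocab_sharded(shards, target)
print(full, sharded)
```

The point of the sharded form is that no rank ever needs the full logits row: three scalar reductions per token are enough to recover the exact loss.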

+        # Annotate as plain tensors: parameters() yields Parameter, but
+        # _to_activation_device returns Tensor, and we reassign in place.
+        experts_w1: torch.Tensor
+        experts_w2: torch.Tensor
+        experts_w3: torch.Tensor
+        experts_w1, experts_w2, experts_w3 = self.experts.parameters()
+        experts_w1 = _to_activation_device(experts_w1, x)
+        experts_w2 = _to_activation_device(experts_w2, x)
+        experts_w3 = _to_activation_device(experts_w3, x)


+_add_sibling_torchtitan_to_path()
+
+from torchtitan.models.qwen3 import Qwen3Model, qwen3_configs  # noqa: E402
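
The body of `_add_sibling_torchtitan_to_path()` is not shown in this excerpt. As a rough illustration of what such a helper might do, the hypothetical `add_sibling_checkout_to_path` below prepends a sibling directory to `sys.path` when it exists; the function name, its signature, and the directory layout are assumptions.

```python
import sys
import tempfile
from pathlib import Path


def add_sibling_checkout_to_path(repo_root, name="torchtitan"):
    """Prepend <repo_root>/../<name> to sys.path if that directory exists.

    Returns True when the directory was found, False otherwise.
    """
    candidate = Path(repo_root).resolve().parent / name
    if not candidate.is_dir():
        return False
    path_str = str(candidate)
    if path_str not in sys.path:
        sys.path.insert(0, path_str)
    return True


# Demo on a throwaway directory tree: <tmp>/autoparallel next to <tmp>/torchtitan.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "autoparallel"
    repo.mkdir()
    (Path(tmp) / "torchtitan").mkdir()
    found = add_sibling_checkout_to_path(repo)
    missing = add_sibling_checkout_to_path(repo, name="not_checked_out")
    # Clean up the demo entry so the stale temp path does not linger.
    sys.path.remove(str(Path(tmp).resolve() / "torchtitan"))

print(found, missing)
```

Because the import that follows depends on this side effect, it has to come after the call, which is why the original line carries `# noqa: E402`.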


+    torch.manual_seed(args.seed)
+    model_args = make_model_args(args.flavor, args.seq_len)
+    if args.seq_len is None:
+        args.seq_len = model_args.max_seq_len
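
The fallback in this hunk (use the model's `max_seq_len` when `--seq-len` is not given) can be exercised in isolation. In the sketch below, `ModelArgs`, the per-flavor defaults, and this `make_model_args` are hypothetical stand-ins for the real ones in the example script; only the three-line resolution pattern mirrors the diff.

```python
import argparse
from dataclasses import dataclass


@dataclass
class ModelArgs:
    max_seq_len: int


def make_model_args(flavor, seq_len=None):
    """Hypothetical stand-in: per-flavor default, overridden by an explicit seq_len."""
    defaults = {"debug": 256, "8B": 4096}  # illustrative values only
    return ModelArgs(max_seq_len=defaults[flavor] if seq_len is None else seq_len)


def resolve(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--flavor", default="debug")
    parser.add_argument("--seq-len", type=int, default=None)
    args = parser.parse_args(argv)
    model_args = make_model_args(args.flavor, args.seq_len)
    if args.seq_len is None:
        # No CLI override: adopt whatever the model config settled on.
        args.seq_len = model_args.max_seq_len
    return args, model_args


default_args, _ = resolve([])
override_args, override_model = resolve(["--seq-len", "128"])
print(default_args.seq_len, override_args.seq_len)
```

After resolution, `args.seq_len` and `model_args.max_seq_len` agree in both cases, so later code can read either one.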


meta-cla Bot added the CLA Signed label Jun 7, 2026

Add Qwen3 AutoParallel model and examples

15da60f

AlbedoWang force-pushed the kaijian/add_qwen3_pr branch from 6d0e1e5 to 15da60f Compare June 7, 2026 18:36

AlbedoWang requested review from Copilot and sanketpurandare June 8, 2026 21:10

Copilot started reviewing on behalf of AlbedoWang June 8, 2026 21:10

Copilot AI reviewed Jun 8, 2026

AlbedoWang wants to merge 1 commit into main from kaijian/add_qwen3_pr
