Add Qwen3.5 MoE hybrid decoder model (back-port from upstream PR #2545) by akeshet · Pull Request #1 · akeshet/torchtitan

akeshet · 2026-04-01T21:53:37Z

Back-ports the Qwen3.5 MoE model (GatedDeltaNet + full attention + MoE) from pytorch#2545 to v0.2.0 APIs. Core model math preserved; registration/parallelization adapted to TrainSpec/BaseModelArgs/ModelProtocol.

Includes: model definition, parallelization (TP/EP/FSDP with DTensor-safe wrappers for GatedDeltaNet), HF state dict adapter, 5 model flavors (debugmodel through 397B), and debug training config.
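As a rough illustration of what "hybrid decoder" means here, the sketch below lays out per-layer mixer types so that every Nth layer uses full attention and the rest use a linear-attention mixer such as GatedDeltaNet. The description above does not state the interleaving ratio, so `full_attention_interval` and the `layer_types` helper are hypothetical names for illustration, not code from this PR.

```python
def layer_types(n_layers: int, full_attention_interval: int) -> list[str]:
    """Return a per-layer mixer label for a hybrid decoder stack.

    Assumption (not from the PR): every `full_attention_interval`-th layer
    is full attention; all other layers use the linear-attention mixer.
    """
    if full_attention_interval < 1:
        raise ValueError("full_attention_interval must be >= 1")
    return [
        "full_attention" if (i + 1) % full_attention_interval == 0
        else "linear_attention"
        for i in range(n_layers)
    ]


if __name__ == "__main__":
    # Print the layout for a small, made-up 8-layer stack.
    for idx, kind in enumerate(layer_types(8, 4)):
        print(idx, kind)
```

Keeping the layout in one small function like this makes it easy to check which layers need the DTensor-safe wrappers and which follow the regular attention path.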

…rch#2545)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

v0.2.0 exports the class as `Compile`, not `CompileConfig`. Remove the direct import: we only pass `job_config.compile` through to llama4's `apply_compile`, which handles the alias internally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
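The reasoning in this commit can be sketched with stand-in classes: if the caller only forwards the config object, it never needs to import the config type by name, so a rename between versions does not break it. Everything below (`Compile`, `JobConfig`, `apply_compile`, `parallelize`) is a hypothetical simplification for illustration, not the actual torchtitan definitions or signatures.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Compile:
    """Hypothetical stand-in for the v0.2.0 compile config class."""
    enable: bool = False
    backend: str = "inductor"


@dataclass
class JobConfig:
    """Hypothetical stand-in for the job config that owns the compile section."""
    compile: Compile


def apply_compile(model: Any, compile_config: Any) -> str:
    """Hypothetical helper: reads attributes only, never checks the class name."""
    if not compile_config.enable:
        return "eager"
    return f"compiled:{compile_config.backend}"


def parallelize(model: Any, job_config: JobConfig) -> str:
    # No import of Compile or CompileConfig is needed here:
    # the config object is passed through untouched.
    return apply_compile(model, job_config.compile)
```

Because `parallelize` depends only on the attributes that `apply_compile` reads, the class could be exported under either name without touching this call site.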

akeshet and others added 2 commits April 1, 2026 14:49

akeshet wants to merge 2 commits into v0.2.0-with-positions from add-qwen3.5-moe
