Qwen3 5 mtp final 2 #326

Open
wanfengcxz wants to merge 11 commits into DeepLink-org:main from wanfengcxz:qwen3_5_mtp_final_2
@wanfengcxz
Collaborator

No description provided.

tangzhiyi11 and others added 3 commits April 18, 2026 11:36
- ascend_cudagraph.py: multi-token decode graph mode support
  (4-tuple graph key with query_len, actual_seq_lengths_q buffers)
- device/__init__.py: add patch_attention_is_tp (draft model TP),
  patch_ray_init (NPU Ray resource), MTP multi-token paths in
  GatedDelta conv1d and sigmoid_gating update kernels

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
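
The 4-tuple graph key mentioned in the commit above can be pictured as a cache lookup in which `query_len` is part of the key, so a single-token decode graph and a multi-token (MTP) decode graph are captured and replayed separately. The sketch below is only an illustration: apart from `query_len`, the field names, the `GraphCache` class, and `fake_capture` are hypothetical and are not taken from `ascend_cudagraph.py`.

```python
from typing import Callable, Dict, NamedTuple


class GraphKey(NamedTuple):
    # Hypothetical layout: the commit only states that the key is a 4-tuple
    # that includes query_len. The other three fields are assumptions.
    batch_size: int
    is_decoding: bool
    enable_microbatch: bool
    query_len: int  # tokens fed per sequence in one decode step


class GraphCache:
    """Keeps one captured graph per distinct key (illustrative only)."""

    def __init__(self, capture: Callable[[GraphKey], object]) -> None:
        self._capture = capture
        self._graphs: Dict[GraphKey, object] = {}

    def get(self, key: GraphKey) -> object:
        # Capture lazily the first time a key is seen, then reuse the graph.
        if key not in self._graphs:
            self._graphs[key] = self._capture(key)
        return self._graphs[key]


captured = []


def fake_capture(key: GraphKey) -> str:
    # Stand-in for a real graph capture; records which keys were captured.
    captured.append(key)
    return f"graph-{len(captured)}"


cache = GraphCache(fake_capture)
single = cache.get(GraphKey(8, True, False, 1))  # ordinary one-token decode
multi = cache.get(GraphKey(8, True, False, 4))   # multi-token decode step
again = cache.get(GraphKey(8, True, False, 4))   # same key, reused graph
```

With a key that omitted `query_len`, the second and third lookups would collide with the first, and a graph captured for one token per sequence would be replayed on multi-token input shapes.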
Move the Ascend-specific graph alignment, state replay, and sampling fallback into dlinfer so multi-token speculative decode stays stable without expanding lmdeploy core runtime changes.

Made-with: Cursor
Snapshot only the active state-cache rows during speculative replay so Ascend no longer clones the full state pool for rejection recovery.

Made-with: Cursor
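
The row-level snapshot described in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the state pool is modeled as a plain list of rows instead of device tensors, and `snapshot_rows` and `restore_rows` are hypothetical helper names.

```python
from typing import Dict, List, Sequence

StatePool = List[List[float]]


def snapshot_rows(pool: StatePool, active_rows: Sequence[int]) -> Dict[int, List[float]]:
    # Copy only the rows touched by the current speculative step,
    # not the whole pool.
    return {row: list(pool[row]) for row in active_rows}


def restore_rows(pool: StatePool, snapshot: Dict[int, List[float]]) -> None:
    # Write the saved rows back in place after draft tokens are rejected.
    for row, saved in snapshot.items():
        pool[row][:] = saved


# A pool of 6 state rows, of which only rows 1 and 4 are active.
state_pool: StatePool = [[0.0, 0.0] for _ in range(6)]
active = [1, 4]

saved = snapshot_rows(state_pool, active)

# The speculative step mutates the active rows.
for row in active:
    state_pool[row][0] += 1.0
    state_pool[row][1] -= 1.0

# The draft is rejected, so the active rows are rolled back.
restore_rows(state_pool, saved)
```

The snapshot cost scales with the number of active rows, not with the size of the pool, which is the saving the commit message describes.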
@wanfengcxz wanfengcxz requested a review from jinminxi104 as a code owner April 21, 2026 07:51
@CLAassistant
CLAassistant commented May 11, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ wanfengcxz
✅ tangzhiyi11
❌ Super User


Super User does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

3 participants