fix(vllm): prevent Ultra DS-tail copy offset overflow #10552

Open
sungsooha wants to merge 1 commit into ai-dynamo:release/1.3.0-nemotron-ultra-dev.1 from sungsooha:sungsooha/nemotron-ultra-ds-tail-i64-fix-release

@sungsooha sungsooha commented Jun 10, 2026

Contributor

Overview:

Fix a Nemotron Ultra DS-layout Mamba conv-state tail-copy overflow that can surface as a CUDA illegal memory access under GB200 long-context MTP traffic.

The failing path is specific to the combination of prefix cache ON, VLLM_SSM_CONV_STATE_LAYOUT=DS, --mamba-cache-mode align, and MTP. The DS-tail Triton copy kernel can compute source/destination element offsets beyond the signed 32-bit range (maximum 2^31 - 1); this PR keeps that pointer-offset arithmetic in tl.int64.
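To illustrate the failure mode, here is a minimal pure-Python sketch of how a block-based element offset behaves when every intermediate is truncated to signed 32 bits versus kept wide. The function names, the offset formula (`block_id * block_stride + row * row_stride + col`), and the operand values are assumptions chosen for illustration; they are not taken from `ds_conv_tail_copy_kernel`.

```python
INT32_MAX = 2**31 - 1


def wrap_i32(value: int) -> int:
    """Reinterpret an integer as a two's-complement signed 32-bit value."""
    return (value + 2**31) % 2**32 - 2**31


def tail_offset_i32(block_id, block_stride, row, row_stride, col):
    """Hypothetical offset math with every intermediate truncated to int32."""
    base = wrap_i32(block_id * block_stride)
    row_part = wrap_i32(row * row_stride)
    return wrap_i32(base + row_part + col)


def tail_offset_i64(block_id, block_stride, row, row_stride, col):
    """Same math kept wide; Python ints stand in for 64-bit arithmetic here."""
    return block_id * block_stride + row * row_stride + col


# Illustrative operands only: a large block id times a large per-block stride.
args = dict(block_id=5000, block_stride=524288, row=3, row_stride=128, col=7)

wide = tail_offset_i64(**args)
narrow = tail_offset_i32(**args)

print("wide offset   :", wide, "(exceeds int32 max)" if wide > INT32_MAX else "")
print("narrow offset :", narrow, "(wrapped negative)" if narrow < 0 else "")
```

With operands like these, the wide result is a valid element offset, while the truncated result is negative and would point outside the buffer, which is consistent with an illegal memory access appearing only when block ids and strides are large.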

Details:

  • Adds 0005-ultra-ds-tail-copy-i64-offsets.patch.
  • Updates ds_conv_tail_copy_kernel so rows/cols and block/stride/offset operands are converted to tl.int64 before pointer-offset multiplication/addition.
  • Preserves existing copy semantics, masks, source/destination block ids, and load/store shape.
  • Adds runtime validation marker NEMOTRON_ULTRA_DS_TAIL_I64_OFFSET_FIX under the existing mtp_ds_copy marker group.

Validation evidence:

  • Original prod DS-tail kernel reproduced the hard stop with valid DS-tail copies and destination offsets above 2^31:
    • dst_over2G_count=37
    • max_dst_elem_offset=2679660544
    • direct hard stop in ds_conv_tail_copy_kernel
  • i64 candidate completed the same risky GB200 c16 sticky long-context condition cleanly:
    • ds_tail_count=128
    • dst_over2G_count=48
    • max_dst_elem_offset=2840651774
    • hard stops: 0
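As a quick sanity check on the figures above, the short sketch below compares the two reported `max_dst_elem_offset` values against the signed 32-bit limit and shows what each would become if reinterpreted as int32. The helper name `as_i32` is hypothetical; only the two offsets come from the validation runs.

```python
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1

# max_dst_elem_offset values reported in the validation evidence.
reported = {
    "original kernel": 2679660544,
    "i64 candidate": 2840651774,
}


def as_i32(value: int) -> int:
    """Reinterpret an integer as a two's-complement signed 32-bit value."""
    return (value + 2**31) % 2**32 - 2**31


for label, offset in reported.items():
    print(
        f"{label}: offset={offset} "
        f"over_int32={offset > INT32_MAX} "
        f"as_int32={as_i32(offset)} "
        f"fits_int64={offset <= INT64_MAX}"
    )
```

Both offsets exceed the int32 maximum and would wrap to negative values under 32-bit arithmetic, while remaining far below the int64 limit.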

Where should the reviewer start?

  • container/deps/vllm/patches/v0.22.0/ultra/0005-ultra-ds-tail-copy-i64-offsets.patch
  • container/deps/vllm/validate_nemotron_ultra_runtime.py

Related Issues

⚠️ This section is required. Choose one path below and delete the other.

🔗 This PR is linked to an issue:

  • Closes #XXXX

🚫 This PR is NOT linked to an issue:

  • Confirmed — no related issue

@sungsooha sungsooha requested review from a team as code owners June 10, 2026 19:52
copy-pr-bot Bot commented Jun 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

Contributor

👋 Hi sungsooha! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test GitHub Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions Bot added the external-contribution (Pull request is from an external contributor), backend::vllm (Relates to the vllm backend), and container labels Jun 10, 2026

@devin-ai-integration devin-ai-integration Bot left a comment

Contributor

Devin Review found 2 potential issues.

View 2 additional findings in Devin Review.

@sungsooha sungsooha force-pushed the sungsooha/nemotron-ultra-ds-tail-i64-fix-release branch from 443c1e6 to 6a82e45 Compare June 10, 2026 20:02
Signed-off-by: Sungsoo Ha <sungsooh@nvidia.com>
(cherry picked from commit ae942ac)
@sungsooha sungsooha force-pushed the sungsooha/nemotron-ultra-ds-tail-i64-fix-release branch from 6a82e45 to 8c5667d Compare June 10, 2026 20:12
Labels

backend::vllm (Relates to the vllm backend), container, external-contribution (Pull request is from an external contributor), fix, size/M
