fix(sm100): make arch check robust against CUTLASS DSL Arch enum changes #2575

Closed
lingolin128 wants to merge 1 commit into Dao-AILab:main from lingolin128:fix_sm103a_bug

Conversation

@lingolin128

Summary

The current SM100 forward kernel uses a range-based comparison for the architecture
check:

assert self.arch >= Arch.sm_100 and self.arch <= Arch.sm_110f, \
    "Only SM 10.x and 11.x are supported"

This relies on the ordering of Arch enum values, which is fragile. Specifically, in
nvidia-cutlass-dsl 4.5.0, certain Arch enum entries have unexpected .value tuples,
which can cause this assertion to misbehave on valid SM 10.x architectures (e.g.
SM 10.3a / B300).
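
To illustrate why an ordering-based check is fragile, here is a minimal sketch using a stand-in enum. `ToyArch`, its tuples, and `in_range` are hypothetical and do not reproduce the actual `Arch` values or comparison logic in nvidia-cutlass-dsl; the sketch only assumes that the comparison follows the `.value` tuples, and shows that one irregular tuple on a boundary entry is enough to exclude a valid architecture.

```python
from enum import Enum


class ToyArch(Enum):
    # Hypothetical (major, minor, suffix) tuples, for illustration only.
    sm_100 = (10, 0, "")
    sm_103a = (10, 3, "a")
    # Assumed irregular entry: the upper bound carries an unexpected tuple.
    sm_110f = (10, 0, "f")


def in_range(arch: ToyArch) -> bool:
    """Range check whose result depends on how the tuples happen to sort."""
    return ToyArch.sm_100.value <= arch.value <= ToyArch.sm_110f.value


# (10, 3, "a") sorts after (10, 0, "f"), so the SM 10.3a entry is rejected.
print(in_range(ToyArch.sm_103a))  # False
```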

Fix

fix: #25564
Replace the range comparison with a major-version check, which is independent of how
the suffix variants (a, f, etc.) are ordered in the enum:

  arch_major = self.arch.value[0]
  assert arch_major in [10, 11], "Only SM 10.x and 11.x are supported"

This continues to gate the kernel to SM 10.x and 11.x as intended, but no longer
depends on the relative ordering of Arch.sm_10* / Arch.sm_11* variants in the
upstream CUTLASS DSL package.
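
As a rough sketch of the same idea in isolation, the check can be written as a small helper. `ToyArch`, `SUPPORTED_MAJORS`, and `check_arch` are hypothetical names introduced here, and the tuples are made up; the only assumption carried over from the fix is that the first element of `.value` is the major version.

```python
from enum import Enum

SUPPORTED_MAJORS = (10, 11)


class ToyArch(Enum):
    # Hypothetical (major, minor, suffix) tuples, for illustration only.
    sm_90a = (9, 0, "a")
    sm_100 = (10, 0, "")
    sm_103a = (10, 3, "a")
    sm_110f = (11, 0, "f")
    sm_120 = (12, 0, "")


def check_arch(arch: ToyArch) -> int:
    """Gate on the major version only; minor and suffix are ignored."""
    arch_major = arch.value[0]
    assert arch_major in SUPPORTED_MAJORS, "Only SM 10.x and 11.x are supported"
    return arch_major


for arch in ToyArch:
    try:
        check_arch(arch)
        print(arch.name, "accepted")
    except AssertionError as err:
        print(arch.name, "rejected:", err)
```

With these made-up entries, `sm_100`, `sm_103a`, and `sm_110f` are accepted and `sm_90a` and `sm_120` are rejected, regardless of how the suffix variants sort relative to each other.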

Test

Verified on B300 (SM 10.3a) with nvidia-cutlass-dsl==4.5.0:

  import torch
  from flash_attn.cute import flash_attn_func

  # Small bf16 inputs with shape (batch, seqlen, nheads, headdim)
  q = torch.randn(1, 128, 4, 128, dtype=torch.bfloat16, device='cuda')
  k = torch.randn(1, 128, 4, 128, dtype=torch.bfloat16, device='cuda')
  v = torch.randn(1, 128, 4, 128, dtype=torch.bfloat16, device='cuda')
  out, lse = flash_attn_func(q, k, v, causal=False)
  print(out.shape)  # torch.Size([1, 128, 4, 128])

@lingolin128
Author

@Johnsonms @jayhshah Hi there, please review this PR, really appreciate it!

@janbernloehr

Seems to be a duplicate of #2572

@lingolin128
Author

> Seems to be a duplicate of #2572

@janbernloehr You're right, this duplicates PR #2572. I ran into this issue and verified the fix in my own use last week, but I did not clean it up and submit the PR sooner. I still hope this change can be merged, but it is completely fine if it is not. Thanks a lot for your review!

@lingolin128
Author

In any case, my pull request was submitted later. Junrong Lin is an exceptional engineer whom I look up to. Please merge his PR #2572. I will keep contributing to open source and strive for continuous improvement. ^^

@Johnsonms
Collaborator

Thanks @lingolin128! You’re very welcome to continue contributing to the community.

@Johnsonms closed this May 24, 2026