fix(sm100): make arch check robust against CUTLASS DSL Arch enum changes #2575
lingolin128 wants to merge 1 commit into
Conversation
@Johnsonms @jayhshah Hi there, please review this PR. I'd really appreciate it!
Seems to be a duplicate of #2572
@janbernloehr You're right, I realize this duplicates PR #2572. I actually encountered and verified this issue during my practical usage last week, but I didn’t sort it out and submit the PR earlier. I still hope this change can be merged, but it’s totally fine if it cannot be merged eventually. Thanks a lot for your review!
Regardless, my pull request was submitted afterward. Junrong Lin is an exceptional engineer whom I look up to. Kindly merge his PR #2572. I will keep contributing to open source and strive for continuous improvement. ^^
Thanks @lingolin128! You’re very welcome to continue contributing to the community.
Summary
The current SM100 forward kernel uses a range-based comparison for the architecture
check:
This relies on the ordering of Arch enum values, which is fragile. Specifically, in
nvidia-cutlass-dsl 4.5.0, certain Arch enum entries have unexpected .value tuples,
which can cause this assertion to misbehave on valid SM 10.x architectures (e.g.
SM 10.3a / B300).
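To illustrate the failure mode described above, here is a minimal sketch that uses a hypothetical stand-in enum. It is not the real CUTLASS DSL `Arch` class: the member names, the `(major, minor, suffix)` value layout, and the irregular `sm_103a` value are all assumptions invented for this example. It only shows how a range comparison over `.value` tuples can reject a valid SM 10.x target once a single entry is encoded differently.

```python
from enum import Enum


class Arch(Enum):
    # Hypothetical stand-in for illustration only; NOT the real CUTLASS DSL enum.
    # Values are (major, minor, suffix) tuples invented for this sketch.
    sm_100 = (10, 0, "")
    sm_100a = (10, 0, "a")
    # Assumed irregularity: this entry encodes its version differently.
    sm_103a = (103, 0, "a")
    sm_110 = (11, 0, "")
    sm_110f = (11, 0, "f")


def in_range(arch: Arch) -> bool:
    # Range-based gate: correct only if every .value sorts as expected.
    return Arch.sm_100.value <= arch.value <= Arch.sm_110f.value


for arch in Arch:
    print(f"{arch.name}: {in_range(arch)}")
```

In this sketch every member passes the gate except `sm_103a`, whose irregular tuple sorts above the upper bound even though it names an SM 10.x architecture.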
Fix
fix: #25564
Replace the range comparison with a major-version check, which is independent of how
the suffix variants (a, f, etc.) are ordered in the enum:
This continues to gate the kernel to SM 10.x and 11.x as intended, but no longer
depends on the relative ordering of Arch.sm_10* / Arch.sm_11* variants in the
upstream CUTLASS DSL package.
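The idea of a major-version check can be sketched as follows, again with a hypothetical stand-in enum rather than the real CUTLASS DSL `Arch` class. Deriving the major version from the member name is an assumption made for this example; the actual patch may obtain it differently. The helper names `major_version` and `is_supported` are also invented here.

```python
import re
from enum import Enum


class Arch(Enum):
    # Hypothetical stand-in for illustration only; NOT the real CUTLASS DSL enum.
    sm_90a = (9, 0, "a")
    sm_100 = (10, 0, "")
    sm_103a = (103, 0, "a")  # assumed irregular value, as in a fragile enum
    sm_110f = (11, 0, "f")
    sm_120 = (12, 0, "")


def major_version(arch: Arch) -> int:
    # Parse e.g. "sm_103a" -> 103 -> major 10; ignores .value and the suffix.
    match = re.fullmatch(r"sm_(\d+)[a-z]*", arch.name)
    if match is None:
        raise ValueError(f"unrecognized arch name: {arch.name}")
    return int(match.group(1)) // 10


def is_supported(arch: Arch) -> bool:
    # Gate on the major version only: SM 10.x and 11.x are accepted.
    return major_version(arch) in (10, 11)


for arch in Arch:
    print(f"{arch.name}: major={major_version(arch)}, supported={is_supported(arch)}")
```

Because the check never compares one enum member against another, reordering members or changing how suffix variants are encoded in `.value` does not affect the result.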
Test
Verified on B300 (SM 10.3a) with nvidia-cutlass-dsl==4.5.0: