experimental: grouped gemm AOT #292

Open: saltyminty wants to merge 1 commit into NVIDIA:develop from saltyminty:fix/mingyangw/investigate-tvm-ffi-registry-cutedsl-aot

@saltyminty

Collaborator

Summary

  • Add CuTe DSL AOT export/load helpers and artifact metadata handling.
  • Wire CUDNN_FE_AOT_MODE / CUDNN_FE_AOT_DIR into GemmAmax and grouped GEMM wrapper paths.
  • Add AOT export/load coverage for GemmAmax and grouped GEMM SM100 APIs, including C++ dlopen/dlsym smoke coverage.
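As a rough illustration of how the two environment variables might gate the wrapper paths, here is a minimal sketch. The mode names `write`, `read`, and `readwrite` come from the validation runs in this PR; everything else is an assumption made for illustration and is not the PR's actual code: the `AotConfig` type, the `resolve_aot_config` helper, the choice to treat an unset mode as "AOT disabled", and the error raised when the directory is missing.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

# Mode names taken from the validation runs in this PR.
VALID_MODES = ("read", "write", "readwrite")


@dataclass(frozen=True)
class AotConfig:
    """Hypothetical container describing what the wrapper may do."""
    may_load: bool      # try to load a prebuilt artifact before compiling
    may_export: bool    # export an artifact after a JIT compile
    artifact_dir: Optional[Path]


def resolve_aot_config(env: Optional[Mapping[str, str]] = None) -> AotConfig:
    """Interpret CUDNN_FE_AOT_MODE / CUDNN_FE_AOT_DIR (hypothetical helper)."""
    env = os.environ if env is None else env
    mode = env.get("CUDNN_FE_AOT_MODE", "").strip().lower()

    # Assumption: an unset or empty mode leaves the normal JIT path untouched.
    if not mode:
        return AotConfig(may_load=False, may_export=False, artifact_dir=None)

    if mode not in VALID_MODES:
        raise ValueError(f"unsupported CUDNN_FE_AOT_MODE: {mode!r}")

    # Assumption: any active mode needs a directory to read from or write to.
    directory = env.get("CUDNN_FE_AOT_DIR", "").strip()
    if not directory:
        raise ValueError("CUDNN_FE_AOT_DIR must be set when CUDNN_FE_AOT_MODE is set")

    return AotConfig(
        may_load=mode in ("read", "readwrite"),
        may_export=mode in ("write", "readwrite"),
        artifact_dir=Path(directory),
    )
```

Under these assumptions, `readwrite` enables both flags, which matches the two-process validation below: the first process finds nothing to load and exports, and the second loads what the first wrote.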

Validation

  • Targeted SM100 pytest for GemmAmax AOT export/load and C++ smoke: passed.
  • Targeted SM100 grouped GEMM dense/discrete AOT export/load regression: passed (8 passed in 38.24s).
  • DeepSeek DSv3 cudnn_jit_repro in Dockerfile-built container:
    • CUDNN_FE_AOT_MODE=write: passed, exported 6 .json, 6 .o, and 6 .so artifacts.
    • CUDNN_FE_AOT_MODE=read: passed, the run emitted 0 cute.compile lines and completed from AOT artifacts.
    • CUDNN_FE_AOT_MODE=readwrite: passed, first process exported artifacts; second process emitted 0 cute.compile lines and loaded artifacts.
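The checks above (artifact counts per extension, and the number of `cute.compile` lines in a run's output) could be scripted roughly as follows. This is a sketch only: `count_artifacts` and `count_compile_lines` are hypothetical helpers, and it assumes that artifacts sit directly in the AOT directory and that each JIT compile shows up as a log line containing the substring `cute.compile`.

```python
from collections import Counter
from pathlib import Path

# Artifact extensions reported in the write-mode validation run.
ARTIFACT_SUFFIXES = (".json", ".o", ".so")


def count_artifacts(artifact_dir):
    """Count files per artifact extension directly inside artifact_dir."""
    counts = Counter(
        p.suffix for p in Path(artifact_dir).iterdir() if p.is_file()
    )
    return {suffix: counts.get(suffix, 0) for suffix in ARTIFACT_SUFFIXES}


def count_compile_lines(log_text, marker="cute.compile"):
    """Count log lines mentioning the marker (assumed log format)."""
    return sum(1 for line in log_text.splitlines() if marker in line)
```

For the write-mode run reported here, `count_artifacts` would be expected to report 6 for each extension, and for the read-mode run `count_compile_lines` on the captured output would be expected to return 0.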

Notes

  • The DeepSeek validation used the Dockerfile-built DSv3 container with local cudnn-frontend installed from this branch because the prebuilt NGC image required SSO access that was unavailable during setup.

@Anerudhan added the following labels on Jun 9, 2026:

  • cat-feature: Requests for new functionality, APIs, examples, or behavior improvements.
  • mod-frontend: cuDNN frontend APIs, operation graph construction, plans, and user-facing wrappers.
  • orig-nv-eng: Reported or requested by NVIDIA engineering.

@Anerudhan added this to the Frontend 1.26.0 milestone on Jun 9, 2026.

@Anerudhan

Collaborator

Let's wait until the 1.25 branch cut happens before merging this in.

Thanks
