Hello,
I'm currently trying to use the grouped GEMM code in my project, but I've noticed that the workspace is allocated on every iteration (via torch::Tensor workspace = torch::empty(workspace_size, options)). That seems unnecessary, since the CUTLASS workspace is reusable.
Reallocating it this often seems to affect performance, for example when there are many MoE layers or when MxNxK is large. Has anyone measured the effect of this?
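To make the idea concrete, here is a minimal sketch of a grow-only workspace cache: the buffer is allocated once and reused, and it is only reallocated when a call needs more bytes than it already holds. It is plain C++ and does not reflect the project's actual code. WorkspaceCache, get() and grow_count() are hypothetical names, and the std::vector<std::uint8_t> stands in for a cached torch::Tensor. In the real code the analogous change would presumably be to keep the tensor around (per device) and call torch::empty only when workspace_size grows. Note that a single shared workspace is only safe if the GEMMs that use it cannot run concurrently, for example on different streams.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical grow-only workspace cache (sketch, not the project's API).
// The buffer is reused across calls and only grows when a larger
// workspace is requested.
class WorkspaceCache {
 public:
  // Returns a pointer to at least `bytes` bytes of scratch memory.
  // Reallocates only when `bytes` exceeds the current size.
  std::uint8_t* get(std::size_t bytes) {
    if (bytes > buffer_.size()) {
      buffer_.resize(bytes);
      ++grow_count_;
    }
    assert(buffer_.size() >= bytes);  // postcondition: buffer is big enough
    return buffer_.data();
  }

  // Current size of the cached buffer in bytes.
  std::size_t capacity() const { return buffer_.size(); }

  // Number of times the buffer had to grow (i.e. be reallocated).
  int grow_count() const { return grow_count_; }

 private:
  std::vector<std::uint8_t> buffer_;  // stand-in for a cached torch::Tensor
  int grow_count_ = 0;
};
```

Usage: create one WorkspaceCache that outlives the per-iteration calls and call get(workspace_size) before each GEMM. Requests that fit in the existing buffer return the same memory without any new allocation.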