https://github.com/Dao-AILab/sonic-moe/
- The current API can only be used without EP (expert parallelism), so integrate it for the non-EP paths.
- Investigate whether a grouped GEMM from sonic-moe could be used with EP. For best performance, the grouped GEMM function would need to implement the following:
  - gemm_gated, followed by router weight multiplication in the epilogue of the down-projection GEMM, similar to gemm_dgated
  - Custom backward for the two GEMMs
Constraints:
Tokens are already permuted for EP, so only the grouped GEMM portion of sonic-moe is needed, but that API is not yet available.
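
For reference, here is a minimal NumPy sketch of the math such a fused grouped GEMM path would need to reproduce in the forward pass. It is not the sonic-moe API. The function name `grouped_gated_mlp_reference`, the SiLU-gated (SwiGLU-style) activation, and the weight layout with gate and up projections concatenated along the last dimension are all assumptions made for illustration. Tokens are assumed to be already permuted so that each expert's rows are contiguous, with `group_sizes` giving the row count per expert.

```python
import numpy as np


def silu(x):
    # SiLU / swish activation: x * sigmoid(x). Assumed gating activation.
    return x / (1.0 + np.exp(-x))


def grouped_gated_mlp_reference(x, w_up, w_down, router_w, group_sizes):
    """Unfused reference for the expert MLP on pre-permuted tokens.

    x:           [T, H]       tokens, contiguous per expert (already permuted)
    w_up:        [E, H, 2*I]  gate and up projections, concatenated (assumed layout)
    w_down:      [E, I, H]    down projection
    router_w:    [T]          router weight per permuted token row
    group_sizes: [E]          number of rows owned by each expert (may be 0)
    """
    out = np.empty((x.shape[0], w_down.shape[2]), dtype=x.dtype)
    start = 0
    for e, n in enumerate(group_sizes):
        end = start + int(n)
        # First GEMM plus gated activation (what a gemm_gated kernel would fuse).
        h = x[start:end] @ w_up[e]
        gate, up = np.split(h, 2, axis=-1)
        act = silu(gate) * up
        # Second GEMM; the router weight scaling is the part that would
        # ideally live in the down-projection GEMM epilogue.
        out[start:end] = (act @ w_down[e]) * router_w[start:end, None]
        start = end
    return out
```

A fused implementation could be validated against this reference on small shapes, including experts that receive zero tokens. The custom backward for the two GEMMs is not sketched here.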