Hello Tri Dao! I sincerely apologize for taking up your time, but I am wondering whether there are any plans to support mCuSeqlensM on CPU. The reason is that the tensors returned by DeepEP are always in CPU memory, and copying them to the GPU would consume a significant amount of time. Thank you very much for your consideration!