Hi, thanks for your great work!
I'm a novice in the field of model inference acceleration. I tested the accuracy of your code on several datasets, and your method performed well. However, I don't quite understand how to conduct the speed tests, such as the overall acceleration and the prefill-phase acceleration reported in your paper.
I would be most grateful if you could offer some guidance.