Thanks for the great project! Just wondering: if the ANE doesn't have a fused kernel, is the attention layer still faster than MLX? My understanding is that the ANE speedup is mostly on the compute side, while the MLX fused kernel mostly helps on the memory side, so I haven't gotten my head around the ANE benefits yet. I tried to follow the Anemll repo's example, but I think I missed something, because my CoreML pipeline ends up 1.4x slower than the MLX pipeline.