[ET-VK][matmul] Re-implement fp32/fp16 matmul and linear with tiled compute and blocked weight packing #10118
| Job | Run time |
|---|---|
| | 8m 40s |
| | 5m 11s |
| | 7m 21s |
| | 7m 55s |
| | 9m 36s |
| | 7m 24s |
| | 7m 58s |
| | 14m 24s |
| | 22m 0s |
| | 9m 32s |
| | 10m 38s |
| | 4m 56s |
| | 5m 55s |
| | 4m 47s |
| | 5m 54s |
| | 7m 25s |
| | 9m 44s |
| | 5m 53s |
| | 5m 29s |
| | 5m 55s |
| Total | 2h 46m 37s |