[ET-VK][matmul] Re-implement fp32/fp16 matmul and linear with tiled compute and blocked weight packing #10155
| Job | Run time |
|---|---|
| | 6m 9s |
| | 9m 34s |
| | 8m 50s |
| | 8m 14s |
| | 15m 35s |
| | 10m 21s |
| | 8m 30s |
| | 10m 4s |
| | 10m 24s |
| | 22m 16s |
| | 7m 23s |
| | 7m 28s |
| | 5m 52s |
| | 5m 47s |
| | 4m 59s |
| | 9m 48s |
| | 5m 50s |
| | 4m 45s |
| | 6m 32s |
| | 5m 38s |
| Total | 2h 53m 59s |