Commit 1e85a71

Author: ssjia
[ET-VK][matmul] Re-implement fp32/fp16 matmul and linear with tiled compute and blocked weight packing
Replace all existing matmul/linear operator implementations with new ones built from the ground up using a tiled compute approach. Delete all legacy implementations (MatMulLegacy.cpp, LinearLegacy.cpp, addmm_optimized.glsl, addmm_naive_*.glsl).

New matmul (mm/bmm/addmm):
- Single matmul.glsl shader handles mm, bmm, and addmm using FPInputTile, FPWeightTile, FPOutTile infrastructure from SDPA
- Adaptive tile size selection (TILE_M=4/2/1) based on GPU occupancy
- When mat2 is a constant tensor, automatically routes through the linear path for blocked weight packing

New linear:
- Custom 4OC×4IC blocked weight prepacking via pack_fp_linear_weight.glsl for optimal cache line utilization during tiled matmul
- Supports both transposed [N,K] and non-transposed [K,N] weights with batch dimension support
- Separate texture2d weight storage with automatic buffer fallback for large dimensions

Performance on Adreno 750 (fp16, vs legacy):
- Linear [4096,1024]x[256,1024]: 1.33x faster (texture)
- Linear [4096,64]x[128,64]: 2.67x faster (texture)
- BMM [1,4096,256]x[1,256,1024]: 1.63x faster (texture)

Differential Revision: [D96488384](https://our.internmc.facebook.com/intern/diff/D96488384/)

[ghstack-poisoned]
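To make the 4OC×4IC blocking and the tiled compute loop concrete, here is a minimal CPU-side sketch in Python. It is not the shader code from this commit: the function names `pack_weight_blocked` and `linear_tiled` are hypothetical, and the in-memory order of blocks and of elements within a block is an assumption, since the actual layout used by pack_fp_linear_weight.glsl is not visible in this view. The sketch assumes a transposed [N,K] weight and ignores the batch dimension.

```python
BLOCK = 4  # 4 output channels (OC) x 4 input channels (IC) per block


def pack_weight_blocked(weight, n, k):
    """Repack a row-major [N, K] weight into contiguous 4x4 blocks.

    Assumed layout (illustrative only): blocks are ordered by
    (oc_block, ic_block); inside a block, elements are row-major by
    (oc, ic). Ragged edges are zero padded.
    """
    n_blocks = (n + BLOCK - 1) // BLOCK
    k_blocks = (k + BLOCK - 1) // BLOCK
    packed = [0.0] * (n_blocks * k_blocks * BLOCK * BLOCK)
    for oc in range(n):
        for ic in range(k):
            nb, ni = divmod(oc, BLOCK)
            kb, ki = divmod(ic, BLOCK)
            base = (nb * k_blocks + kb) * BLOCK * BLOCK
            packed[base + ni * BLOCK + ki] = weight[oc][ic]
    return packed, n_blocks, k_blocks


def linear_tiled(x, packed, m, n, k, n_blocks, k_blocks, tile_m=4):
    """Compute out[i][j] = sum_c x[i][c] * weight[j][c] one tile at a time.

    Each (row tile, oc_block) pair plays the role of one shader invocation:
    it accumulates a tile_m x 4 output tile while walking the K dimension
    one packed 4x4 weight block per step.
    """
    out = [[0.0] * n for _ in range(m)]
    for m0 in range(0, m, tile_m):
        rows = min(tile_m, m - m0)  # the last row tile may be partial
        for nb in range(n_blocks):
            acc = [[0.0] * BLOCK for _ in range(rows)]
            for kb in range(k_blocks):
                base = (nb * k_blocks + kb) * BLOCK * BLOCK
                cols = min(BLOCK, k - kb * BLOCK)  # skip zero padding in K
                for mi in range(rows):
                    for ni in range(BLOCK):
                        for ki in range(cols):
                            acc[mi][ni] += (
                                x[m0 + mi][kb * BLOCK + ki]
                                * packed[base + ni * BLOCK + ki]
                            )
            # Write back only the output channels that really exist.
            for mi in range(rows):
                for ni in range(min(BLOCK, n - nb * BLOCK)):
                    out[m0 + mi][nb * BLOCK + ni] = acc[mi][ni]
    return out
```

The point of the repacking is that the 16 weights needed for one accumulation step sit next to each other in memory, so the inner loop reads one contiguous block instead of four strided rows. The `tile_m` parameter stands in for the TILE_M=4/2/1 choice; the occupancy-based selection itself is not modeled here.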
1 parent cc27e6b commit 1e85a71

28 files changed

Lines changed: 2278 additions & 1314 deletions

backends/vulkan/runtime/graph/ops/glsl/addmm_naive_buffer.glsl

Lines changed: 0 additions & 86 deletions
This file was deleted.

backends/vulkan/runtime/graph/ops/glsl/addmm_naive_texture3d.glsl

Lines changed: 0 additions & 189 deletions
This file was deleted.

backends/vulkan/runtime/graph/ops/glsl/addmm_naive_texture3d.yaml

Lines changed: 0 additions & 24 deletions
This file was deleted.
