Optimize matmul #5

Open


opened on Nov 29, 2023

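Computing (A×B)[0][0] = Σ_i A[0][i]·Bᵀ[0][i], with B stored in transposed form, instead of (A×B)[0][0] = Σ_i A[0][i]·B[i][0] might improve cache locality. In row-major storage, B[i][0] skips a whole row of B on every step of i, whereas Bᵀ[0][i] walks through consecutive memory, just like A[0][i]. I guess that's why torch.nn.Linear uses a transposed weight (its weight has shape (out_features, in_features) and it computes y = xWᵀ + b).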
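A minimal sketch of the idea, assuming dense row-major `float` arrays; the function names `matmul_naive`, `matmul_bt` and `transpose` are hypothetical and not taken from this repository. Both kernels compute the same product; the only difference is whether the inner loop reads B down a column (stride m) or reads a row of Bᵀ (stride 1).

```c
#include <stddef.h>

/* Row-major storage: element (r, c) of an R x C matrix is at m[r * C + c]. */

/* Textbook C = A * B with A (n x k), B (k x m), C (n x m).
   The inner loop reads b[p * m + j], i.e. B[p][j] for p = 0, 1, 2, ...,
   which jumps m floats per step and touches a new cache line for large m. */
void matmul_naive(float *c, const float *a, const float *b,
                  size_t n, size_t k, size_t m) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < m; j++) {
            float sum = 0.0f;
            for (size_t p = 0; p < k; p++) {
                sum += a[i * k + p] * b[p * m + j];
            }
            c[i * m + j] = sum;
        }
    }
}

/* Same product, but the right-hand operand is passed transposed:
   bt is m x k with bt[j * k + p] == B[p][j]. Both inner-loop reads
   (row i of A and row j of B^T) now walk contiguous memory. */
void matmul_bt(float *c, const float *a, const float *bt,
               size_t n, size_t k, size_t m) {
    for (size_t i = 0; i < n; i++) {
        const float *row_a = a + i * k;
        for (size_t j = 0; j < m; j++) {
            const float *row_bt = bt + j * k;
            float sum = 0.0f;
            for (size_t p = 0; p < k; p++) {
                sum += row_a[p] * row_bt[p];
            }
            c[i * m + j] = sum;
        }
    }
}

/* One-off conversion from B (k x m) to B^T (m x k). */
void transpose(float *bt, const float *b, size_t k, size_t m) {
    for (size_t p = 0; p < k; p++) {
        for (size_t j = 0; j < m; j++) {
            bt[j * k + p] = b[p * m + j];
        }
    }
}
```

If B is a weight matrix that is reused across many calls, the transpose can be done once at load time (or the weights can simply be saved in that layout, as torch.nn.Linear does), so the cost of `transpose` is not paid per multiplication.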
