Per-Instance Operator Optimization #1
Description
Current optimization methods optimize each op as a generic primitive for a given hardware architecture.
Example:

```python
import torch.nn as nn

class TwoLayerLinear(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        return x
```

The model above contains two "instances" of the linear function with different input and output sizes, but the current system assumes it should build a single "catch-all" linear kernel optimized for the hardware. If we split this into two separate kernels, we can optimize at a finer granularity.
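As a rough illustration of what per-instance dispatch could look like, here is a minimal sketch (hypothetical names, plain NumPy closures standing in for a real code-generation stack): each static `(in_dim, out_dim)` pair gets its own kernel, so `fc1` and `fc2` would each receive a kernel specialized for their shapes.

```python
import numpy as np

def generic_linear(x, w, b):
    # Catch-all kernel: must handle any (M, K) x (K, N) problem size.
    return x @ w + b

class PerInstanceLinearRegistry:
    """Hypothetical sketch: one specialized kernel per static shape pair,
    rather than a single generic linear kernel for all instances."""

    def __init__(self):
        self._kernels = {}

    def get_kernel(self, in_dim, out_dim):
        key = (in_dim, out_dim)
        if key not in self._kernels:
            # In a real system, shape-specialized code generation
            # (tiling, unrolling, vectorization) would happen here;
            # this sketch just creates a distinct closure per instance.
            self._kernels[key] = lambda x, w, b: x @ w + b
        return self._kernels[key]
```

With this scheme, `fc1` (input_dim → hidden_dim) and `fc2` (hidden_dim → output_dim) would resolve to two different kernels, each tunable for its own fixed problem size.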
THEREFORE

Optimize fc1 and fc2 as separate kernels, rather than emitting one generic linear kernel.
The above example is intentionally very simple. In general, the effect described above is strong for many operators with many static hyperparameters (e.g. convolutions with fixed kernel size, stride, and padding). Collapsing all instances into a single generic kernel sacrifices optimization opportunities that instance-specific kernel generation could unlock.
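To make the static-hyperparameter point concrete, here is a hedged sketch (hypothetical function names, pure NumPy rather than generated device code) contrasting a generic 1-D convolution with an instance-specific variant for a fixed kernel size of 3, where the inner loop can be fully unrolled:

```python
import numpy as np

def conv1d_generic(x, w):
    # Generic kernel: inner loop bound depends on the runtime kernel size,
    # so the compiler cannot unroll or vectorize it as aggressively.
    k = len(w)
    out = np.empty(len(x) - k + 1)
    for i in range(len(out)):
        acc = 0.0
        for j in range(k):
            acc += x[i + j] * w[j]
        out[i] = acc
    return out

def conv1d_k3(x, w):
    # Instance-specific kernel for kernel size == 3: the inner loop is
    # fully unrolled because the hyperparameter is a compile-time constant.
    out = np.empty(len(x) - 2)
    for i in range(len(out)):
        out[i] = x[i] * w[0] + x[i + 1] * w[1] + x[i + 2] * w[2]
    return out
```

Both produce identical results, but the specialized variant exposes fixed trip counts that kernel generators can exploit; the same reasoning applies to fixed stride and padding.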