
Per-Instance Operator Optimization #1

@TheJoshBrod

Description


Current optimization methods optimize each op as a generic primitive for a given hardware architecture.

Example:

import torch.nn as nn

class TwoLayerLinear(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)   # instance 1: input_dim -> hidden_dim
        self.fc2 = nn.Linear(hidden_dim, output_dim)  # instance 2: hidden_dim -> output_dim

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        return x

The model above contains two "instances" of the linear op with different input and output sizes; the current system nevertheless assumes it should generate a single "catch-all" linear kernel optimized for the hardware. If we split this into two separate kernels, we can apply finer, shape-specific optimization to each.
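To make the idea concrete, here is a minimal sketch (not this project's actual API; all names are hypothetical) of per-instance specialization: kernels are cached by their static shape parameters, so fc1 and fc2 resolve to distinct specializations instead of sharing one generic linear kernel.

```python
# Hypothetical sketch of a per-instance kernel cache. Each distinct
# (in_dim, out_dim) pair gets its own kernel object, standing in for a
# kernel compiled and tuned specifically for those dimensions.

_kernel_cache = {}

def get_linear_kernel(in_dim, out_dim):
    """Return a kernel specialized for this exact (in_dim, out_dim) instance."""
    key = ("linear", in_dim, out_dim)
    if key not in _kernel_cache:
        # In a real compiler, shape-specific tuning would happen here
        # (tile sizes, unrolling, vector widths chosen for these dims).
        def kernel(x, weight, bias):
            # x: length in_dim; weight: out_dim x in_dim; bias: length out_dim
            return [
                sum(weight[o][i] * x[i] for i in range(in_dim)) + bias[o]
                for o in range(out_dim)
            ]
        _kernel_cache[key] = kernel
    return _kernel_cache[key]

# fc1 and fc2 from the example resolve to distinct specializations:
fc1_kernel = get_linear_kernel(4, 8)
fc2_kernel = get_linear_kernel(8, 2)
assert fc1_kernel is not fc2_kernel
# Repeated calls for the same instance reuse the cached kernel:
assert get_linear_kernel(4, 8) is fc1_kernel
```

The key design point is that the cache key includes the instance's static shapes, so the tuner is free to pick different implementation parameters for each instance.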

THEREFORE

Optimize fc1 and fc2 as separate kernels, rather than generating one generic linear-layer kernel.


Note: the above example is intentionally very simple. In general, the effect described above is strong for many operators with many static hyperparameters (e.g., convolutions with fixed kernel size, stride, and padding). Collapsing all instances into a single generic kernel sacrifices optimization opportunities that instance-specific kernel generation would unlock.
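For operators with more hyperparameters, the specialization key simply grows to cover every static attribute. A hypothetical sketch (the function name and key layout are illustrative, not this project's API): two conv instances that differ in any static hyperparameter, even just the stride, map to different keys and therefore get separately tuned kernels.

```python
# Hypothetical sketch: the specialization key for a conv op includes all
# of its static hyperparameters, so each distinct configuration gets its
# own instance-specific kernel.

def conv_specialization_key(in_channels, out_channels,
                            kernel_size, stride, padding):
    return ("conv2d", in_channels, out_channels,
            kernel_size, stride, padding)

k1 = conv_specialization_key(3, 64, kernel_size=3, stride=1, padding=1)
k2 = conv_specialization_key(3, 64, kernel_size=3, stride=2, padding=1)
assert k1 != k2  # different stride -> separate specialized kernel
```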

Metadata

Labels: enhancement (New feature or request)