Per-Instance Operator Optimization #1
Description
Current optimization methods optimize each op as a generic primitive for a given hardware architecture.
Example:

```python
import torch.nn as nn

class TwoLayerLinear(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        return x
```

The model above contains two "instances" of the linear function with different input and output sizes, but the current system assumes it should build a single "catch-all" linear kernel optimized for the hardware. If we split this into two separate kernels, we can optimize at a finer granularity.
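As a rough illustration of what per-instance dispatch could look like, here is a minimal sketch (hypothetical names, plain NumPy closures standing in for a real code-generation stack): each static `(in_dim, out_dim)` pair gets its own kernel, so `fc1` and `fc2` would each receive a kernel specialized for their shapes.

```python
import numpy as np

def generic_linear(x, w, b):
    # Catch-all kernel: must handle any (M, K) x (K, N) problem size.
    return x @ w + b

class PerInstanceLinearRegistry:
    """Hypothetical sketch: one specialized kernel per static shape pair,
    rather than a single generic linear kernel for all instances."""

    def __init__(self):
        self._kernels = {}

    def get_kernel(self, in_dim, out_dim):
        key = (in_dim, out_dim)
        if key not in self._kernels:
            # In a real system, shape-specialized code generation
            # (tiling, unrolling, vectorization) would happen here;
            # this sketch just creates a distinct closure per instance.
            self._kernels[key] = lambda x, w, b: x @ w + b
        return self._kernels[key]
```

With this scheme, `fc1` (input_dim → hidden_dim) and `fc2` (hidden_dim → output_dim) would resolve to two different kernels, each tunable for its own fixed problem size.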
THEREFORE

Optimize fc1 and fc2 as separate kernels, rather than emitting one generic linear kernel.
The above example is intentionally very simple. In general, the effect described above is strong for many operators with many static hyperparameters (e.g. convolutions with fixed kernel size, stride, and padding). Collapsing all instances into a single generic kernel sacrifices optimization opportunities that instance-specific kernel generation could unlock.
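To make the static-hyperparameter point concrete, here is a hedged sketch (hypothetical function names, pure NumPy rather than generated device code) contrasting a generic 1-D convolution with an instance-specific variant for a fixed kernel size of 3, where the inner loop can be fully unrolled:

```python
import numpy as np

def conv1d_generic(x, w):
    # Generic kernel: inner loop bound depends on the runtime kernel size,
    # so the compiler cannot unroll or vectorize it as aggressively.
    k = len(w)
    out = np.empty(len(x) - k + 1)
    for i in range(len(out)):
        acc = 0.0
        for j in range(k):
            acc += x[i + j] * w[j]
        out[i] = acc
    return out

def conv1d_k3(x, w):
    # Instance-specific kernel for kernel size == 3: the inner loop is
    # fully unrolled because the hyperparameter is a compile-time constant.
    out = np.empty(len(x) - 2)
    for i in range(len(out)):
        out[i] = x[i] * w[0] + x[i + 1] * w[1] + x[i + 2] * w[2]
    return out
```

Both produce identical results, but the specialized variant exposes fixed trip counts that kernel generators can exploit; the same reasoning applies to fixed stride and padding.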