Fused Add + RMSNorm pattern

Motivation:

The Qwen3 decoder layer contains the following pattern in its forward pass, a residual add immediately followed by an RMSNorm:

```python
hidden_states = residual + hidden_states

# Fully Connected
residual = hidden_states
hidden_states = self.post_attention_layernorm(hidden_states)
```

https://github.com/huggingface/transformers/blob/0dc2df5ddafe3cb5824ad24e85beba13e0aa6726/src/transformers/models/qwen3/modeling_qwen3.py#L271

Goal:

Design a fused CUTLASS kernel that performs the above pattern in a single launch, producing both the updated `residual` and the normalized `hidden_states`.
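
To pin down what such a kernel would need to compute, here is a minimal reference sketch in plain Python. It is not the CUTLASS kernel. It only shows the expected outputs. The function name `fused_add_rmsnorm`, the nested-list layout, and the default `eps=1e-6` are assumptions made for illustration. It also ignores dtype handling such as upcasting to float32 for the reduction.

```python
import math


def fused_add_rmsnorm(hidden_states, residual, weight, eps=1e-6):
    """Reference semantics for a [rows, hidden] batch given as nested lists.

    Hypothetical helper: returns (normed, new_residual), matching the two
    values the decoder layer keeps after the add + post-attention norm.
    """
    normed, new_residual = [], []
    for x_row, r_row in zip(hidden_states, residual):
        # Step 1: residual add. The sum is also the residual for the next block.
        h = [r + x for r, x in zip(r_row, x_row)]
        # Step 2: RMSNorm over the hidden dimension of the summed row.
        mean_sq = sum(v * v for v in h) / len(h)
        inv_rms = 1.0 / math.sqrt(mean_sq + eps)
        normed.append([w * v * inv_rms for w, v in zip(weight, h)])
        new_residual.append(h)
    return normed, new_residual


# Tiny example: one row with hidden size 2, unit weights, eps disabled.
out, res = fused_add_rmsnorm([[1.0, 2.0]], [[2.0, 2.0]], [1.0, 1.0], eps=0.0)
```

Each row is handled independently, and the summed row `h` is used for both outputs. A fused kernel can exploit this by keeping `h` on chip between the add and the normalization, instead of writing it to global memory and reading it back.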

Fused Add + RMSNorm pattern #56
