RMSNorm - Pre, Within, or Post Norm? #3

@daniel-deychakiwsky

Description

Hey! I enjoyed reading your paper today, and congrats on such amazing work. Could you help me clear up some confusion I have between Sections 3 and 4 of the paper on one hand, and your GitHub implementation and the existing PyTorch implementation on the other? As far as I can tell, both implementations operate on whatever input they are given, agnostic of where it came from, so the developer is free to place the RMSNorm layer after a Linear layer or after a Linear layer + ReLU. In the paper, however, the normalization appears to happen "within" a Linear layer: "... denotes the weight-summed inputs to neurons, which is also the target of normalization." Any chance you could clear that up for me? I suppose one way to reconcile the two views is to take the weight matrix in Section 3 to be the identity and the bias vector to be zero.
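To make the identity-matrix reading concrete, here is a minimal sketch of RMSNorm as a standalone function in the common formulation (x divided by its root mean square, optionally scaled by a gain); this is my own illustration, not the authors' code. With identity weights and zero bias, the paper's "weight-summed inputs" are just the raw input, so normalizing "within" the layer coincides with applying the module after it.

```python
import numpy as np

def rms_norm(x, gain=None, eps=1e-8):
    """Normalize x by its root mean square over the last axis.

    Standalone formulation: operates on whatever input it receives,
    whether a raw activation, a Linear output, or Linear + ReLU.
    """
    rms = np.sqrt(np.mean(np.square(x), axis=-1, keepdims=True) + eps)
    normed = x / rms
    if gain is not None:
        normed = normed * gain
    return normed

# With an identity weight matrix and a zero bias, the "weight-summed
# inputs" of Section 3 equal the input itself, so the within-layer
# view reduces to this post-hoc normalization.
x = np.array([3.0, 4.0])
W = np.eye(2)      # identity weights (illustrative assumption)
b = np.zeros(2)    # zero bias (illustrative assumption)
a = W @ x + b      # the paper's weight-summed inputs; here a == x
print(np.allclose(rms_norm(a), rms_norm(x)))  # True
```

Under these assumptions the two placements are interchangeable; with a general (non-identity) weight matrix they differ, which is the source of the confusion described above.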
