Hi there,
I read your paper and noticed that you mention the RMSNorm normalization function is used in your code, but during generation I can't find any class with RMSNorm whose forward method is actually called. Could you tell me where it is used? Is it training only?