System Info
When counting FLOPs, the embedding lookup should not be added to the whole-model FLOPs, because `nn.Embedding` is a table lookup rather than a computational operation.
For one token, `emd_and_lm_head_N` should be `vocab_size * hidden_size * 1 * 6` rather than `vocab_size * hidden_size * 2 * 6`.
All calculations that involve this value need to be adjusted accordingly; the current results are inflated.
https://github.com/ByteDance-Seed/VeOmni/blob/main/veomni/utils/count_flops.py#L735
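A minimal sketch of the proposed correction, assuming the standard 6·N·D training-FLOPs convention (2 FLOPs per multiply-add × 3 for forward + backward); the function name below is hypothetical and not VeOmni's actual API:

```python
def emd_and_lm_head_flops_per_token(vocab_size: int, hidden_size: int) -> int:
    """FLOPs per token for the output projection (lm_head) only.

    The input embedding is a table lookup (nn.Embedding), so it
    contributes no matmul FLOPs and is excluded. The current code
    effectively uses a factor of 2 (embedding + lm_head); the
    proposal is a factor of 1 (lm_head only).
    """
    return vocab_size * hidden_size * 1 * 6


# Illustrative numbers with Llama-2-7B-like dimensions
# (vocab_size=32000, hidden_size=4096):
#   current:  32000 * 4096 * 2 * 6 = 1,572,864,000 FLOPs/token
#   proposed: 32000 * 4096 * 1 * 6 =   786,432,000 FLOPs/token
```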
Information
Tasks
Reproduction
NA
Expected behavior
Correct FLOPs count.