How should the loss weights of different pre-training tasks be determined? Which task's loss determines when model training has converged?