Hi, I have been training with the AdamW optimizer using the following parameters for some time, and the loss shows a bump at the start of each epoch. What is the reason for this?
num_epoch: 36
optimizer: 'adamw'
lr: 0.00002
momentum: 0.9
weight_decay: 0.001
scheduler: 'cosine'
filter_bias_and_bn: true
warmup_epoch: 0
max_grad_norm: 1.0
lr_milestones: []
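
For reference, here is a minimal sketch of the learning rate I would expect from this config. It assumes plain cosine annealing from `lr` down to 0 over all 36 epochs, stepped once per epoch, with no warmup and no restarts. The function name `cosine_lr` and the `min_lr` parameter are hypothetical, and the actual scheduler in the training framework may behave differently.

```python
import math

def cosine_lr(epoch, base_lr=2e-5, num_epoch=36, min_lr=0.0):
    """Learning rate at the start of `epoch` under plain cosine annealing.

    Assumption: one scheduler step per epoch, no warmup (warmup_epoch: 0),
    no restarts, and decay to min_lr (assumed 0 here).
    """
    progress = epoch / num_epoch
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Expected learning rate at the start of each of the 36 epochs.
schedule = [cosine_lr(e) for e in range(36)]

for e in (0, 9, 18, 27, 35):
    print(f"epoch {e:2d}: lr = {schedule[e]:.3e}")
```

Comparing this curve against the logged learning rate would show whether the scheduler is decaying smoothly across epoch boundaries or restarting at each one.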
