I added distillation when training ResNet-18, but the Top-1 accuracy dropped from 68.150% to 67.364%.
The hyperparameters are as follows:
4 GPUs
epochs: 90
learning_rate: 0.01
momentum: 0.9
weight_decay: 0.0001
mode: step
step_size: 20
gamma: 0.1
loss = ce_loss + 300 * distill_loss
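
To make the settings above concrete, here is a minimal sketch of the learning-rate schedule and the combined loss they imply. It assumes that `mode: step` means the learning rate is multiplied by `gamma` every `step_size` epochs (the usual step-decay convention); that reading is an assumption, not something confirmed by the training code. The names `step_lr` and `total_loss` are hypothetical helpers used only for illustration.

```python
def step_lr(epoch, base_lr=0.01, step_size=20, gamma=0.1):
    """Learning rate at a given epoch under step decay (assumed meaning of `mode: step`)."""
    return base_lr * gamma ** (epoch // step_size)


def total_loss(ce_loss, distill_loss, distill_weight=300.0):
    """Combined objective from the post: cross-entropy plus weighted distillation loss."""
    return ce_loss + distill_weight * distill_loss


# Learning rate for each of the 90 epochs.
schedule = [step_lr(epoch) for epoch in range(90)]

# Print the learning rate at the start of each decay stage.
for epoch in range(0, 90, 20):
    print(f"epoch {epoch:2d}: lr = {schedule[epoch]:.0e}")
```

Under this reading, the learning rate is decayed four times over 90 epochs (at epochs 20, 40, 60, and 80), so the final 10 epochs run at roughly 1e-6. With a weight of 300, the distillation term dominates the objective whenever `distill_loss` is larger than about `ce_loss / 300`.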