Hi, how is the temperature parameter in the haversine loss function chosen? Does it have a relationship with the number of geocell classes being predicted?
At the beginning of training, I am observing abnormal haversine loss values such as -0.0 when predicting over 20000 geocells, but not when predicting 2000 geocells. In the same scenario, cross-entropy loss without haversine smoothing gives loss values in a more normal range.
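To make the question concrete, here is a minimal sketch of the kind of check that could help narrow this down. It is not the repository's implementation. It assumes the smoothed target for geocell i has the form exp(-(d_i - d_true) / tau), where d_i is the haversine distance from the true location to the cell centroid, and that the loss is cross entropy against those soft targets. The function names, the random centroids, the example location, and the tau values are all hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs are in degrees and broadcastable."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def soft_cross_entropy(logits, targets):
    """Cross entropy against (unnormalized) soft targets, via a stable log-softmax."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return float(-(targets * log_probs).sum())


def report(num_cells, tau, seed=0):
    """Return (sum of smoothed targets, loss at uniform logits) for one sample."""
    rng = np.random.default_rng(seed)
    # Hypothetical geocell centroids, spread uniformly over the sphere.
    cell_lat = np.degrees(np.arcsin(rng.uniform(-1.0, 1.0, num_cells)))
    cell_lon = rng.uniform(-180.0, 180.0, num_cells)
    true_lat, true_lon = 48.85, 2.35  # arbitrary example location

    d = haversine_km(cell_lat, cell_lon, true_lat, true_lon)
    true_cell = int(np.argmin(d))  # assumption: true cell = nearest centroid
    targets = np.exp(-(d - d[true_cell]) / tau)  # assumed smoothing form

    # Uniform logits stand in for an untrained model at the start of training.
    loss = soft_cross_entropy(np.zeros(num_cells), targets)
    return float(targets.sum()), loss


if __name__ == "__main__":
    for num_cells in (2000, 20000):
        for tau in (10.0, 75.0, 500.0):
            mass, loss = report(num_cells, tau)
            print(f"cells={num_cells:6d} tau={tau:6.1f} "
                  f"target_mass={mass:10.3f} loss={loss:10.3f}")
```

Under these assumptions, the loss at uniform logits equals the total target mass times log(number of cells), so it should be strictly positive for any tau. If a run reports -0.0, it may be worth checking whether the targets collapse to all zeros (for example from a distance-unit mismatch or reduced-precision underflow) before the loss is computed.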
Thanks!