I don't quite understand how the model pushes boundary probabilities toward 0 or 1. Is it possible for the model to be less confident during the EMA phase so that it can accumulate more information from previous steps? In my own training runs, I observe boundary probabilities close to 0.5. Is this a data issue, or is there some trick to make the model learn more confident probabilities? I am using a one-layer hnet, by the way.