Thank you for open-sourcing your remarkable work, and congratulations on the best paper award!

I've recently been reading your paper and trying to run your code. In the GIF of DPD, step 4 highlights the output of D2P together with the CRINGE Loss, which confuses me a little: is the CRINGE Loss computed between the outputs of D2P and P2D? As far as I know, the CRINGE Loss is a contrastive loss between negative (incorrectly predicted) tokens and positive tokens sampled from the model's top-k predictions. Is step 4 of the GIF meant to show that the CRINGE Loss discourages D2P from predicting incorrect protoforms?

I'm still unsure about the exact meaning of this step and would appreciate any clarification.
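For reference, here is a minimal sketch of how I currently understand the per-token CRINGE term. It is not taken from your repository: the function name `cringe_token_loss`, the parameter `k`, and the toy logits are my own assumptions, and the real implementation presumably operates on batched tensors rather than Python lists.

```python
import math
import random


def cringe_token_loss(logits, negative_token, k=3, rng=random):
    """Contrastive term for one negative token (my reading of CRINGE; hypothetical helper).

    logits: unnormalized scores over the vocabulary at one decoding position.
    negative_token: index of the token that should be discouraged.
    Returns (loss, sampled_positive_token).
    """
    # Candidate positives: the k highest-scoring tokens, excluding the negative one.
    candidates = sorted(
        (i for i in range(len(logits)) if i != negative_token),
        key=lambda i: logits[i],
        reverse=True,
    )[:k]

    # Sample one positive from the softmax over the candidates' logits.
    max_logit = max(logits[i] for i in candidates)
    weights = [math.exp(logits[i] - max_logit) for i in candidates]
    positive_token = rng.choices(candidates, weights=weights, k=1)[0]

    # Binary contrast: -log(exp(s+) / (exp(s+) + exp(s-))) = log(1 + exp(s- - s+)).
    s_pos, s_neg = logits[positive_token], logits[negative_token]
    return math.log1p(math.exp(s_neg - s_pos)), positive_token


# Toy example: token 0 is the negative and currently has the highest score.
loss, positive = cringe_token_loss([2.0, 1.0, 0.5, -1.0], negative_token=0, k=1)
```

Under this reading, the loss is large when the negative token outscores the sampled positive and shrinks as the model pushes the negative token's score down. My question is whether the negative tokens in step 4 are the D2P outputs that P2D fails to map back to the daughters.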