Hello!
I am wondering whether it is a fair comparison when models are evaluated with different decoding strategies.
On your leaderboard, the entry for the current SOTA model (Champagne) notes that beam search was used to generate 5 inferences.
But since the evaluation results depend on which decoding strategy is used, I don't think this comparison is fair.
Also, if this kind of comparison is allowed, why didn't you report the higher scores?
Thanks.