Benchmark results

Hi,

Thank you for open-sourcing the evaluation data! Could you please also release the benchmark results for the different models you compared? Specifically, I am looking for the numerical values plotted in Figure 2, as well as the results for tasks not included in that figure.

Thanks in advance!