I tried to reproduce COA with GPT-4o, and my results are much lower than those in the paper.
My overall exact_acc was 21.69%, whereas Table 2 of the paper reports 62.8%. Here are the results I got:
{
'total': {'hit_rate': 0.9954, 'type_acc': 0.5388, 'exact_acc': 0.2169},
'CLICK': {'type_acc': 0.663, 'exact_acc': 0.1889},
'SCROLL': {'type_acc': 0.4048, 'exact_acc': 0.2381},
'PRESS': {'type_acc': 0.6875, 'exact_acc': 0.5938},
'STOP': {'type_acc': 0.102, 'exact_acc': 0.102},
'TYPE': {'type_acc': 0.2889, 'exact_acc': 0.2222, 'text_dist': 0.845}
}
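To make the gap easier to see, here is a minimal sketch that ranks the action types by exact_acc and computes the difference from the paper. The numbers are copied from the results above. The variable names are my own and not from the repo, and I am assuming the 62.8% in Table 2 is the same metric as my total exact_acc.

```python
# Per-action numbers copied from the results above.
# All names here are illustrative, not taken from the repo.
results = {
    "CLICK": {"type_acc": 0.663, "exact_acc": 0.1889},
    "SCROLL": {"type_acc": 0.4048, "exact_acc": 0.2381},
    "PRESS": {"type_acc": 0.6875, "exact_acc": 0.5938},
    "STOP": {"type_acc": 0.102, "exact_acc": 0.102},
    "TYPE": {"type_acc": 0.2889, "exact_acc": 0.2222},
}

reproduced_total = 0.2169  # my total exact_acc
reported_total = 0.628     # Table 2 of the paper (assumed to be the same metric)

# Action types sorted from lowest to highest exact_acc.
ranked = sorted(results, key=lambda action: results[action]["exact_acc"])

# Absolute gap between the paper and my run, as a fraction.
gap = round(reported_total - reproduced_total, 4)

for action in ranked:
    r = results[action]
    # type_acc - exact_acc: cases where the action type was right
    # but the full action (arguments/target) did not match.
    lost = r["type_acc"] - r["exact_acc"]
    print(f"{action:6s} exact_acc={r['exact_acc']:.4f} type_ok_but_not_exact={lost:.4f}")

print(f"gap to paper: {gap:.4f}")
```

In my run, STOP and CLICK are the weakest, and for CLICK most of the loss happens after the action type is already predicted correctly.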
Is there an additional step or setting in the code that I am missing?