Potential error in eval_gsm8k.py #23

Dear authors, thank you for the amazing work and for sharing your code and data!

I wanted to ask about your evaluation code: currently, if the model outputs an answer with a decimal point, the answer is automatically rounded to the nearest integer.

As a result, a wrong answer (e.g. 8.5) can be counted as correct (e.g. as 9) even though it comes from a calculation error, and such errors occur often in some model generations.

Given this, I believe a stricter evaluation may be needed.
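
To illustrate the concern, here is a minimal sketch. It does not reproduce the actual code in eval_gsm8k.py: `lenient_match` and `strict_match` are hypothetical names, and the lenient version only assumes that the parsed prediction is rounded before comparison. The example uses 8.6 rather than 8.5 because Python's built-in `round()` rounds halves to the nearest even integer, so `round(8.5)` gives 8.

```python
import math


def lenient_match(pred: str, gold: str) -> bool:
    """Assumed current behavior: round the prediction, then compare."""
    return round(float(pred)) == int(gold)


def strict_match(pred: str, gold: str, tol: float = 1e-6) -> bool:
    """Stricter check: compare the parsed values directly.

    Only a tiny absolute tolerance is allowed, to absorb floating-point noise.
    """
    return math.isclose(float(pred), float(gold), rel_tol=0.0, abs_tol=tol)


# A prediction of 8.6 against a gold answer of 9:
print(lenient_match("8.6", "9"))  # True: the rounding hides the error
print(strict_match("8.6", "9"))   # False: the error is caught
print(strict_match("9.0", "9"))   # True: "9.0" still matches "9"
```

With the stricter check, an answer such as "9.0" is still accepted, while a value that merely rounds to the gold answer is rejected.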
