Evaluation reproducing issues #12

@ashmalvayani

Thanks for the great work. I'm trying to reproduce the results and am running into the following issues:

  1. Can I use the lm-evaluation-harness script instead of yours to evaluate the results? When I ran lm-evaluation-harness on the ammlu dataset, I got 34.1 accuracy, compared to your 37. What could explain the difference?

  2. How can I use this script to evaluate another model?
    i. When I changed the model to jais-13b, it gave 0% accuracy on Ammlu (all the responses are empty strings).
    ii. With other models, such as Phi-2 and MobiLlama-1B, I get the following error:

image

Below are the changes I made to config.yaml:
image

In ArabicMMLU_few_shots.sh, I changed the model ID to Phi-2B-base. Could you please tell me how to fix this?
