Possible Unfair Comparison with LLaVA Benchmark Results #5

@nssmd

We are concerned about the fairness of the comparisons in the experimental results. Both LLaVA and ASVR yield results that are lower than those reported in the original LLaVA paper. We ran the authors' code and also trained a version without the image loss, keeping all other settings identical (i.e., the LLaVA version). For both pretraining and finetuning, we used the datasets specified in the paper. In our runs, the version without the image loss outperforms the version with the image loss (ASVR) on nearly every benchmark. Our ASVR results align with those reported in the paper, while our LLaVA results are significantly higher than those presented in the original paper.

As a result, we question the fairness of the comparison involving LLaVA in the paper. Below are some of our test results:

| model | gqa | vizwiz | scienceqa | textvqa | pope | mme |
| --- | --- | --- | --- | --- | --- | --- |
| llava | 62.13229 | 56.28155 | 70.79822 | 59.044 | 87.55556 | 1441.554 |
| asvr | 60.50246 | 58.93031 | 69.06296 | 54.038 | 86.67778 | 1429.783 |
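
For reference, the per-benchmark gap between the two rows above can be tabulated with a short script. This is only a sketch for reading the numbers: the scores are copied from our results above, the dictionary layout and the `deltas` helper are illustrative names and not part of the authors' code, and it assumes higher is better on every benchmark listed.

```python
# Scores copied from the results above.
# Assumption: higher is better on every benchmark listed here.
scores = {
    "llava": {"gqa": 62.13229, "vizwiz": 56.28155, "scienceqa": 70.79822,
              "textvqa": 59.044, "pope": 87.55556, "mme": 1441.554},
    "asvr": {"gqa": 60.50246, "vizwiz": 58.93031, "scienceqa": 69.06296,
             "textvqa": 54.038, "pope": 86.67778, "mme": 1429.783},
}


def deltas(baseline, variant):
    """Return baseline minus variant for each benchmark (hypothetical helper)."""
    return {name: baseline[name] - variant[name] for name in baseline}


diff = deltas(scores["llava"], scores["asvr"])

# Benchmarks where the run without image loss (llava) scores higher.
llava_wins = [name for name, d in diff.items() if d > 0]

for name, d in diff.items():
    print(f"{name:10s} {d:+.3f}")
print(f"llava ahead on {len(llava_wins)} of {len(diff)} benchmarks")
```

With these numbers, the run without image loss is ahead on five of the six benchmarks; VizWiz is the only one where ASVR scores higher.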
