Possible Unfair Comparison with LLaVA Benchmark Results #5

We would like to raise a concern about the fairness of the comparisons in the experimental results. In the paper, both LLaVA and ASVR score lower than the numbers reported in the original LLaVA paper. We ran the authors' code and also tested a version without the image loss, keeping all other settings identical (i.e., the LLaVA version). We used the datasets specified in the paper for both pretraining and finetuning. In our runs, the version without the image loss outperforms the version with the image loss (ASVR) on nearly every benchmark. Our ASVR results align with those reported in the paper, while our LLaVA results are significantly higher than the LLaVA baseline numbers presented in the paper.

As a result, we question the fairness of the comparison involving LLaVA in the paper. Below are some of our test results:


| model  | gqa      | vizwiz   | scienceq  | textvqa  | pope     | mme      |
|--------|----------|----------|-----------|----------|----------|----------|
| llava  | 62.13229 | 56.28155 | 70.79822  | 59.044   | 87.55556 | 1441.554 |
| asvr   | 60.50246 | 58.93031 | 69.06296  | 54.038   | 86.67778 | 1429.783 |
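
For convenience, here is a minimal sketch of how the per-benchmark gaps can be tabulated from the numbers above. The names `scores` and `compare` are ours and purely illustrative; this is not part of the ASVR or LLaVA code, and it only does arithmetic on the values already shown in the table.

```python
# Scores copied from the table above, stored as (llava, asvr) per benchmark.
scores = {
    "gqa":      (62.13229, 60.50246),
    "vizwiz":   (56.28155, 58.93031),
    "scienceq": (70.79822, 69.06296),
    "textvqa":  (59.044, 54.038),
    "pope":     (87.55556, 86.67778),
    "mme":      (1441.554, 1429.783),
}


def compare(scores):
    """Return {benchmark: llava_score - asvr_score}."""
    return {name: llava - asvr for name, (llava, asvr) in scores.items()}


deltas = compare(scores)
# Benchmarks where the no-image-loss (llava) run scores higher.
llava_wins = [name for name, delta in deltas.items() if delta > 0]

for name, delta in deltas.items():
    print(f"{name:9s} llava - asvr = {delta:+.3f}")
print(f"llava ahead on {len(llava_wins)} of {len(deltas)} benchmarks")
```

On these numbers the LLaVA run is ahead on 5 of 6 benchmarks, with vizwiz the only one where ASVR scores higher. The mme column is on a much larger score range than the others, so the gaps should be read per benchmark rather than compared across columns.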

