Dear Siwei Wen,
I have a few questions regarding your work, and I would appreciate your insights:
- Have you compared your method with more widely recognized benchmarks or training-based methods, such as the GenImage test set or the AIGC Benchmark?
- In Table 2 of the paper, the results show significantly better performance than GPT-4o, with FakeVLM also performing far better than models like Qwen2-VL, DeepSeek, and InternVL. The last three LMMs are labeled using FakeClue, which raises some concerns about the quality of the FakeClue dataset. I examined the FakeClue test set and encountered severe quality issues. For example, for images from the Chameleon dataset such as:
- chameleon/fake/762482d5-d381-4586-8964-c3d567e233cf.jpg
- chameleon/fake/f1159a40-3475-4ffc-bd87-7ae9b73e925a.jpg
- chameleon/fake/939a1cf4-ba98-4570-abc7-a7a3f23c6b08.jpg
- chameleon/fake/71ec1b22-945b-4a6c-bfb7-36cb4e70bfa4.jpg
- chameleon/fake/99cd97e4-6c8f-4fe9-a80a-9b8d5f0971e8.jpg
- chameleon/fake/8f05fe0f-3b17-442b-9b8b-cde419445110.jpg
The explanations provided for these images offer very little information, and despite very different image contents, the reasoning is often generic: "Despite its seemingly authentic appearance, certain features, such as disproportionate textures or odd lighting, hint that this image was generated by AI." For the image chameleon/fake/492b2e59-5da3-4218-8ff0-ea6283c8bae3.jpg, the explanation merely repeats the content description, which looks like a response stitched together from multiple LMMs. In contrast, for images from the GenImage dataset, the explanations are overly verbose, sometimes exceeding 1500 characters for a single image. This leads to a further concern: given the questionable quality of these annotations, is it meaningful to compute ROUGE-L and CSS scores against them? (A short sketch of how I assume these metrics are computed follows below.)
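
To make the concern concrete, here is a minimal sketch of how I assume the two explanation metrics are computed: ROUGE-L via the `rouge_score` package and CSS as the cosine similarity between sentence embeddings (here with `all-MiniLM-L6-v2`). The exact setup in the paper may of course differ, and the `prediction` text below is a hypothetical model output, not taken from FakeVLM.

```python
# Sketch under assumptions: ROUGE-L from the `rouge_score` package, CSS taken
# to be cosine similarity between sentence embeddings; the paper's exact
# implementation may differ.
from rouge_score import rouge_scorer
from sentence_transformers import SentenceTransformer, util

# Reference: the generic FakeClue-style explanation quoted above.
reference = ("Despite its seemingly authentic appearance, certain features, "
             "such as disproportionate textures or odd lighting, hint that "
             "this image was generated by AI.")
# Prediction: a hypothetical boilerplate "AI artifacts" answer.
prediction = ("The image shows unnatural lighting and inconsistent textures, "
              "suggesting it was generated by an AI model.")

# ROUGE-L: longest-common-subsequence overlap between reference and prediction.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = scorer.score(reference, prediction)["rougeL"].fmeasure

# CSS (assumed): cosine similarity between sentence embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")
emb_ref, emb_pred = model.encode([reference, prediction], convert_to_tensor=True)
css = util.cos_sim(emb_ref, emb_pred).item()

print(f"ROUGE-L: {rouge_l:.3f}, CSS: {css:.3f}")
```

If the reference texts are this generic, almost any boilerplate explanation will score reasonably well on both metrics, which is exactly why I question how informative they are here.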
- Finally, I would like to request the release of the FakeVLM model weights to help clarify these issues; due to limited computational resources, I may not be able to fully reproduce the original training accuracy myself.
Thank you for your attention to these matters, and I look forward to your responses.
Best regards
P.S. Thank you for open-sourcing such an interesting piece of work!