Thank you for your impressive work and for releasing the evaluation code!
I am writing to inquire about the MoGe-2 evaluation results reported in your paper. I noticed a significant performance gap between my local benchmarks and the values reported in the Lotus-2 paper, even though I followed the evaluation protocols mentioned in your repository.
1. Performance Discrepancy on KITTI
When evaluating the official MoGe-2 (ViT-L) checkpoint, my results are significantly better than those reported in the paper. For example, on KITTI:
| Benchmark (KITTI) | AbsRel ↓ | $\delta_1$ ↑ |
| --- | --- | --- |
| Lotus-2 Paper | 11.8 | 89.2 |
| My Local Test | 5.5 | 97.7 |
My Evaluation Details:
- Checkpoint: Official `moge-2-vitl`.
- Alignment: Standard `least_square` alignment in linear space.
- Resolution Strategy: I resized all inputs to 3600 tokens (the maximum number of tokens in MoGe-2's official inference code) for prediction, then resized the output back to the original resolution for computing metrics.
- Masking: I set `apply_mask=False` during inference, following the official MoGe-2 implementation.
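For reference, here is a minimal sketch of what I mean by "least-squares alignment in linear space" followed by AbsRel and $\delta_1$ computation. This is my own illustrative code, not the official MoGe-2 or Lotus-2 evaluation script; function names and the synthetic data are hypothetical.

```python
import numpy as np

def least_square_align(pred, gt, mask):
    """Per-image affine alignment in linear depth space:
    solve min_{s,t} ||s * pred + t - gt||^2 over valid pixels."""
    p, g = pred[mask], gt[mask]
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    return s * pred + t

def depth_metrics(pred, gt, mask):
    """AbsRel and delta_1 over valid pixels."""
    p, g = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(p - g) / g)
    delta1 = np.mean(np.maximum(p / g, g / p) < 1.25)
    return abs_rel, delta1

# Synthetic example: a prediction off by a global scale and shift
gt = np.random.uniform(1.0, 80.0, size=(64, 64))
pred = 0.5 * gt + 2.0
mask = gt > 0
aligned = least_square_align(pred, gt, mask)
abs_rel, d1 = depth_metrics(aligned, gt, mask)
```

After alignment, the synthetic prediction matches the ground truth almost exactly (AbsRel near 0, $\delta_1$ near 1), since the error was purely affine.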
Could you please clarify or share the evaluation settings/code used for the MoGe-2 baseline in your paper?
2. Consistency on ETH3D and DIODE
I also observed that the Depth Anything V2 (DAv2) results on ETH3D in your paper match the values reported in the original DAv2 paper exactly, but the results on DIODE appear to be different.
I was wondering:
- Did you reproduce the DAv2 results on ETH3D locally, or were they cited directly from the DAv2 paper?
- If the DIODE results were reproduced locally, could you specify the settings used, as it would help me align my environment for a fair comparison?
I am keen to ensure my benchmarks are aligned with your protocol to maintain a fair comparison in my research. Thank you again for your contribution to the community!