Thank you for your impressive work and for releasing the evaluation code!
I am writing to inquire about the MoGe-2 evaluation results reported in your paper. I noticed a significant performance gap between my local benchmarks and the values reported in the Lotus-2 paper, even though I followed the evaluation protocols mentioned in your repository.
1. Performance Discrepancy on KITTI
When evaluating the official MoGe-2 (ViT-L) checkpoint, my results are significantly better than those reported in the paper. For example, on KITTI:
| Benchmark (KITTI) | AbsRel ↓ | $\delta_1$ ↑ |
| --- | --- | --- |
| Lotus-2 Paper | 11.8 | 89.2 |
| My Local Test | 5.5 | 97.7 |
My Evaluation Details:
- Checkpoint: Official `moge-2-vitl`.
- Alignment: Standard `least_square` alignment in linear space.
- Resolution Strategy: I resized all inputs to 3600 tokens (the maximum number of tokens in MoGe-2's official inference code) for prediction, then resized the output back to the original resolution for computing metrics.
- Masking: I set `apply_mask=False` during inference, following the official MoGe-2 implementation.
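For reference, here is a minimal sketch of what I mean by "least-squares alignment in linear space" followed by AbsRel and $\delta_1$ computation. This is my own illustrative code, not the official MoGe-2 or Lotus-2 evaluation script; function names and the synthetic data are hypothetical.

```python
import numpy as np

def least_square_align(pred, gt, mask):
    """Per-image affine alignment in linear depth space:
    solve min_{s,t} ||s * pred + t - gt||^2 over valid pixels."""
    p, g = pred[mask], gt[mask]
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    return s * pred + t

def depth_metrics(pred, gt, mask):
    """AbsRel and delta_1 over valid pixels."""
    p, g = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(p - g) / g)
    delta1 = np.mean(np.maximum(p / g, g / p) < 1.25)
    return abs_rel, delta1

# Synthetic example: a prediction off by a global scale and shift
gt = np.random.uniform(1.0, 80.0, size=(64, 64))
pred = 0.5 * gt + 2.0
mask = gt > 0
aligned = least_square_align(pred, gt, mask)
abs_rel, d1 = depth_metrics(aligned, gt, mask)
```

After alignment, the synthetic prediction matches the ground truth almost exactly (AbsRel near 0, $\delta_1$ near 1), since the error was purely affine.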
Could you please clarify or share the evaluation settings/code used for the MoGe-2 baseline in your paper?
2. Consistency on ETH3D and DIODE
I also observed that the Depth Anything V2 (DAv2) results on ETH3D in your paper match the values reported in the original DAv2 paper exactly, but the results on DIODE appear to be different.
I was wondering:
- Did you reproduce the DAv2 results on ETH3D locally, or were they cited directly from the DAv2 paper?
- If the DIODE results were reproduced locally, could you specify the settings used, as it would help me align my environment for a fair comparison?
I am keen to ensure my benchmarks are aligned with your protocol to maintain a fair comparison in my research. Thank you again for your contribution to the community!