Dear @feixue94,
We are working on a related project and cannot reproduce simple baselines such as DINOv2+NN or DIFT: the promised code has not been released, and key details of your setup are missing.
For instance, in the temporal Matching benchmark:
- Image pairs: Did you build pairs by matching the first frame of each video against all other frames, or did you use all consecutive frame pairs, or all possible frame pairs?
- Occlusion handling: How were occluded points treated in the evaluation? Were they discarded or interpolated?
- Metric averaging: Is the reported PCK metric averaged per video or per pair of frames?
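
To make the averaging question concrete, here is a minimal Python sketch of the two conventions we are choosing between. Everything in it is our own assumption and not taken from your code: the function names (`pck`, `average_per_pair`, `average_per_video`), the absolute pixel threshold, and the choice to discard occluded points are all hypothetical.

```python
from statistics import mean


def pck(pred, gt, visible, threshold):
    """Fraction of visible keypoints predicted within `threshold` pixels of ground truth.

    Assumption: occluded points (visible == False) are discarded, not interpolated.
    Returns None if the pair has no visible keypoints.
    """
    hits = [
        ((px - gx) ** 2 + (py - gy) ** 2) ** 0.5 <= threshold
        for (px, py), (gx, gy), v in zip(pred, gt, visible)
        if v
    ]
    return sum(hits) / len(hits) if hits else None


def pair_scores(pairs, threshold):
    """PCK of every frame pair in one video, skipping pairs with no visible points."""
    scores = (pck(pred, gt, visible, threshold) for pred, gt, visible in pairs)
    return [s for s in scores if s is not None]


def average_per_pair(videos, threshold):
    """Pool all frame pairs from all videos, then average once."""
    return mean(s for pairs in videos for s in pair_scores(pairs, threshold))


def average_per_video(videos, threshold):
    """Average within each video first, then average the per-video means."""
    per_video = [pair_scores(pairs, threshold) for pairs in videos]
    return mean(mean(scores) for scores in per_video if scores)


# Toy data: each pair is (predicted points, ground-truth points, visibility flags).
video_a = [([(0, 0)], [(0, 0)], [True])]        # 1 pair, correct match
video_b = [([(10, 10)], [(0, 0)], [True])] * 3  # 3 pairs, all wrong
videos = [video_a, video_b]

print(average_per_pair(videos, 2.0))   # 0.25
print(average_per_video(videos, 2.0))  # 0.5
```

On this toy input the two conventions give 0.25 and 0.5, so the choice matters whenever videos contribute different numbers of pairs.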
Without the implementation or clear details of the evaluation protocol, it is extremely challenging for us to move forward and ensure fair comparisons.