layout project_uop3d
title Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos
arxiv_pdf https://openaccess.thecvf.com/content/CVPR2024/papers/Sommer_Unsupervised_Learning_of_Category-Level_3D_Pose_from_Object-Centric_Videos_CVPR_2024_paper.pdf
supplementary_material https://openaccess.thecvf.com/content/CVPR2024/supplemental/Sommer_Unsupervised_Learning_of_CVPR_2024_supplemental.pdf
github_link https://github.com/GenIntel/uns-obj-pose3d.git
arxiv_link https://openaccess.thecvf.com/content/CVPR2024/html/Sommer_Unsupervised_Learning_of_Category-Level_3D_Pose_from_Object-Centric_Videos_CVPR_2024_paper.html
teaser_video assets/videos/uop3d/UOP3D_720p.mp4
teaser_video_description ...
abstract Category-level 3D pose estimation is a fundamentally important problem in computer vision and robotics, e.g. for embodied agents or to train 3D generative models. However, so far methods that estimate the category-level object pose require either large amounts of human annotations, CAD models, or input from RGB-D sensors. In contrast, we tackle the problem of learning to estimate the category-level 3D pose only from casually taken object-centric videos without human supervision. We propose a two-step pipeline: First, we introduce a multi-view alignment procedure that determines canonical camera poses across videos with a novel and robust cyclic distance formulation for geometric and appearance matching using reconstructed coarse meshes and DINOv2 features. In a second step, the canonical poses and reconstructed meshes enable us to train a model for 3D pose estimation from a single image. In particular, our model learns to estimate dense correspondences between images and a prototypical 3D template by predicting, for each pixel in a 2D image, a feature vector of the corresponding vertex in the template mesh. We demonstrate that our method outperforms all baselines at the unsupervised alignment of object-centric videos by a large margin and provides faithful and robust predictions in-the-wild on the Pascal3D+ and ObjectNet3D datasets.
img_grid1
img_grid2
img_grid3
img_grid4
img_grid5
img_grid6
img_grid7
img_grid8
img_grid9
img_grid10
img_grid11
img_grid12
img_carousel1
description_carousel1 Description carousel 1
img_carousel2
description_carousel2 Description carousel 2
img_carousel3
description_carousel3 Description carousel 3
img_carousel4
description_carousel4 Description carousel 4
img_carousel5
description_carousel5 Description carousel 5
img_carousel6
description_carousel6 Description carousel 6
img_carousel7
description_carousel7 Description carousel 7
img_carousel8
description_carousel8 Description carousel 8
youtube_link
poster
bibtex <br> &#64;InProceedings&#123; Sommer&#95;2024&#95;CVPR, <br> author &#61; &#123;Sommer, Leonhard and Jesslen, Artur and Ilg, Eddy and Kortylewski, Adam&#125;, <br> title &#61; &#123;Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos&#125;, <br> booktitle &#61; &#123;Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)&#125;, <br> month &#61; &#123;June&#125;, <br> year &#61; &#123;2024&#125;, <br> pages &#61; &#123;22787-22796&#125; <br> &#125;

Leonhard Sommer1, Artur Jesslen1, Eddy Ilg2, Adam Kortylewski1,3

1University of Freiburg   2Saarland University   3Max Planck Institut für Informatik
CVPR 2024