
layout	project_uop3d
title	Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos
arxiv_pdf	https://openaccess.thecvf.com/content/CVPR2024/papers/Sommer_Unsupervised_Learning_of_Category-Level_3D_Pose_from_Object-Centric_Videos_CVPR_2024_paper.pdf
supplementary_material	https://openaccess.thecvf.com/content/CVPR2024/supplemental/Sommer_Unsupervised_Learning_of_CVPR_2024_supplemental.pdf
github_link	https://github.com/GenIntel/uns-obj-pose3d.git
arxiv_link	https://openaccess.thecvf.com/content/CVPR2024/html/Sommer_Unsupervised_Learning_of_Category-Level_3D_Pose_from_Object-Centric_Videos_CVPR_2024_paper.html
teaser_video	assets/videos/uop3d/UOP3D_720p.mp4
teaser_video_description	...
abstract	Category-level 3D pose estimation is a fundamentally important problem in computer vision and robotics, e.g., for embodied agents or for training 3D generative models. However, methods proposed so far for estimating category-level object pose require either large amounts of human annotations, CAD models, or input from RGB-D sensors. In contrast, we tackle the problem of learning to estimate category-level 3D pose only from casually taken object-centric videos, without human supervision. We propose a two-step pipeline. First, we introduce a multi-view alignment procedure that determines canonical camera poses across videos with a novel and robust cyclic distance formulation for geometric and appearance matching, using reconstructed coarse meshes and DINOv2 features. Second, the canonical poses and reconstructed meshes enable us to train a model for 3D pose estimation from a single image. In particular, our model learns to estimate dense correspondences between images and a prototypical 3D template by predicting, for each pixel in a 2D image, a feature vector of the corresponding vertex in the template mesh. We demonstrate that our method outperforms all baselines at the unsupervised alignment of object-centric videos by a large margin and provides faithful and robust predictions in the wild on the Pascal3D+ and ObjectNet3D datasets.
img_grid1
img_grid2
img_grid3
img_grid4
img_grid5
img_grid6
img_grid7
img_grid8
img_grid9
img_grid10
img_grid11
img_grid12
img_carousel1
description_carousel1	Description carousel 1
img_carousel2
description_carousel2	Description carousel 2
img_carousel3
description_carousel3	Description carousel 3
img_carousel4
description_carousel4	Description carousel 4
img_carousel5
description_carousel5	Description carousel 5
img_carousel6
description_carousel6	Description carousel 6
img_carousel7
description_carousel7	Description carousel 7
img_carousel8
description_carousel8	Description carousel 8
youtube_link
poster
bibtex	<br> @InProceedings{Sommer_2024_CVPR, <br> author = {Sommer, Leonhard and Jesslen, Artur and Ilg, Eddy and Kortylewski, Adam}, <br> title = {Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos}, <br> booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, <br> month = {June}, <br> year = {2024}, <br> pages = {22787-22796} <br> }

Leonhard Sommer¹, Artur Jesslen¹, Eddy Ilg², Adam Kortylewski¹˒³

¹University of Freiburg ²Saarland University ³Max Planck Institut für Informatik
CVPR 2024
