Generic 3D Representation via Pose Estimation and Matching

Cited by: 31
Authors
Zamir, Amir R. [1]
Wekel, Tilman [1]
Agrawal, Pulkit [2]
Wei, Colin [1]
Malik, Jitendra [2]
Savarese, Silvio [1]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] Univ Calif Berkeley, Berkeley, CA 94720 USA
Source
Keywords
Generic vision; Representation; Descriptor learning; Pose estimation; Wide-baseline matching; Street view;
DOI
10.1007/978-3-319-46487-9_33
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Though a large body of computer vision research has investigated developing generic semantic representations, efforts towards developing a similar representation for 3D have been limited. In this paper, we learn a generic 3D representation through solving a set of foundational proxy 3D tasks: object-centric camera pose estimation and wide baseline feature matching. Our method is based upon the premise that by providing supervision over a set of carefully selected foundational tasks, generalization to novel tasks and abstraction capabilities can be achieved. We empirically show that the internal representation of a multi-task ConvNet trained to solve the above core problems generalizes to novel 3D tasks (e.g., scene layout estimation, object pose estimation, surface normal estimation) without the need for fine-tuning and shows traits of abstraction abilities (e.g., cross-modality pose estimation). In the context of the core supervised tasks, we demonstrate that our representation achieves state-of-the-art wide baseline feature matching results without requiring a priori rectification (unlike SIFT and the majority of learnt features). We also show 6DOF camera pose estimation given a pair of local image patches. The accuracy of both supervised tasks is comparable to that of humans. Finally, we contribute a large-scale dataset composed of object-centric street view scenes along with point correspondences and camera pose information, and conclude with a discussion on the learned representation and open research questions.
Pages: 535-553
Page count: 19
Related papers
50 in total
  • [1] Object Pose Estimation via Viewpoint Matching of 3D Models
    Lee, Junha
    Ji, Sanghoon
    You, Sujeong
    [J]. 2021 21ST INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2021), 2021: 1546 - 1548
  • [2] 3D Human Pose Estimation = 2D Pose Estimation + Matching
    Chen, Ching-Hang
    Ramanan, Deva
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 5759 - 5767
  • [3] Invariant representation, matching and pose estimation of 3D space curves under similarity transformations
    Li, SZ
    [J]. PATTERN RECOGNITION, 1997, 30 (03) : 447 - 458
  • [4] 3D generic object categorization, localization and pose estimation
    Savarese, Silvio
    Fei-Fei, Li
    [J]. 2007 IEEE 11TH INTERNATIONAL CONFERENCE ON COMPUTER VISION, VOLS 1-6, 2007: 1245 - 1252
  • [5] Silhouette representation and matching for 3D pose discrimination - A comparative study
    Chen, Cheng
    Zhuang, Yueting
    Xiao, Jun
    [J]. IMAGE AND VISION COMPUTING, 2010, 28 (04) : 654 - 667
  • [6] 3D hand pose and mesh estimation via a generic Topology-aware Transformer model
    Yu, Shaoqi
    Wang, Yintong
    Chen, Lili
    Zhang, Xiaolin
    Li, Jiamao
    [J]. FRONTIERS IN NEUROROBOTICS, 2024, 18
  • [7] Fast template matching and pose estimation in 3D point clouds
    Vock, Richard
    Dieckmann, Alexander
    Ochmann, Sebastian
    Klein, Reinhard
    [J]. COMPUTERS & GRAPHICS-UK, 2019, 79 : 36 - 45
  • [8] Monocular 3D Pose Estimation via Pose Grammar and Data Augmentation
    Xu, Yuanlu
    Wang, Wenguan
    Liu, Tengyu
    Liu, Xiaobai
    Xie, Jianwen
    Zhu, Song-Chun
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (10) : 6327 - 6344
  • [9] A NEW SPARSE REPRESENTATION ALGORITHM FOR 3D HUMAN POSE ESTIMATION
    Andalib, Azam
    Babamir, Seyed Morteza
    Faraji, Alireza
    [J]. COMPUTING AND INFORMATICS, 2016, 35 (06) : 1338 - 1355
  • [10] Part template: 3D representation for multiview human pose estimation
    Shen, Jianfeng
    Yang, Wenming
    Liao, Qingmin
    [J]. PATTERN RECOGNITION, 2013, 46 (07) : 1920 - 1932