Generic 3D Representation via Pose Estimation and Matching

Cited by: 31

Authors:

Zamir, Amir R. ^{[1]}

Wekel, Tilman ^{[1]}

Agrawal, Pulkit ^{[2]}

Wei, Colin ^{[1]}

Malik, Jitendra ^{[2]}

Savarese, Silvio ^{[1]}

Affiliations:

[1] Stanford Univ, Stanford, CA 94305 USA

[2] Univ Calif Berkeley, Berkeley, CA 94720 USA

Source:

COMPUTER VISION - ECCV 2016, PT III | 2016 / Vol. 9907

Keywords:

Generic vision; Representation; Descriptor learning; Pose estimation; Wide-baseline matching; Street view;

DOI:

10.1007/978-3-319-46487-9_33

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Though a large body of computer vision research has investigated developing generic semantic representations, efforts towards developing a similar representation for 3D have been limited. In this paper, we learn a generic 3D representation through solving a set of foundational proxy 3D tasks: object-centric camera pose estimation and wide baseline feature matching. Our method is based upon the premise that by providing supervision over a set of carefully selected foundational tasks, generalization to novel tasks and abstraction capabilities can be achieved. We empirically show that the internal representation of a multi-task ConvNet trained to solve the above core problems generalizes to novel 3D tasks (e.g., scene layout estimation, object pose estimation, surface normal estimation) without the need for fine-tuning and shows traits of abstraction abilities (e.g., cross-modality pose estimation). In the context of the core supervised tasks, we demonstrate that our representation achieves state-of-the-art wide baseline feature matching results without requiring a priori rectification (unlike SIFT and the majority of learnt features). We also show 6DOF camera pose estimation given a pair of local image patches. The accuracy on both supervised tasks is comparable to that of humans. Finally, we contribute a large-scale dataset composed of object-centric street view scenes along with point correspondences and camera pose information, and conclude with a discussion on the learned representation and open research questions.

Pages: 535-553

Page count: 19

50 items in total

[1] Object Pose Estimation via Viewpoint Matching of 3D Models
Lee, Junha
Ji, Sanghoon
You, Sujeong
[J]. 2021 21ST INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2021), 2021, : 1546 - 1548
[2] 3D Human Pose Estimation=2D Pose Estimation plus Matching
Chen, Ching-Hang
Ramanan, Deva
[J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 5759 - 5767
[3] Invariant representation, matching and pose estimation of 3D space curves under similarity transformations
Li, SZ
[J]. PATTERN RECOGNITION, 1997, 30 (03) : 447 - 458
[4] 3D generic object categorization, localization and pose estimation
Savarese, Silvio
Fei-Fei, Li
[J]. 2007 IEEE 11TH INTERNATIONAL CONFERENCE ON COMPUTER VISION, VOLS 1-6, 2007, : 1245 - 1252
[5] Silhouette representation and matching for 3D pose discrimination - A comparative study
Chen, Cheng
Zhuang, Yueting
Xiao, Jun
[J]. IMAGE AND VISION COMPUTING, 2010, 28 (04) : 654 - 667
[6] 3D hand pose and mesh estimation via a generic Topology-aware Transformer model
Yu, Shaoqi
Wang, Yintong
Chen, Lili
Zhang, Xiaolin
Li, Jiamao
[J]. FRONTIERS IN NEUROROBOTICS, 2024, 18
[7] Fast template matching and pose estimation in 3D point clouds
Vock, Richard
Dieckmann, Alexander
Ochmann, Sebastian
Klein, Reinhard
[J]. COMPUTERS & GRAPHICS-UK, 2019, 79 : 36 - 45
[8] Monocular 3D Pose Estimation via Pose Grammar and Data Augmentation
Xu, Yuanlu
Wang, Wenguan
Liu, Tengyu
Liu, Xiaobai
Xie, Jianwen
Zhu, Song-Chun
[J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (10) : 6327 - 6344
[9] A NEW SPARSE REPRESENTATION ALGORITHM FOR 3D HUMAN POSE ESTIMATION
Andalib, Azam
Babamir, Seyed Morteza
Faraji, Alireza
[J]. COMPUTING AND INFORMATICS, 2016, 35 (06) : 1338 - 1355
[10] Part template: 3D representation for multiview human pose estimation
Shen, Jianfeng
Yang, Wenming
Liao, Qingmin
[J]. PATTERN RECOGNITION, 2013, 46 (07) : 1920 - 1932