OnionNet: Single-View Depth Prediction and Camera Pose Estimation for Unlabeled Video

Cited by: 4
|
Authors
Gu, Tianhao [1 ,2 ]
Wang, Zhe [1 ,2 ]
Li, Dongdong [2 ]
Yang, Hai [2 ]
Du, Wenli [1 ]
Zhou, Yangming [2 ]
Affiliations
[1] East China Univ Sci & Technol, Key Lab Adv Control & Optimizat Chem Proc, Minist Educ, Shanghai 200237, Peoples R China
[2] East China Univ Sci & Technol, Dept Comp Sci & Engn, Shanghai 200237, Peoples R China
Funding
National Science Foundation (US);
Keywords
Cameras; Training; Pose estimation; Geometry; Robustness; Task analysis; Decoding; Camera pose estimation; multitask learning; single-view depth prediction; unsupervised learning; LOCALIZATION; SLAM;
DOI
10.1109/TCDS.2020.3042521
CLC number
TP18 [Artificial Intelligence Theory];
Discipline code
081104; 0812; 0835; 1405;
Abstract
In real scenes, humans can easily infer their own position and their distance from other objects using only their eyes. To give robots the same visual ability, this article presents OnionNet, an unsupervised framework comprising LeafNet and ParachuteNet, for single-view depth prediction and camera pose estimation. In OnionNet, to speed up convergence and to concretize objects despite gradient locality and moving objects in videos, LeafNet adopts two decoders and enhanced upconvolution modules. Meanwhile, to improve robustness to fast camera movement and rotation, ParachuteNet integrates three pose networks that estimate multiview camera pose parameters, combined with a modified image-preprocessing step. Unlike existing methods, single-view depth prediction and camera pose estimation are trained view by view: the view range shrinks gradually from one view to the next, with the outer pixels disappearing, similar to peeling an onion. Moreover, LeafNet is optimized with the pose parameters from each pose network in turn. Experimental results on the KITTI data set show the effectiveness of our method: single-view depth prediction outperforms most supervised and unsupervised methods that address the same two subtasks, and pose estimation achieves state-of-the-art performance compared with existing methods under comparable input settings.
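The view-by-view training described in the abstract can be pictured as a sequence of progressively smaller center crops, each dropping the outer pixels of the previous view. The sketch below is illustrative only: the function name, the fixed per-view peel width, and the use of plain center cropping are assumptions for exposition, not details taken from the paper.

```python
import numpy as np

def peel_views(image, num_views, peel_px):
    """Generate progressively smaller center crops of `image`,
    mimicking the onion-peeling view reduction: each successive
    view drops `peel_px` pixels from every border of the previous
    one. (Hypothetical sketch; not the paper's implementation.)"""
    views = []
    h, w = image.shape[:2]
    for k in range(num_views):
        m = k * peel_px  # total margin peeled away so far
        if 2 * m >= min(h, w):
            break  # nothing left to peel
        views.append(image[m:h - m, m:w - m])
    return views

# Example: three views of a 128x128 frame, peeling 8 px per view.
frame = np.zeros((128, 128, 3))
views = peel_views(frame, num_views=3, peel_px=8)
print([v.shape[:2] for v in views])  # → [(128, 128), (112, 112), (96, 96)]
```

Each crop shares the same optical center as the original frame, so the successive views remain geometrically comparable while the border pixels vanish, which is the intuition behind the onion analogy.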
Pages: 995-1009
Page count: 15
Related papers
50 records in total
  • [1] Human Body Pose Recognition from a Single-View Depth Camera
    Huang, Po-Chi
    Jeng, Shyh-Kang
    PROCEEDINGS 2012 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2012, : 2144 - 2149
  • [2] Unsupervised cycle optimization learning for single-view depth and camera pose with Kalman filter
    Gu, Tianhao
    Wang, Zhe
    Chi, Ziqiu
    Zhu, Yiwen
    Du, Wenli
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2021, 106
  • [3] Robust 3D Hand Pose Estimation in Single Depth Images: from Single-View CNN to Multi-View CNNs
    Ge, Liuhao
    Liang, Hui
    Yuan, Junsong
    Thalmann, Daniel
    2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 3593 - 3601
  • [4] Motion capture and human pose reconstruction from a single-view video sequence
    Gudukbay, Ugur
    Demir, Ibrahim
    Dedeoglu, Yigithan
    DIGITAL SIGNAL PROCESSING, 2013, 23 (05) : 1441 - 1450
  • [5] Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry
    Bae, Gwangbin
    Budvytis, Ignas
    Cipolla, Roberto
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 2832 - 2841
  • [6] PoET: Pose Estimation Transformer for Single-View, Multi-Object 6D Pose Estimation
    Jantos, Thomas
    Hamdad, Mohamed Amin
    Granig, Wolfgang
    Weiss, Stephan
    Steinbrener, Jan
    CONFERENCE ON ROBOT LEARNING, VOL 205, 2022, 205 : 1060 - 1070
  • [7] Single-view robot pose and joint angle estimation via render & compare
    Labbe, Yann
    Carpentier, Justin
    Aubry, Mathieu
    Sivic, Josef
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 1654 - 1663
  • [8] MegaDepth: Learning Single-View Depth Prediction from Internet Photos
    Li, Zhengqi
    Snavely, Noah
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 2041 - 2050
  • [9] 3D mouse pose from single-view video and a new dataset
    Hu, Bo
    Seybold, Bryan
    Yang, Shan
    Sud, Avneesh
    Liu, Yi
    Barron, Karla
    Cha, Paulyn
    Cosino, Marcelo
    Karlsson, Ellie
    Kite, Janessa
    Kolumam, Ganesh
    Preciado, Joseph
    Zavala-Solorio, Jose
    Zhang, Chunlian
    Zhang, Xiaomeng
    Voorbach, Martin
    Tovcimak, Ann E.
    Ruby, J. Graham
    Ross, David A.
    SCIENTIFIC REPORTS, 2023, 13 (01)