OnionNet: Single-View Depth Prediction and Camera Pose Estimation for Unlabeled Video

Cited by: 4
|
Authors
Gu, Tianhao [1 ,2 ]
Wang, Zhe [1 ,2 ]
Li, Dongdong [2 ]
Yang, Hai [2 ]
Du, Wenli [1 ]
Zhou, Yangming [2 ]
Affiliations
[1] East China Univ Sci & Technol, Key Lab Adv Control & Optimizat Chem Proc, Minist Educ, Shanghai 200237, Peoples R China
[2] East China Univ Sci & Technol, Dept Comp Sci & Engn, Shanghai 200237, Peoples R China
Funding
National Science Foundation (US);
Keywords
Cameras; Training; Pose estimation; Geometry; Robustness; Task analysis; Decoding; Camera pose estimation; multitask learning; single-view depth prediction; unsupervised learning; LOCALIZATION; SLAM;
DOI
10.1109/TCDS.2020.3042521
CLC number
TP18 [Artificial Intelligence Theory];
Discipline code
081104; 0812; 0835; 1405;
Abstract
In real scenes, humans can easily infer their own position and their distance from other objects using only their eyes. To give robots the same visual ability, this article presents OnionNet, an unsupervised framework comprising LeafNet and ParachuteNet, for single-view depth prediction and camera pose estimation. In OnionNet, to speed up convergence and to concretize objects despite gradient locality and moving objects in videos, LeafNet adopts two decoders and enhanced upconvolution modules. Meanwhile, to improve robustness to fast camera movement and rotation, ParachuteNet integrates three pose networks that estimate multiview camera pose parameters, combined with a modified image-preprocessing step. Unlike existing methods, single-view depth prediction and camera pose estimation are trained view by view: the view range shrinks gradually from one view to the next, with the outer pixels disappearing, similar to peeling an onion. Moreover, LeafNet is optimized with the pose parameters from each pose network in turn. Experimental results on the KITTI data set show the effectiveness of our method: single-view depth prediction outperforms most supervised and unsupervised methods that address the same two subtasks, and pose estimation achieves state-of-the-art performance compared with existing methods under comparable input settings.
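The view-by-view training described in the abstract can be pictured as a sequence of progressively smaller center crops, each dropping the outer pixels of the previous view. The sketch below is illustrative only: the function name, the fixed per-view peel width, and the use of plain center cropping are assumptions for exposition, not details taken from the paper.

```python
import numpy as np

def peel_views(image, num_views, peel_px):
    """Generate progressively smaller center crops of `image`,
    mimicking the onion-peeling view reduction: each successive
    view drops `peel_px` pixels from every border of the previous
    one. (Hypothetical sketch; not the paper's implementation.)"""
    views = []
    h, w = image.shape[:2]
    for k in range(num_views):
        m = k * peel_px  # total margin peeled away so far
        if 2 * m >= min(h, w):
            break  # nothing left to peel
        views.append(image[m:h - m, m:w - m])
    return views

# Example: three views of a 128x128 frame, peeling 8 px per view.
frame = np.zeros((128, 128, 3))
views = peel_views(frame, num_views=3, peel_px=8)
print([v.shape[:2] for v in views])  # → [(128, 128), (112, 112), (96, 96)]
```

Each crop shares the same optical center as the original frame, so the successive views remain geometrically comparable while the border pixels vanish, which is the intuition behind the onion analogy.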
Pages: 995-1009
Page count: 15
Related papers
50 records in total
  • [1] Human Body Pose Recognition from a Single-View Depth Camera
    Huang, Po-Chi
    Jeng, Shyh-Kang
    PROCEEDINGS 2012 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2012, : 2144 - 2149
  • [2] Unsupervised cycle optimization learning for single-view depth and camera pose with Kalman filter
    Gu, Tianhao
    Wang, Zhe
    Chi, Ziqiu
    Zhu, Yiwen
    Du, Wenli
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2021, 106
  • [3] Robust 3D Hand Pose Estimation in Single Depth Images: from Single-View CNN to Multi-View CNNs
    Ge, Liuhao
    Liang, Hui
    Yuan, Junsong
    Thalmann, Daniel
    2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 3593 - 3601
  • [4] Motion capture and human pose reconstruction from a single-view video sequence
    Gudukbay, Ugur
    Demir, Ibrahim
    Dedeoglu, Yigithan
    DIGITAL SIGNAL PROCESSING, 2013, 23 (05) : 1441 - 1450
  • [5] Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry
    Bae, Gwangbin
    Budvytis, Ignas
    Cipolla, Roberto
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 2832 - 2841
  • [6] PoET: Pose Estimation Transformer for Single-View, Multi-Object 6D Pose Estimation
    Jantos, Thomas
    Hamdad, Mohamed Amin
    Granig, Wolfgang
    Weiss, Stephan
    Steinbrener, Jan
    CONFERENCE ON ROBOT LEARNING, VOL 205, 2022, 205 : 1060 - 1070
  • [7] Single-view robot pose and joint angle estimation via render & compare
    Labbe, Yann
    Carpentier, Justin
    Aubry, Mathieu
    Sivic, Josef
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 1654 - 1663
  • [8] MegaDepth: Learning Single-View Depth Prediction from Internet Photos
    Li, Zhengqi
    Snavely, Noah
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 2041 - 2050
  • [9] 3D mouse pose from single-view video and a new dataset
    Hu, Bo
    Seybold, Bryan
    Yang, Shan
    Sud, Avneesh
    Liu, Yi
    Barron, Karla
    Cha, Paulyn
    Cosino, Marcelo
    Karlsson, Ellie
    Kite, Janessa
    Kolumam, Ganesh
    Preciado, Joseph
    Zavala-Solorio, Jose
    Zhang, Chunlian
    Zhang, Xiaomeng
    Voorbach, Martin
    Tovcimak, Ann E.
    Ruby, J. Graham
    Ross, David A.
    SCIENTIFIC REPORTS, 2023, 13 (01)