3D Hierarchical Refinement and Augmentation for Unsupervised Learning of Depth and Pose From Monocular Video

Times cited: 10
Authors
Wang, Guangming [1]
Zhong, Jiquan [1]
Zhao, Shijie [2]
Wu, Wenhua [1]
Liu, Zhe [3]
Wang, Hesheng [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai Engn Res Ctr Intelligent Control & Manage, Key Lab Syst Control & Informat Proc, Key Lab Marine Intelligent Equipment, Dept Automat, Shanghai 200240, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Engn Mech, Shanghai 200240, Peoples R China
[3] Shanghai Jiao Tong Univ, AI Inst, MOE Key Lab Artificial Intelligence, Shanghai 200240, Peoples R China
Keywords
Monocular depth estimation; visual odometry; unsupervised learning; pose refinement; 3D augmentation; VIEW SYNTHESIS; REMOVAL
DOI
10.1109/TCSVT.2022.3215587
CLC number
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline code
0808; 0809
Abstract
Depth and ego-motion estimation are essential for the localization and navigation of autonomous robots and for autonomous driving. Recent studies have made it possible to learn per-pixel depth and ego-motion from unlabeled monocular video. In this paper, a novel unsupervised training framework is proposed with 3D hierarchical refinement and augmentation using explicit 3D geometry. In this framework, depth and pose estimation are hierarchically and mutually coupled to refine the estimated pose layer by layer. An intermediate view image is proposed, synthesized by warping the pixels of an image with the estimated depth and the coarse pose. The residual pose transformation is then estimated from this new view image and the image of the adjacent frame to refine the coarse pose. The iterative refinement is implemented in a differentiable manner, so the whole framework can be optimized end to end. Meanwhile, a new image augmentation method is proposed for pose estimation: it augments the pose in 3D space and synthesizes the corresponding new 2D view image. Experiments on KITTI demonstrate that our depth estimation achieves state-of-the-art performance and even surpasses recent approaches that use other auxiliary tasks. Our visual odometry outperforms all recent unsupervised monocular learning-based methods and achieves performance competitive with the geometry-based method ORB-SLAM2 with back-end optimization. The source code will be released soon at: https://github.com/IRMVLab/HRANet.
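The view-synthesis step the abstract relies on can be sketched as follows. This minimal NumPy sketch back-projects each target pixel with its depth, moves it by a relative pose, re-projects it with pinhole intrinsics, and bilinearly samples the other frame. The conventions are assumptions, not taken from the paper: `K` is a 3x3 intrinsic matrix, poses are 4x4 matrices mapping target-camera points into source-camera coordinates, images are single-channel, and off-image samples are filled with 0. All function names are hypothetical. `refine_pose` only illustrates one plausible way to compose a coarse and a residual pose under a stated convention. It is not the HRANet implementation.

```python
import numpy as np

def make_pose(R, t):
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 rigid transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def bilinear_sample(img, x, y):
    """Sample a grayscale image at float pixel coordinates; off-image samples become 0."""
    h, w = img.shape
    eps = 1e-6  # tolerance so round-off at the borders is not treated as off-image
    valid = (x >= -eps) & (x <= w - 1 + eps) & (y >= -eps) & (y <= h - 1 + eps)
    x = np.clip(x, 0, w - 1)
    y = np.clip(y, 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0  # fractional offsets used as interpolation weights
    out = (img[y0, x0] * (1 - wx) * (1 - wy) + img[y0, x1] * wx * (1 - wy)
           + img[y1, x0] * (1 - wx) * wy + img[y1, x1] * wx * wy)
    return np.where(valid, out, 0.0)

def inverse_warp(src_img, tgt_depth, K, T_tgt_to_src):
    """Synthesize the target view from src_img using the target depth and a relative pose."""
    h, w = tgt_depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(float)
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])   # homogeneous pixels, shape (3, N)
    pts = (np.linalg.inv(K) @ pix) * tgt_depth.ravel()        # back-project: d * K^-1 [u, v, 1]^T
    pts = T_tgt_to_src[:3, :3] @ pts + T_tgt_to_src[:3, 3:4]  # rigid motion into the source frame
    proj = K @ pts                                             # pinhole projection
    in_front = proj[2] > 1e-6
    z = np.where(in_front, proj[2], 1.0)
    # Points behind the camera are sent to (-1, -1) so they sample as 0.
    us = np.where(in_front, proj[0] / z, -1.0)
    vs = np.where(in_front, proj[1] / z, -1.0)
    return bilinear_sample(src_img, us, vs).reshape(h, w)

def refine_pose(T_coarse, T_residual):
    """Hypothetical composition. Assumes the intermediate view acts as a virtual camera M
    with X_src = T_coarse @ X_M, and the residual maps target points into M
    (X_M = T_residual @ X_tgt), so that X_src = T_coarse @ T_residual @ X_tgt."""
    return T_coarse @ T_residual
```

In the coarse-to-fine loop the abstract describes, one would call `inverse_warp` with the coarse pose to obtain the intermediate view, pass that view and the target frame to a pose network (not shown) to obtain a residual pose, and combine the two with `refine_pose`. Bilinear sampling is used because, inside an autodiff framework, it is differentiable with respect to the sampling coordinates almost everywhere. That property is what allows such a warp to be trained end to end.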
Pages: 1776-1786
Number of pages: 11