On Robust Cross-view Consistency in Self-supervised Monocular Depth Estimation

Times Cited: 1
Authors
Zhao, Haimei [1 ]
Zhang, Jing [1 ]
Chen, Zhuo [2 ]
Yuan, Bo [3 ]
Tao, Dacheng [1 ]
Affiliations
[1] Univ Sydney, Sch Comp Sci, Sydney, NSW 2008, Australia
[2] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen 518055, Peoples R China
[3] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane 4072, Australia
Funding
Australian Research Council;
Keywords
3D vision; depth estimation; cross-view consistency; self-supervised learning; monocular perception;
DOI
10.1007/s11633-023-1474-0
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Remarkable progress has been made in self-supervised monocular depth estimation (SS-MDE) by exploring cross-view consistency, e.g., photometric consistency and 3D point cloud consistency. However, these consistency measures are highly vulnerable to illumination variance, occlusions, texture-less regions, and moving objects, making them insufficiently robust for handling diverse scenes. To address this challenge, we study two kinds of robust cross-view consistency in this paper. First, the spatial offset field between adjacent frames is obtained by reconstructing the reference frame from its neighbors via deformable alignment, and is then used to align the temporal depth features via a depth feature alignment (DFA) loss. Second, the 3D point clouds of each reference frame and its nearby frames are computed and transformed into voxel space, where the point density in each voxel is calculated and aligned via a voxel density alignment (VDA) loss. In this way, we exploit the temporal coherence in both the depth feature space and the 3D voxel space for SS-MDE, shifting the "point-to-point" alignment paradigm to a "region-to-region" one. Compared with the photometric consistency loss and the rigid point cloud alignment loss, the proposed DFA and VDA losses are more robust owing to the strong representation power of deep features and the high tolerance of voxel density to the aforementioned challenges. Experimental results on several outdoor benchmarks show that our method outperforms current state-of-the-art techniques. Extensive ablation studies and analyses validate the effectiveness of the proposed losses, especially in challenging scenes. The code and models are available at https://github.com/sunnyHelen/RCVC-depth.
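As an illustration of the voxel density alignment (VDA) idea sketched in the abstract, below is a minimal PyTorch example of how a per-voxel density distribution could be computed and compared between two point clouds. This is a sketch under simplifying assumptions, not the authors' implementation (see the repository linked above): the point clouds are assumed to be (B, N, 3) tensors already expressed in a common coordinate frame, the names voxel_density and vda_loss are hypothetical, and the hard per-voxel count used here is non-differentiable with respect to point coordinates, so a trainable version of the loss would need a soft (e.g., trilinearly splatted) density.

import torch

def voxel_density(points, grid_min, voxel_size, grid_dims):
    # Map each 3D point (B, N, 3) to an integer voxel index on a fixed grid.
    idx = ((points - grid_min) / voxel_size).long().clamp(min=0)
    for d in range(3):  # keep indices inside the grid along each axis
        idx[..., d] = idx[..., d].clamp(max=grid_dims[d] - 1)
    # Flatten the 3D voxel index into a single linear index per point.
    flat = (idx[..., 0] * grid_dims[1] + idx[..., 1]) * grid_dims[2] + idx[..., 2]
    num_voxels = grid_dims[0] * grid_dims[1] * grid_dims[2]
    dens = points.new_zeros(points.shape[0], num_voxels)
    # Count points per voxel (a hard, non-differentiable assignment).
    dens.scatter_add_(1, flat, torch.ones_like(flat, dtype=points.dtype))
    # Normalize counts into a density distribution over the voxel grid.
    return dens / dens.sum(dim=1, keepdim=True).clamp(min=1e-6)

def vda_loss(pc_ref, pc_src, grid_min, voxel_size, grid_dims):
    # L1 distance between the voxel density distributions of two point clouds.
    d_ref = voxel_density(pc_ref, grid_min, voxel_size, grid_dims)
    d_src = voxel_density(pc_src, grid_min, voxel_size, grid_dims)
    return (d_ref - d_src).abs().sum(dim=1).mean()

# Toy usage: a 20 m cube in front of the camera, discretized into 0.5 m voxels.
pc_ref = torch.rand(2, 4096, 3) * 20.0
pc_src = pc_ref + 0.05 * torch.randn_like(pc_ref)  # slightly perturbed cloud
loss = vda_loss(pc_ref, pc_src, torch.zeros(3), 0.5, (40, 40, 40))

Comparing aggregate per-voxel densities rather than matching individual points is what the abstract refers to as "region-to-region" alignment: small point-wise errors caused by occlusions or moving objects shift only a few counts and thus perturb the loss far less than a rigid point-to-point alignment would.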
Pages: 495-513
Number of Pages: 19
Related Papers
50 records in total
  • [21] Self-Supervised Monocular Depth Estimation With Multiscale Perception
    Zhang, Yourun
    Gong, Maoguo
    Li, Jianzhao
    Zhang, Mingyang
    Jiang, Fenlong
    Zhao, Hongyu
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 3251 - 3266
  • [23] Self-Supervised Monocular Depth Estimation With Extensive Pretraining
    Choi, Hyukdoo
    IEEE ACCESS, 2021, 9 : 157236 - 157246
  • [25] Self-supervised Depth Estimation from Spectral Consistency and Novel View Synthesis
    Lu, Yawen
    Lu, Guoyu
2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022
  • [26] Enhanced blur-robust monocular depth estimation via self-supervised learning
    Sung, Chi-Hun
    Kim, Seong-Yeol
    Shin, Ho-Ju
    Lee, Se-Ho
    Kim, Seung-Wook
ELECTRONICS LETTERS, 2024, 60 (22)
  • [27] An Efficient Self-Supervised Cross-View Training For Sentence Embedding
    Limkonchotiwat, Peerat
    Ponwitayarat, Wuttikorn
    Lowphansirikul, Lalita
    Udomcharoenchaikit, Can
    Chuangsuwanich, Ekapol
    Nutanong, Sarana
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2023, 11 : 1572 - 1587
  • [28] Learning Where to Learn in Cross-View Self-Supervised Learning
    Huang, Lang
    You, Shan
    Zheng, Mingkai
    Wang, Fei
    Qian, Chen
    Yamasaki, Toshihiko
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 14431 - 14440
  • [29] Self-supervised Cross-view Representation Reconstruction for Change Captioning
    Tu, Yunbin
    Li, Liang
    Su, Li
    Zha, Zheng-Jun
    Yan, Chenggang
    Huang, Qingming
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 2793 - 2803
  • [30] Monocular Depth Estimation via Self-Supervised Self-Distillation
    Hu, Haifeng
    Feng, Yuyang
    Li, Dapeng
    Zhang, Suofei
    Zhao, Haitao
    SENSORS, 2024, 24 (13)