On Robust Cross-view Consistency in Self-supervised Monocular Depth Estimation

Cited by: 0
Authors
Haimei Zhao
Jing Zhang
Zhuo Chen
Bo Yuan
Dacheng Tao
Affiliations
[1] University of Sydney, School of Computer Science
[2] Tsinghua University, Shenzhen International Graduate School
[3] University of Queensland, School of Information Technology & Electrical Engineering
Keywords
3D vision; depth estimation; cross-view consistency; self-supervised learning; monocular perception
DOI: Not available
Abstract
Remarkable progress has been made in self-supervised monocular depth estimation (SS-MDE) by exploring cross-view consistency, e.g., photometric consistency and 3D point cloud consistency. However, these consistency measures are highly vulnerable to illumination variance, occlusions, texture-less regions, and moving objects, making them insufficiently robust for handling diverse scenes. To address this challenge, we study two kinds of robust cross-view consistency in this paper. First, the spatial offset field between adjacent frames is obtained by reconstructing the reference frame from its neighbors via deformable alignment, and is then used to align the temporal depth features via a depth feature alignment (DFA) loss. Second, the 3D point clouds of each reference frame and its nearby frames are computed and transformed into voxel space, where the point density in each voxel is calculated and aligned via a voxel density alignment (VDA) loss. In this way, we exploit the temporal coherence of both the depth feature space and the 3D voxel space for SS-MDE, shifting the “point-to-point” alignment paradigm to a “region-to-region” one. Compared with the photometric consistency loss and the rigid point cloud alignment loss, the proposed DFA and VDA losses are more robust owing to the strong representation power of deep features and the high tolerance of voxel density to the aforementioned challenges. Experimental results on several outdoor benchmarks show that our method outperforms current state-of-the-art techniques. Extensive ablation studies and analyses validate the effectiveness of the proposed losses, especially in challenging scenes. The code and models are available at https://github.com/sunnyHelen/RCVC-depth.
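To make the voxel density alignment (VDA) idea from the abstract concrete, below is a minimal PyTorch-style sketch, not the authors' implementation: the function names (voxel_density, vda_loss), the grid parameters, and the L1 discrepancy are illustrative assumptions based only on the abstract's description. It counts points per voxel for a reference cloud and a pose-aligned source cloud, normalizes the counts into density distributions, and penalizes their difference; hard counting is non-differentiable, so a trainable version would need a soft (e.g., trilinear) point-to-voxel assignment.

import torch

def voxel_density(points, grid_min, voxel_size, grid_dims):
    """Normalized point-count distribution over a fixed voxel grid.

    points:     (N, 3) tensor of 3D points (e.g., back-projected from depth)
    grid_min:   (3,) tensor, minimum corner of the voxel grid
    voxel_size: edge length of each cubic voxel (scalar)
    grid_dims:  (Dx, Dy, Dz) number of voxels along each axis
    """
    idx = torch.floor((points - grid_min) / voxel_size).long()
    # Discard points that fall outside the grid.
    inside = ((idx >= 0) & (idx < torch.tensor(grid_dims))).all(dim=1)
    idx = idx[inside]
    # Flatten 3D voxel indices to 1D bins and count points per voxel.
    flat = (idx[:, 0] * grid_dims[1] + idx[:, 1]) * grid_dims[2] + idx[:, 2]
    n_bins = grid_dims[0] * grid_dims[1] * grid_dims[2]
    counts = torch.bincount(flat, minlength=n_bins).float()
    return counts / counts.sum().clamp(min=1.0)

def vda_loss(points_ref, points_src_aligned, grid_min, voxel_size, grid_dims):
    """L1 discrepancy between two voxel density distributions.

    points_src_aligned is assumed to be already transformed into the
    reference camera frame with the predicted relative pose, so only the
    residual density mismatch between the two clouds is penalized.
    """
    d_ref = voxel_density(points_ref, grid_min, voxel_size, grid_dims)
    d_src = voxel_density(points_src_aligned, grid_min, voxel_size, grid_dims)
    return torch.sum(torch.abs(d_ref - d_src))

# Example: two clouds in a 10 m cube with 0.5 m voxels (20^3 bins).
pts_a = torch.rand(4096, 3) * 10.0
pts_b = torch.rand(4096, 3) * 10.0
loss = vda_loss(pts_a, pts_b, grid_min=torch.zeros(3),
                voxel_size=0.5, grid_dims=(20, 20, 20))

Because the density compares aggregate occupancy per region rather than matching individual points, a handful of occluded or moving-object points perturbs it far less than a point-to-point distance, which is the intuition behind the “region-to-region” alignment described above.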
Pages: 495-513 (18 pages)
Related Papers (50 in total)
  • [21] Self-Supervised Monocular Depth Estimation With Multiscale Perception
    Zhang, Yourun
    Gong, Maoguo
    Li, Jianzhao
    Zhang, Mingyang
    Jiang, Fenlong
    Zhao, Hongyu
IEEE Transactions on Image Processing, 2022, 31: 3251-3266
  • [23] Self-Supervised Monocular Depth Estimation With Extensive Pretraining
    Choi, Hyukdoo
IEEE Access, 2021, 9: 157236-157246
  • [25] Self-supervised Depth Estimation from Spectral Consistency and Novel View Synthesis
    Lu, Yawen
    Lu, Guoyu
2022 International Joint Conference on Neural Networks (IJCNN), 2022
  • [26] Enhanced blur-robust monocular depth estimation via self-supervised learning
    Sung, Chi-Hun
    Kim, Seong-Yeol
    Shin, Ho-Ju
    Lee, Se-Ho
    Kim, Seung-Wook
    Electronics Letters, 2024, 60 (22)
  • [27] An Efficient Self-Supervised Cross-View Training For Sentence Embedding
    Limkonchotiwat, Peerat
    Ponwitayarat, Wuttikorn
    Lowphansirikul, Lalita
    Udomcharoenchaikit, Can
    Chuangsuwanich, Ekapol
    Nutanong, Sarana
Transactions of the Association for Computational Linguistics, 2023, 11: 1572-1587
  • [28] Learning Where to Learn in Cross-View Self-Supervised Learning
    Huang, Lang
    You, Shan
    Zheng, Mingkai
    Wang, Fei
    Qian, Chen
    Yamasaki, Toshihiko
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 14431-14440
  • [29] Self-supervised Cross-view Representation Reconstruction for Change Captioning
    Tu, Yunbin
    Li, Liang
    Su, Li
    Zha, Zheng-Jun
    Yan, Chenggang
    Huang, Qingming
2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023: 2793-2803
  • [30] Monocular Depth Estimation via Self-Supervised Self-Distillation
    Hu, Haifeng
    Feng, Yuyang
    Li, Dapeng
    Zhang, Suofei
    Zhao, Haitao
Sensors, 2024, 24 (13)