Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth Estimation in Dynamic Scenes

Cited by: 5
Authors
Li, Rui [1 ]
Gong, Dong [2 ]
Yin, Wei [3 ]
Chen, Hao [4 ]
Zhu, Yu [1 ]
Wang, Kaixuan [3 ]
Chen, Xiaozhi [3 ]
Sun, Jinqiu [1 ]
Zhang, Yanning [1 ]
Affiliations
[1] Northwestern Polytech Univ, Xian, Peoples R China
[2] Univ New South Wales, Sydney, NSW, Australia
[3] DJI, Shenzhen, Peoples R China
[4] Zhejiang Univ, Hangzhou, Peoples R China
Funding
Australian Research Council
DOI
10.1109/CVPR52729.2023.02063
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Multi-frame depth estimation generally achieves high accuracy by relying on multi-view geometric consistency. When applied in dynamic scenes, e.g., autonomous driving, this consistency is usually violated in dynamic areas, leading to corrupted estimates. Many multi-frame methods handle dynamic areas by identifying them with explicit masks and compensating for the multi-view cues with monocular cues represented as local monocular depth or features. The improvements are limited by the uncontrolled quality of the masks and by the underused benefits of fusing the two types of cues. In this paper, we propose a novel method that learns to fuse the multi-view and monocular cues, encoded as volumes, without heuristically crafted masks. As our analyses reveal, the multi-view cues capture more accurate geometric information in static areas, and the monocular cues capture more useful context in dynamic areas. To let the geometric perception learned from multi-view cues in static areas propagate to the monocular representation in dynamic areas, and to let monocular cues enhance the representation of the multi-view cost volume, we propose a cross-cue fusion (CCF) module, which includes cross-cue attention (CCA) to encode the spatially non-local relative intra-relations from each source to enhance the representation of the other. Experiments on real-world datasets demonstrate the effectiveness and generalization ability of the proposed method.
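The cross-cue attention idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each cue has been flattened to a list of per-position feature vectors, uses plain dot-product similarity with no learned projections, and adds the aggregated features back as a residual. The function names `intra_relations`, `apply_relations`, and `cross_cue_attention` are hypothetical. It only shows the core pattern of computing non-local relations inside one cue and using them to re-weight the other.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of similarity scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def intra_relations(feats):
    # Non-local relations within ONE cue: similarity of every position
    # to every other position, normalised per row (assumed dot-product form).
    scale = math.sqrt(len(feats[0]))
    return [
        softmax([sum(a * b for a, b in zip(fi, fj)) / scale for fj in feats])
        for fi in feats
    ]

def apply_relations(attn, feats):
    # Aggregate `feats` over all positions using relations taken from the
    # OTHER cue, then add a residual connection (an assumption of this sketch).
    dim = len(feats[0])
    out = []
    for i, weights in enumerate(attn):
        agg = [sum(w * f[c] for w, f in zip(weights, feats)) for c in range(dim)]
        out.append([x + a for x, a in zip(feats[i], agg)])
    return out

def cross_cue_attention(mono, multi):
    # Each cue's intra-relations enhance the representation of the other cue.
    attn_mono = intra_relations(mono)
    attn_multi = intra_relations(multi)
    multi_enh = apply_relations(attn_mono, multi)
    mono_enh = apply_relations(attn_multi, mono)
    return mono_enh, multi_enh

# Three spatial positions, two feature channels per cue (made-up values).
mono = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
multi = [[0.5, 0.2], [0.1, 0.9], [0.4, 0.4]]
mono_enh, multi_enh = cross_cue_attention(mono, multi)
```

In this sketch the attention weights never mix features across cues directly: the monocular relations decide how multi-view features are pooled, and the reverse. A real model would operate on learned feature volumes and learned query, key, and value projections.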
Pages: 21539-21548
Number of pages: 10