Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth Estimation in Dynamic Scenes

Cited by: 5
Authors
Li, Rui [1 ]
Gong, Dong [2 ]
Yin, Wei [3 ]
Chen, Hao [4 ]
Zhu, Yu [1 ]
Wang, Kaixuan [3 ]
Chen, Xiaozhi [3 ]
Sun, Jinqiu [1 ]
Zhang, Yanning [1 ]
Affiliations
[1] Northwestern Polytech Univ, Fremont, CA 94539 USA
[2] Univ New South Wales, Sydney, NSW, Australia
[3] DJI, Shenzhen, Peoples R China
[4] Zhejiang Univ, Hangzhou, Peoples R China
Funding
Australian Research Council
DOI
10.1109/CVPR52729.2023.02063
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Multi-frame depth estimation generally achieves high accuracy by relying on multi-view geometric consistency. When applied in dynamic scenes, e.g., autonomous driving, this consistency is usually violated in the dynamic areas, leading to corrupted estimates. Many multi-frame methods handle dynamic areas by identifying them with explicit masks and compensating for the multi-view cues with monocular cues represented as local monocular depth or features. The improvements are limited by the uncontrolled quality of the masks and by underuse of the benefits of fusing the two types of cues. In this paper, we propose a novel method that learns to fuse the multi-view and monocular cues, encoded as volumes, without needing heuristically crafted masks. As our analyses reveal, the multi-view cues capture more accurate geometric information in static areas, and the monocular cues capture more useful context in dynamic areas. To let the geometric perception learned from multi-view cues in static areas propagate to the monocular representation in dynamic areas, and to let monocular cues enhance the representation of the multi-view cost volume, we propose a cross-cue fusion (CCF) module. It includes cross-cue attention (CCA), which encodes the spatially non-local relative intra-relations from each source to enhance the representation of the other. Experiments on real-world datasets demonstrate the effectiveness and generalization ability of the proposed method.
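The cross-cue attention idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no architectural details, so the function names (`intra_relation`, `cross_cue_attention`), the scaled dot-product form of the attention, the residual addition, and the toy feature shapes are all assumptions made for illustration. The sketch only shows the general pattern of computing non-local spatial relations within one cue and using them to re-weight the features of the other.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def intra_relation(feat):
    # feat: (N, C) features at N flattened spatial positions.
    # Returns an (N, N) row-normalized matrix of pairwise (non-local)
    # relations computed within this single cue. The scaled dot-product
    # form is an assumption, not taken from the paper.
    scale = np.sqrt(feat.shape[1])
    return softmax(feat @ feat.T / scale, axis=-1)

def cross_cue_attention(mono, multi):
    # Hypothetical sketch: the relations computed inside one cue are used
    # to aggregate the features of the other cue, with a residual add.
    rel_mono = intra_relation(mono)
    rel_multi = intra_relation(multi)
    mono_enh = mono + rel_multi @ mono    # monocular cue guided by multi-view relations
    multi_enh = multi + rel_mono @ multi  # multi-view cue guided by monocular relations
    return mono_enh, multi_enh

# Toy inputs: 6 spatial positions, 4 channels per cue (arbitrary sizes).
rng = np.random.default_rng(0)
mono = rng.standard_normal((6, 4))
multi = rng.standard_normal((6, 4))
mono_enh, multi_enh = cross_cue_attention(mono, multi)
print(mono_enh.shape, multi_enh.shape)
```

In a real network the two inputs would be learned feature volumes and the attention would typically use learned projections; the sketch omits both to keep the cross-wiring of the two cues visible.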
Pages: 21539-21548
Page count: 10