Edge aware depth inference for large-scale aerial building multi-view stereo

Cited by: 0
Authors
Zhang, Song [1, 2, 3, 4]
Wei, Zhiwei [1, 2]
Xu, Wenjia [5]
Zhang, Lili [1, 2]
Wang, Yang [1, 2]
Zhang, Jinming [1, 2]
Liu, Junyi [1, 2]
Affiliations
[1] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Inst Elect, Key Lab Network Informat Syst Technol NIST, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci, Beijing 100190, Peoples R China
[4] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 100190, Peoples R China
[5] Beijing Univ Posts & Telecommun, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-view stereo; Aerial images; Building depth estimation; Building edge extraction; Building edge incorporation; NETWORK;
DOI
10.1016/j.isprsjprs.2023.11.020
Chinese Library Classification number
P9 [Physical Geography];
Subject classification code
0705; 070501;
Abstract
Aerial building depth estimation is a crucial task in 3D digital urban reconstruction, and learning-based multi-view stereo (MVS) methods have recently shown promising results in this field. However, these methods are mainly developed by adapting the general learning-based MVS framework to aerial depth estimation; they do not consider the intrinsic structures of buildings, which results in insufficient accuracy. Therefore, we propose an end-to-end edge-aware depth inference network for large-scale aerial building multi-view stereo, called EG-MVSNet, which incorporates building edge information and jointly estimates the depth map and the edge map. First, we propose a novel Edge-Sensitive Network based on differentiable Dynamic Sobel Kernels to obtain reliable building edge features while eliminating irrelevant features. We further propose a UNet-like Edge Prediction Branch and a Building Edge-Depth Loss to constrain the model to focus primarily on building edge features. Notably, the pseudo ground truth (GT) edge map for each aerial image is obtained with classical gradient operators, which require no additional annotation. Second, to incorporate the edge features into the depth prediction module, we introduce an Inter-volume Adaptive Fusion Module that adaptively incorporates the edge feature volume into a standard cost volume and guides the regularization of the cost volume. An Edge Depth Refinement Module is further proposed to perform 2D guided refinement and avoid over-smoothed or blurred depth boundaries. Extensive experiments on the WHU dataset and the LuoJia-MVS dataset show that our model significantly outperforms state-of-the-art methods, improving mean absolute error (MAE) by more than 22% compared to RED-Net and by 57% compared to MVSNet. Additionally, to validate our proposed model, we reconstruct a synthetic aerial building benchmark based on the WHU dataset. In a between-method comparison, our results exceed those of other MVS methods in correctness and accuracy by at least 12% in the MAE metric. The dataset and code are available at https://github.com/zs670980918/EG-MVSNet.
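The abstract notes that pseudo GT edge maps come from classical gradient operators and need no manual annotation. As a minimal sketch of that idea, the following assumes a fixed 3x3 Sobel operator and a normalized-magnitude threshold; the helper names (`filter3x3`, `pseudo_gt_edge_map`) and the threshold value are hypothetical and are not taken from the EG-MVSNet code.

```python
import numpy as np

# 3x3 Sobel kernels for horizontal and vertical intensity gradients.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T


def filter3x3(image, kernel):
    """Cross-correlate a 2D image with a 3x3 kernel (edge-replicated padding)."""
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            # Accumulate the shifted image weighted by the kernel entry.
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def pseudo_gt_edge_map(gray, threshold=0.25):
    """Binary edge map from a grayscale image (hypothetical helper).

    The threshold (an assumed value) is applied to the gradient
    magnitude after scaling it to the range [0, 1].
    """
    gray = np.asarray(gray, dtype=np.float64)
    gx = filter3x3(gray, SOBEL_X)
    gy = filter3x3(gray, SOBEL_Y)
    magnitude = np.hypot(gx, gy)
    peak = magnitude.max()
    if peak > 0:
        magnitude = magnitude / peak
    return (magnitude > threshold).astype(np.uint8)


# Toy image: dark left half, bright right half (one vertical edge).
toy = np.zeros((8, 8))
toy[:, 4:] = 1.0
edges = pseudo_gt_edge_map(toy)
```

On the toy image, only the two pixel columns that straddle the intensity step are marked as edges. The Dynamic Sobel Kernels described in the abstract are learned and differentiable, so this fixed-kernel version illustrates only the annotation-free labeling step.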
Pages: 27-42
Page count: 16