Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume

Authors
Han, Ming [1]
Yin, Hui [1, 3]
Chong, Aixin [2]
Du, Qianqian [1]
Affiliations
[1] Beijing Jiaotong Univ, Beijing Key Lab Traff Data Anal & Min, Beijing 100044, Peoples R China
[2] Beijing Jiaotong Univ, Key Lab Beijing Railway Engn, Beijing 100044, Peoples R China
[3] Beijing Jiaotong Univ, Frontiers Sci Ctr Smart High Speed Railway Syst, Beijing 100044, Peoples R China
Keywords
Multi-view stereo; Cascade network; Feature pyramid; Lightweight and effective
DOI
10.1007/s10489-024-05574-z
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Multi-level features are commonly employed in cascade networks, currently the dominant framework in multi-view stereo (MVS). However, recent popular multi-level feature extractors overlook the significance of fine-grained structure features for coarse depth inference in the MVS task. Discriminative structure features play an important part in matching and help boost the performance of depth inference. In this work, we propose an effective cascade-structured MVS model named FANet, in which an enhanced feature pyramid is built to predict reliable initial depth values. Specifically, features from deep layers are enhanced with the rich spatial structure information of shallow layers through a bottom-up feature enhancement path. For the enhanced topmost features, an attention mechanism is additionally employed to suppress redundant information and select important features for subsequent matching. To keep the entire model lightweight while preserving performance, an efficient module constructs a lightweight and effective cost volume that reliably represents viewpoint correspondence: it uses an average similarity metric to calculate feature correlations between the reference view and the source views, then adaptively aggregates them into a unified correlation cost volume. Extensive quantitative and qualitative comparisons on the DTU and Tanks and Temples benchmarks show that the proposed model achieves better reconstruction quality than state-of-the-art MVS methods.
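The cost-volume step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the adaptive weights are obtained, so the softmax weighting below, the function names `average_similarity` and `adaptive_aggregate`, the `temperature` parameter, and the tensor layouts are all assumptions made for illustration. The source features are assumed to be already warped into the reference view for each depth hypothesis.

```python
import numpy as np

def average_similarity(ref_feat, warped_src_feats):
    """Channel-averaged correlation between reference and warped source features.

    ref_feat:         (C, H, W) reference-view features
    warped_src_feats: (N, C, D, H, W) source features warped to the reference
                      view for D depth hypotheses (assumed layout)
    returns:          (N, D, H, W) one similarity volume per source view
    """
    # Broadcast the reference features over views (N) and depth hypotheses (D).
    ref = ref_feat[None, :, None, :, :]
    # Averaging the element-wise product over channels collapses C to one value.
    return (ref * warped_src_feats).mean(axis=1)

def adaptive_aggregate(similarities, temperature=1.0):
    """Fuse per-view similarity volumes into one correlation cost volume.

    Assumption: each view's per-pixel weight is a softmax, taken over views,
    of that view's best similarity along the depth axis. A learned weighting
    network could replace this heuristic.
    similarities: (N, D, H, W) -> returns (D, H, W)
    """
    score = similarities.max(axis=1, keepdims=True) / temperature  # (N, 1, H, W)
    score = score - score.max(axis=0, keepdims=True)               # numerical stability
    weights = np.exp(score)
    weights = weights / weights.sum(axis=0, keepdims=True)         # sums to 1 over views
    return (weights * similarities).sum(axis=0)

# Toy example: 3 source views, 8 channels, 6 depth hypotheses, 4x5 feature map.
rng = np.random.default_rng(0)
ref = rng.standard_normal((8, 4, 5))
src = rng.standard_normal((3, 8, 6, 4, 5))
sim = average_similarity(ref, src)
cost = adaptive_aggregate(sim)
print(sim.shape, cost.shape)
```

Because the channel dimension is averaged away, the fused volume stores a single value per depth hypothesis and pixel regardless of the number of source views, which is consistent with the lightweight cost volume the abstract describes.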
Pages: 7924-7940
Page count: 17