A Transformer-Based Image-Guided Depth-Completion Model with Dual-Attention Fusion Module

Cited by: 0
Authors
Wang, Shuling [1 ]
Jiang, Fengze [1 ]
Gong, Xiaojin [1 ]
Affiliations
[1] College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China
Keywords
Image coding; Image enhancement; RGB color model
DOI
10.3390/s24196270
Abstract
Depth information is crucial for perceiving three-dimensional scenes. However, depth maps captured directly by depth sensors are often incomplete and noisy. The objective of the depth-completion task is therefore to generate dense, accurate depth maps from sparse depth inputs by fusing guidance information from corresponding color images obtained from camera sensors. To address these challenges, we introduce transformer models, which have shown great promise in the field of vision, into the task of image-guided depth completion. By leveraging the self-attention mechanism, we propose a novel network architecture that effectively meets the requirements of high accuracy and high resolution in depth data. Specifically, we design a dual-branch model with a transformer-based encoder that serializes image features into tokens step by step and extracts multi-scale pyramid features suitable for pixel-wise dense prediction tasks. In addition, we incorporate a dual-attention fusion module to enhance the fusion between the two branches. This module combines convolution-based spatial- and channel-attention mechanisms, which are adept at capturing local information, with cross-attention mechanisms that excel at capturing long-distance relationships. Our model achieves state-of-the-art performance on both the NYUv2 and SUN-RGBD depth datasets, and our ablation studies confirm the effectiveness of the designed modules. © 2024 by the authors.
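The abstract describes a fusion block that pairs convolutional channel/spatial attention (local cues) with cross-attention between the RGB and depth branches (long-range context). The following PyTorch sketch illustrates how such a dual-attention fusion module could be wired up; the layer choices, names, and hyperparameters are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class DualAttentionFusion(nn.Module):
    """Illustrative sketch of a dual-attention fusion block.

    Combines squeeze-and-excitation-style channel attention and a
    convolutional spatial gate (local information) with multi-head
    cross-attention where depth tokens query RGB tokens (long-range
    relationships). All design details here are hypothetical.
    """

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Channel attention: global average pool -> bottleneck MLP -> sigmoid gate.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: 7x7 conv over per-pixel mean/max channel statistics.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        # Cross-attention: depth-branch tokens attend to RGB-branch tokens.
        self.cross_attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, depth_feat: torch.Tensor, rgb_feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = depth_feat.shape
        # Local fusion: gate the summed features by channel, then by location.
        x = depth_feat + rgb_feat
        x = x * self.channel_gate(x)
        stats = torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1
        )
        x = x * self.spatial_gate(stats)
        # Global fusion: flatten feature maps to (B, H*W, C) token sequences.
        q = depth_feat.flatten(2).transpose(1, 2)   # depth tokens as queries
        kv = rgb_feat.flatten(2).transpose(1, 2)    # RGB tokens as keys/values
        attn_out, _ = self.cross_attn(self.norm(q), kv, kv)
        # Residual combination of local and long-range fusion results.
        return x + attn_out.transpose(1, 2).reshape(b, c, h, w)
```

A usage example under these assumptions: instantiating the block with 32 channels and feeding two aligned 16×16 feature maps returns a fused map of the same shape, so it can drop into a dual-branch encoder at any pyramid scale.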
Related Papers
30 items in total
  • [2] OMOFuse: An Optimized Dual-Attention Mechanism Model for Infrared and Visible Image Fusion
    Yuan, Jianye
    Li, Song
    MATHEMATICS, 2023, 11 (24)
  • [3] A Multi-Scale Cross-Fusion Medical Image Segmentation Network Based on Dual-Attention Mechanism Transformer
    Cui, Jianguo
    Wang, Liejun
    Jiang, Shaochen
    APPLIED SCIENCES-BASEL, 2023, 13 (19)
  • [4] Fusion of Image-text attention for Transformer-based Multimodal Machine Translation
    Ma, Junteng
    Qin, Shihao
    Su, Lan
    Li, Xia
    Xiao, Lixian
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2019: 199-204
  • [5] Image Segmentation of Retinal Blood Vessels Based on Dual-Attention Multiscale Feature Fusion
    Gao, Jixun
    Huang, Quanzhen
    Gao, Zhendong
    Chen, Suxia
    COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE, 2022, 2022
  • [6] WFormer: A Transformer-Based Soft Fusion Model for Robust Image Watermarking
    Luo T.
    Wu J.
    He Z.
    Xu H.
    Jiang G.
    Chang C.
    IEEE Transactions on Emerging Topics in Computational Intelligence, 2024, 8 (06): 1-18
  • [7] An effective transformer based on dual attention fusion for underwater image enhancement
    Hu X.
    Liu J.
    Li H.
    Liu H.
    Xue X.
    PeerJ Computer Science, 2024, 10
  • [8] Damage identification of frame structure based on CNN model with dual-attention mechanism and improved Inception module
    Liu, Jingliang
    Lu, Yulin
    Zheng, Wenting
    Liao, Feiyu
    Chen, Zongyan
    Zhendong yu Chongji/Journal of Vibration and Shock, 2024, 43 (23): 321-328
  • [9] ATTENTION-GUIDED CONTRASTIVE MASKED IMAGE MODELING FOR TRANSFORMER-BASED SELF-SUPERVISED LEARNING
    Zhan, Yucheng
    Zhao, Yucheng
    Luo, Chong
    Zhang, Yueyi
    Sun, Xiaoyan
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023: 2490-2494
  • [10] A Transformer-based Multi-modal Joint Attention Fusion Model for Molecular Property Prediction
    Wang, Ke
    Zhang, Wei
    Liu, Yong
    Proceedings - 2023 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2023, 2023: 4972-4974