A Transformer-Based Image-Guided Depth-Completion Model with Dual-Attention Fusion Module

Cited by: 0
Authors
Wang, Shuling [1 ]
Jiang, Fengze [1 ]
Gong, Xiaojin [1 ]
Affiliations
[1] College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China
Keywords
Image coding; Image enhancement; RGB color model
DOI
10.3390/s24196270
Abstract
Depth information is crucial for perceiving three-dimensional scenes. However, depth maps captured directly by depth sensors are often incomplete and noisy. The objective of the depth-completion task is therefore to generate dense, accurate depth maps from sparse depth inputs by fusing guidance information from corresponding color images obtained from camera sensors. To address these challenges, we introduce transformer models, which have shown great promise in the field of vision, into the task of image-guided depth completion. By leveraging the self-attention mechanism, we propose a novel network architecture that effectively meets the requirements of high accuracy and high resolution in depth data. Specifically, we design a dual-branch model with a transformer-based encoder that serializes image features into tokens step by step and extracts multi-scale pyramid features suitable for pixel-wise dense prediction tasks. In addition, we incorporate a dual-attention fusion module to enhance the fusion between the two branches. This module combines convolution-based spatial- and channel-attention mechanisms, which are adept at capturing local information, with cross-attention mechanisms that excel at capturing long-distance relationships. Our model achieves state-of-the-art performance on both the NYUv2 and SUN-RGBD depth datasets, and our ablation studies confirm the effectiveness of the designed modules. © 2024 by the authors.
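The abstract describes a fusion block that pairs convolutional channel/spatial attention (local cues) with cross-attention between the RGB and depth branches (long-range context). The following PyTorch sketch illustrates how such a dual-attention fusion module could be wired up; the layer choices, names, and hyperparameters are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class DualAttentionFusion(nn.Module):
    """Illustrative sketch of a dual-attention fusion block.

    Combines squeeze-and-excitation-style channel attention and a
    convolutional spatial gate (local information) with multi-head
    cross-attention where depth tokens query RGB tokens (long-range
    relationships). All design details here are hypothetical.
    """

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Channel attention: global average pool -> bottleneck MLP -> sigmoid gate.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: 7x7 conv over per-pixel mean/max channel statistics.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        # Cross-attention: depth-branch tokens attend to RGB-branch tokens.
        self.cross_attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, depth_feat: torch.Tensor, rgb_feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = depth_feat.shape
        # Local fusion: gate the summed features by channel, then by location.
        x = depth_feat + rgb_feat
        x = x * self.channel_gate(x)
        stats = torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1
        )
        x = x * self.spatial_gate(stats)
        # Global fusion: flatten feature maps to (B, H*W, C) token sequences.
        q = depth_feat.flatten(2).transpose(1, 2)   # depth tokens as queries
        kv = rgb_feat.flatten(2).transpose(1, 2)    # RGB tokens as keys/values
        attn_out, _ = self.cross_attn(self.norm(q), kv, kv)
        # Residual combination of local and long-range fusion results.
        return x + attn_out.transpose(1, 2).reshape(b, c, h, w)
```

A usage example under these assumptions: instantiating the block with 32 channels and feeding two aligned 16×16 feature maps returns a fused map of the same shape, so it can drop into a dual-branch encoder at any pyramid scale.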
Related Papers
30 items in total
  • [2] OMOFuse: An Optimized Dual-Attention Mechanism Model for Infrared and Visible Image Fusion
    Yuan, Jianye
    Li, Song
    MATHEMATICS, 2023, 11 (24)
  • [3] A Multi-Scale Cross-Fusion Medical Image Segmentation Network Based on Dual-Attention Mechanism Transformer
    Cui, Jianguo
    Wang, Liejun
    Jiang, Shaochen
    APPLIED SCIENCES-BASEL, 2023, 13 (19)
  • [4] Fusion of Image-text attention for Transformer-based Multimodal Machine Translation
    Ma, Junteng
    Qin, Shihao
    Su, Lan
    Li, Xia
    Xiao, Lixian
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2019: 199-204
  • [5] Image Segmentation of Retinal Blood Vessels Based on Dual-Attention Multiscale Feature Fusion
    Gao, Jixun
    Huang, Quanzhen
    Gao, Zhendong
    Chen, Suxia
    COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE, 2022, 2022
  • [6] WFormer: A Transformer-Based Soft Fusion Model for Robust Image Watermarking
    Luo T.
    Wu J.
    He Z.
    Xu H.
    Jiang G.
    Chang C.
    IEEE Transactions on Emerging Topics in Computational Intelligence, 2024, 8 (06): 1-18
  • [7] An effective transformer based on dual attention fusion for underwater image enhancement
    Hu X.
    Liu J.
    Li H.
    Liu H.
    Xue X.
    PeerJ Computer Science, 2024, 10
  • [8] Damage identification of frame structure based on CNN model with dual-attention mechanism and improved Inception module
    Liu, Jingliang
    Lu, Yulin
    Zheng, Wenting
    Liao, Feiyu
    Chen, Zongyan
    Zhendong yu Chongji/Journal of Vibration and Shock, 2024, 43 (23): 321-328
  • [9] ATTENTION-GUIDED CONTRASTIVE MASKED IMAGE MODELING FOR TRANSFORMER-BASED SELF-SUPERVISED LEARNING
    Zhan, Yucheng
    Zhao, Yucheng
    Luo, Chong
    Zhang, Yueyi
    Sun, Xiaoyan
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023: 2490-2494
  • [10] A Transformer-based Multi-modal Joint Attention Fusion Model for Molecular Property Prediction
    Wang, Ke
    Zhang, Wei
    Liu, Yong
    Proceedings - 2023 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2023, 2023: 4972-4974