DAGNet: Depth-aware Glass-like objects segmentation via cross-modal attention

Cited by: 0
Authors
Wan, Yingcai [1 ]
Zhao, Qiankun [1 ]
Xu, Jiqian [1 ]
Wang, Huaizhen [2 ]
Fang, Lijin [1 ]
Affiliations
[1] Northeastern University, Faculty of Robot Science and Engineering, Shenyang, People's Republic of China
[2] Inspur Group, Institute of Shandong New Generation Information Industry Technology, Jinan, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Semantic segmentation; Transparent; Cross-modal; Self-attention;
DOI
10.1016/j.jvcir.2024.104121
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Transparent or specular objects, such as mirrors, glass windows, and glass walls, have a significant impact on computer vision tasks. Glass-like Objects (GLOs) encompass transparent or specular objects that lack distinctive visual appearances and specific external shapes, which makes GLO segmentation challenging. In this study, we propose a novel bidirectional cross-modal fusion framework with shifted-window cross-attention for GLO segmentation. The framework incorporates a Feature Exchange Module (FEM) and a Shifted-Window Cross-modal Attention Fusion Module (SW-CAFM) in each transformer block stage to calibrate, exchange, and fuse cross-modal features. The FEM employs coordinate and spatial attention mechanisms to filter out noise and recalibrate the features from the two modalities. The SW-CAFM uses cross-attention to fuse RGB and depth features, leveraging the shifted-window self-attention operation to reduce the computational complexity of cross-attention. Experimental results demonstrate the feasibility and high performance of the proposed method, which achieves state-of-the-art results on various glass and mirror benchmarks: mIoU scores of 90.32%, 94.24%, 88.76%, and 87.47% on the GDD, Trans10K, MSD, and RGBD-Mirror datasets, respectively.
Pages: 13