MBIAN: Multi-level bilateral interactive attention network for multi-modal image processing

Cited: 1
Authors
Sun, Kai [1 ]
Zhang, Jiangshe [1 ]
Wang, Jialin [2 ]
Xu, Shuang [3 ,4 ]
Zhang, Chunxia [1 ]
Hu, Junying [5 ]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Math & Stat, Xian 710049, Shaanxi, Peoples R China
[2] Xi An Jiao Tong Univ, Sch Energy & Power Engn, Xian 710049, Shaanxi, Peoples R China
[3] Northwestern Polytech Univ Shenzhen, Res & Dev Inst, Shenzhen 518063, Guangdong, Peoples R China
[4] Northwestern Polytech Univ, Sch Math & Stat, Xian 710072, Shaanxi, Peoples R China
[5] Northwest Univ, Sch Math, Xian 710127, Shaanxi, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Multi-modal image processing; Multi-level bilateral interactive attention network; Bilateral interactive attention layer; Long and short local shortcuts; IMAGE FUSION; PHOTOGRAPHY; FLASH;
DOI
10.1016/j.eswa.2023.120733
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Convolutional neural networks (CNNs) have achieved impressive success in the multi-modal image processing (MIP) area. However, many existing CNN approaches fuse the features of the target and guidance images only once, which may cause a loss of information. To alleviate this problem, we present a multi-level bilateral interactive attention network (MBIAN) to fuse the features of the target and guidance images by their progressive interaction at different levels. Concretely, for each level, a bilateral interactive attention block (BIAB) is proposed to fuse the information of target and guidance images and refine their features. As the core component of our BIAB, a novel bilateral interactive attention layer (BIAL) is designed, where target and guidance images can mutually determine the attention weights. In addition, in each BIAB, long and short local shortcuts are employed to further facilitate the flow of information. Numerical experiments are conducted for three different problems, including panchromatic guided multi-spectral image super-resolution, near-infrared guided RGB image denoising, and flash-guided no-flash image denoising. The results demonstrate the versatility and superiority of MBIAN in terms of quantitative metrics and visual inspection, against 14 popular and state-of-the-art methods.
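The core idea stated in the abstract is that the target and guidance modalities mutually determine each other's attention weights. A minimal toy sketch of that mutual reweighting on 1-D feature vectors is given below; it is not the paper's implementation (the actual BIAL operates on multi-channel CNN feature maps inside each BIAB), and the function names are illustrative only:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def bilateral_attention(target, guidance):
    # Each branch's attention weights come from the OTHER branch's
    # features, so the two modalities reweight each other (a toy
    # 1-D stand-in for the bilateral interactive attention layer).
    w_t = softmax(guidance)                        # guidance -> weights for target
    w_g = softmax(target)                          # target   -> weights for guidance
    t_out = [t * w for t, w in zip(target, w_t)]   # reweighted target features
    g_out = [g * w for g, w in zip(guidance, w_g)] # reweighted guidance features
    return t_out, g_out
```

Stacking several such blocks, each refining both feature streams, gives the "progressive interaction at different levels" the abstract describes, as opposed to fusing the two modalities only once.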
Pages: 11
Related Papers
50 records total
  • [1] Multi-Modal fusion with multi-level attention for Visual Dialog
    Zhang, Jingping
    Wang, Qiang
    Han, Yahong
    INFORMATION PROCESSING & MANAGEMENT, 2020, 57 (04)
  • [2] Multi-Level Multi-Modal Cross-Attention Network for Fake News Detection
    Ying, Long
    Yu, Hui
    Wang, Jinguang
    Ji, Yongze
    Qian, Shengsheng
    IEEE ACCESS, 2021, 9 : 132363 - 132373
  • [3] Multi-level Interaction Network for Multi-Modal Rumor Detection
    Zou, Ting
    Qian, Zhong
    Li, Peifeng
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [4] MIA-Net: Multi-Modal Interactive Attention Network for Multi-Modal Affective Analysis
    Li, Shuzhen
    Zhang, Tong
    Chen, Bianna
    Chen, C. L. Philip
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14 (04) : 2796 - 2809
  • [5] Multi-Level Cross-Modal Interactive-Network-Based Semi-Supervised Multi-Modal Ship Classification
    Song, Xin
    Chen, Zhikui
    Zhong, Fangming
    Gao, Jing
    Zhang, Jianning
    Li, Peng
    SENSORS, 2024, 24 (22)
  • [6] MLMFNet: A multi-level modality fusion network for multi-modal accelerated MRI reconstruction
    Zhou, Xiuyun
    Zhang, Zhenxi
    Du, Hongwei
    Qiu, Bensheng
    MAGNETIC RESONANCE IMAGING, 2024, 111 : 246 - 255
  • [7] Multi-level fusion network for mild cognitive impairment identification using multi-modal neuroimages
    Xu, Haozhe
    Zhong, Shengzhou
    Zhang, Yu
    PHYSICS IN MEDICINE AND BIOLOGY, 2023, 68 (09)
  • [8] Multi-Level Attention Interactive Network for Cloud and Snow Detection Segmentation
    Ding, Li
    Xia, Min
    Lin, Haifeng
    Hu, Kai
    REMOTE SENSING, 2024, 16 (01)
  • [9] Multi-level and Multi-modal Target Detection Based on Feature Fusion
    Cheng T.
    Sun L.
    Hou D.
    Shi Q.
    Zhang J.
    Chen J.
    Huang H.
    Qiche Gongcheng/Automotive Engineering, 2021, 43 (11): 1602 - 1610
  • [10] Multi-level Deep Correlative Networks for Multi-modal Sentiment Analysis
    CAI Guoyong
    LYU Guangrui
    LIN Yuming
    WEN Yimin
    Chinese Journal of Electronics, 2020, 29 (06) : 1025 - 1038