Multi-Modal Mutual Attention and Iterative Interaction for Referring Image Segmentation

Cited by: 9
Authors
Liu, Chang [1]
Ding, Henghui [2]
Zhang, Yulun [2]
Jiang, Xudong [1]
Affiliations
[1] Nanyang Technol Univ, Sch Elect & Elect Engn EEE, Singapore 639798, Singapore
[2] Swiss Fed Inst Technol, Comp Vis Lab CVL, CH-8092 Zurich, Switzerland
Keywords
Transformers; Decoding; Image segmentation; Task analysis; Feature extraction; Image reconstruction; Iterative methods; Referring image segmentation; multi-modal mutual attention; iterative multi-modal interaction; language feature reconstruction;
DOI
10.1109/TIP.2023.3277791
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We address the problem of referring image segmentation, which aims to generate a mask for the object specified by a natural language expression. Many recent works utilize Transformers to extract features for the target object by aggregating the attended visual regions. However, the generic attention mechanism in the Transformer uses the language input only for attention weight calculation and does not explicitly fuse language features in its output. Its output feature is therefore dominated by vision information, which limits the model's ability to comprehensively understand the multi-modal information and brings uncertainty for the subsequent mask decoder when extracting the output mask. To address this issue, we propose Multi-Modal Mutual Attention (M³Att) and Multi-Modal Mutual Decoder (M³Dec), which better fuse information from the two input modalities. Based on M³Dec, we further propose Iterative Multi-modal Interaction (IMI) to allow continuous and in-depth interactions between language and vision features. Furthermore, we introduce Language Feature Reconstruction (LFR) to prevent the language information from being lost or distorted in the extracted feature. Extensive experiments show that our proposed approach significantly improves on the baseline and consistently outperforms state-of-the-art referring image segmentation methods on the RefCOCO series of datasets.
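The abstract's point about generic attention can be illustrated with a minimal sketch. It assumes plain scaled dot-product attention in which language tokens act as queries and vision tokens supply both keys and values; the function names, the toy vectors, and the omission of learned projection matrices are all illustrative assumptions, and this is not the paper's M³Att. It shows that the language input only shapes the attention weights, so every output is a weighted average of vision values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attention(lang_queries, vis_keys, vis_values):
    """Generic scaled dot-product attention (hypothetical helper).

    Language vectors are used only as queries; keys and values both
    come from the vision tokens, so language never enters the output
    except through the attention weights.
    """
    scale = math.sqrt(len(vis_keys[0]))
    dims = len(vis_values[0])
    outputs, all_weights = [], []
    for q in lang_queries:
        w = softmax([dot(q, k) / scale for k in vis_keys])
        # Output is a convex combination of the vision value vectors.
        out = [sum(wi * v[d] for wi, v in zip(w, vis_values)) for d in range(dims)]
        outputs.append(out)
        all_weights.append(w)
    return outputs, all_weights


# Toy inputs (assumed values): three vision tokens, two language tokens.
vis_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
vis_values = [[0.2, 0.9], [0.5, 0.1], [0.8, 0.4]]
lang_queries = [[2.0, -1.0], [-1.0, 3.0]]

outputs, weights = cross_attention(lang_queries, vis_keys, vis_values)
for w, out in zip(weights, outputs):
    print("weights:", [round(x, 3) for x in w], "output:", [round(x, 3) for x in out])
```

Because each row of weights sums to one, every output coordinate stays between the smallest and largest vision value in that coordinate, whatever the language query is. That is the sense in which the output is dominated by vision information.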
Pages: 3054-3065
Number of pages: 12