Multi-Modal Mutual Attention and Iterative Interaction for Referring Image Segmentation

Cited by: 9

Authors:

Liu, Chang [1]

Ding, Henghui [2]

Zhang, Yulun [2]

Jiang, Xudong [1]

Affiliations:

[1] Nanyang Technol Univ, Sch Elect & Elect Engn EEE, Singapore 639798, Singapore

[2] Swiss Fed Inst Technol, Comp Vis Lab CVL, CH-8092 Zurich, Switzerland

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2023 / Vol. 32

Keywords:

Transformers; Decoding; Image segmentation; Task analysis; Feature extraction; Image reconstruction; Iterative methods; Referring image segmentation; multi-modal mutual attention; iterative multi-modal interaction; language feature reconstruction;

DOI:

10.1109/TIP.2023.3277791

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We address the problem of referring image segmentation that aims to generate a mask for the object specified by a natural language expression. Many recent works utilize Transformer to extract features for the target object by aggregating the attended visual regions. However, the generic attention mechanism in Transformer only uses the language input for attention weight calculation, which does not explicitly fuse language features in its output. Thus, its output feature is dominated by vision information, which limits the model to comprehensively understand the multi-modal information, and brings uncertainty for the subsequent mask decoder to extract the output mask. To address this issue, we propose Multi-Modal Mutual Attention (M³Att) and Multi-Modal Mutual Decoder (M³Dec) that better fuse information from the two input modalities. Based on M³Dec, we further propose Iterative Multi-modal Interaction (IMI) to allow continuous and in-depth interactions between language and vision features. Furthermore, we introduce Language Feature Reconstruction (LFR) to prevent the language information from being lost or distorted in the extracted feature. Extensive experiments show that our proposed approach significantly improves the baseline and outperforms state-of-the-art referring image segmentation methods on RefCOCO series datasets consistently.
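
The limitation the abstract attributes to generic attention can be illustrated with a minimal sketch. This is not the paper's M³Att; it is plain scaled dot-product cross-attention, under the assumption that language tokens act as queries and vision features act as both keys and values. The function names and the toy features are hypothetical. The point is that each output row is a weighted average of vision values only, so language affects the weights but is never mixed into the output itself.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Generic scaled dot-product attention (illustrative, not M3Att).

    queries: language token features (used only to compute weights)
    keys, values: vision features (the only features mixed into the output)
    """
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Language-vision similarity, scaled by sqrt of the feature dimension.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Output is a convex combination of vision values only.
        mixed = [sum(w * v[c] for w, v in zip(weights, values))
                 for c in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical toy features: 2 word tokens, 3 image regions, 2 channels each.
language = [[1.0, 0.0], [0.0, 1.0]]
vision = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

outputs, weights = cross_attention(language, vision, vision)
for row in outputs:
    print([round(x, 3) for x in row])
```

Because the weights in each row sum to one, every output channel stays within the range of the corresponding vision channel, which is one way to read the abstract's remark that the output feature is dominated by vision information.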

Pages: 3054-3065

Page count: 12
