Structured Attention Network for Referring Image Segmentation

Cited by: 20
Authors
Lin, Liang [1]
Yan, Pengxiang [1]
Xu, Xiaoqian [1]
Yang, Sibei [2]
Zeng, Kun [1]
Li, Guanbin [1]
Affiliations
[1] Sun Yat Sen Univ, Sch Engn & Comp Sci, Guangzhou 510006, Peoples R China
[2] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visualization; Linguistics; Image segmentation; Cognition; Feature extraction; Semantics; Task analysis; Referring image segmentation; vision and language; cross-modal reasoning;
DOI
10.1109/TMM.2021.3074008
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Referring image segmentation aims at segmenting out the object or stuff referred to by a natural language expression. The challenge of this task lies in the requirement of understanding both vision and language. The linguistic structure of a referring expression can provide an intuitive and explainable layout for reasoning over visual and linguistic concepts. In this paper, we propose a structured attention network (SANet) to explore the multimodal reasoning over the dependency tree parsed from the referring expression. Specifically, SANet implements the multimodal reasoning using an attentional multimodal tree-structure recurrent module (AMTreeGRU) in a bottom-up manner. In addition, for spatial detail improvement, SANet further incorporates the semantics-guided low-level features into high-level ones using the proposed attentional skip connection module. Extensive experiments on four public benchmark datasets demonstrate the superiority of our proposed SANet with more explainable visualization examples.
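The bottom-up reasoning over a dependency tree mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' AMTreeGRU: the `Node` class, the `bottom_up` function, the two-dimensional feature vectors, and the fixed 50/50 blend standing in for a learned gated update are all assumptions made for illustration. The sketch only shows that children are processed before their parent and that child states are combined with attention weights.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical dependency-tree node holding a word and a toy feature vector."""
    word: str
    feat: List[float]
    children: List["Node"] = field(default_factory=list)


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def bottom_up(node: Node, order: List[str]) -> List[float]:
    """Post-order pass: every child is processed before its parent."""
    child_states = [bottom_up(c, order) for c in node.children]
    order.append(node.word)
    if not child_states:
        # Leaf: the state is just the node's own feature.
        return list(node.feat)
    # Attention over children, scored against the parent's own feature.
    weights = softmax([dot(node.feat, h) for h in child_states])
    context = [
        sum(w * h[i] for w, h in zip(weights, child_states))
        for i in range(len(node.feat))
    ]
    # Assumption: a fixed 50/50 blend stands in for a learned gated (GRU-style) update.
    return [0.5 * f + 0.5 * c for f, c in zip(node.feat, context)]


# Hypothetical tree for an expression such as "the man left of the dog".
tree = Node("man", [1.0, 0.0], [
    Node("the", [0.0, 0.0]),
    Node("left", [0.0, 1.0], [Node("dog", [1.0, 1.0])]),
])
visit_order: List[str] = []
root_state = bottom_up(tree, visit_order)
```

In the actual model the node states would be multimodal feature maps rather than short vectors, and the update would be learned; the traversal order is the part this sketch is meant to convey.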
Pages: 1922-1932
Number of pages: 11