Cascade Grouped Attention Network for Referring Expression Segmentation

Cited by: 69
Authors
Luo, Gen [1]
Zhou, Yiyi [1]
Ji, Rongrong [1]
Sun, Xiaoshuai [1]
Su, Jinsong [1]
Lin, Chia-Wen [2]
Tian, Qi [3]
Affiliations
[1] Xiamen Univ, Media Analyt & Comp Lab, Dept Artificial Intelligence, Sch Informat, Xiamen 361005, Peoples R China
[2] Natl Tsing Hua Univ, Hsinchu, Taiwan
[3] Huawei Technol, Huawei Cloud BU, Shenzhen, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Referring Expression Segmentation; Attention Network;
DOI
10.1145/3394171.3414006
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Referring expression segmentation (RES) aims to segment the target instance in a given image according to a natural language expression. Its main challenge lies in how to quickly and accurately align the text expression to the referred visual instances. In this paper, we focus on addressing this issue by proposing a Cascade Grouped Attention Network (CGAN) with two innovative designs: Cascade Grouped Attention (CGA) and Instance-level Attention (ILA) loss. Specifically, CGA is used to perform step-wise reasoning over the entire image to perceive the differences between instances accurately yet efficiently, so as to identify the referent. ILA loss is further embedded into each step of CGA to directly supervise the attention modeling, which improves the alignments between the text expression and the visual instances. Through these two novel designs, CGAN can achieve the high efficiency of one-stage RES while possessing a strong reasoning ability comparable to the two-stage methods. To validate our model, we conduct extensive experiments on three RES benchmark datasets and achieve significant performance gains over existing one-stage and multi-stage models.
Pages: 1274-1282
Number of pages: 9