Cascade Grouped Attention Network for Referring Expression Segmentation

Cited by: 69
Authors
Luo, Gen [1]
Zhou, Yiyi [1]
Ji, Rongrong [1]
Sun, Xiaoshuai [1]
Su, Jinsong [1]
Lin, Chia-Wen [2]
Tian, Qi [3]
Affiliations
[1] Xiamen Univ, Media Analyt & Comp Lab, Dept Artificial Intelligence, Sch Informat, Xiamen 361005, Peoples R China
[2] Natl Tsing Hua Univ, Hsinchu, Taiwan
[3] Huawei Technol, Huawei Cloud BU, Shenzhen, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Referring Expression Segmentation; Attention Network;
DOI
10.1145/3394171.3414006
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Referring expression segmentation (RES) aims to segment the target instance in a given image according to a natural language expression. Its main challenge lies in how to quickly and accurately align the text expression to the referred visual instances. In this paper, we focus on addressing this issue by proposing a Cascade Grouped Attention Network (CGAN) with two innovative designs: Cascade Grouped Attention (CGA) and Instance-level Attention (ILA) loss. Specifically, CGA is used to perform step-wise reasoning over the entire image to perceive the differences between instances accurately yet efficiently, so as to identify the referent. ILA loss is further embedded into each step of CGA to directly supervise the attention modeling, which improves the alignments between the text expression and the visual instances. Through these two novel designs, CGAN can achieve the high efficiency of one-stage RES while possessing a strong reasoning ability comparable to the two-stage methods. To validate our model, we conduct extensive experiments on three RES benchmark datasets and achieve significant performance gains over existing one-stage and multi-stage models.
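The abstract gives no architectural details, so the following is only a minimal illustrative sketch of the general idea it describes: attention computed in channel groups, applied over several cascaded steps, with a loss on the attention map at every step. All function names, the dot-product scoring, the residual re-weighting, and the cross-entropy form of the loss are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_attention_step(visual, text, groups):
    """One hypothetical grouped-attention step (not the paper's exact design).

    visual: (N, D) features for N image positions.
    text:   (D,) embedding of the referring expression.
    Returns the re-weighted features and an attention map of shape (N,).
    """
    n, d = visual.shape
    assert d % groups == 0, "feature size must be divisible by the group count"
    v = visual.reshape(n, groups, d // groups)             # (N, G, D/G)
    t = text.reshape(groups, d // groups)                  # (G, D/G)
    scores = (v * t).sum(axis=-1) / np.sqrt(d // groups)   # (N, G)
    attn = softmax(scores, axis=0)        # per group, normalized over positions
    attn_map = attn.mean(axis=1)          # (N,), sums to 1
    # Assumed residual re-weighting: attended positions are amplified.
    attended = visual * (1.0 + n * attn_map[:, None])
    return attended, attn_map

def attention_supervision_loss(attn_map, instance_mask, eps=1e-8):
    # Assumed form: cross-entropy between the attention map and the
    # normalized binary mask of the referred instance.
    target = instance_mask / (instance_mask.sum() + eps)
    return float(-(target * np.log(attn_map + eps)).sum())

def cascade(visual, text, instance_mask, steps=3, groups=4):
    # Apply the attention step repeatedly, supervising the map at every step.
    total_loss = 0.0
    attn_map = None
    for _ in range(steps):
        visual, attn_map = grouped_attention_step(visual, text, groups)
        total_loss += attention_supervision_loss(attn_map, instance_mask)
    return visual, attn_map, total_loss
```

In this sketch the per-step loss is simply summed; how the actual model weights or schedules its attention supervision is not stated in the abstract.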
Pages: 1274-1282
Page count: 9