Cascade Grouped Attention Network for Referring Expression Segmentation

Cited by: 69
Authors
Luo, Gen [1]
Zhou, Yiyi [1]
Ji, Rongrong [1]
Sun, Xiaoshuai [1]
Su, Jinsong [1]
Lin, Chia-Wen [2]
Tian, Qi [3]
Affiliations
[1] Xiamen Univ, Media Analyt & Comp Lab, Dept Artificial Intelligence, Sch Informat, Xiamen 361005, Peoples R China
[2] Natl Tsing Hua Univ, Hsinchu, Taiwan
[3] Huawei Technol, Huawei Cloud BU, Shenzhen, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Referring Expression Segmentation; Attention Network;
DOI
10.1145/3394171.3414006
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Referring expression segmentation (RES) aims to segment the target instance in a given image according to a natural language expression. Its main challenge lies in how to quickly and accurately align the text expression to the referred visual instances. In this paper, we focus on addressing this issue by proposing a Cascade Grouped Attention Network (CGAN) with two innovative designs: Cascade Grouped Attention (CGA) and Instance-level Attention (ILA) loss. Specifically, CGA is used to perform step-wise reasoning over the entire image to perceive the differences between instances accurately yet efficiently, so as to identify the referent. ILA loss is further embedded into each step of CGA to directly supervise the attention modeling, which improves the alignments between the text expression and the visual instances. Through these two novel designs, CGAN can achieve the high efficiency of one-stage RES while possessing a strong reasoning ability comparable to the two-stage methods. To validate our model, we conduct extensive experiments on three RES benchmark datasets and achieve significant performance gains over existing one-stage and multi-stage models.
Pages: 1274-1282
Page count: 9
Related papers
50 in total
  • [21] Language-Attention Modular-Network for Relational Referring Expression Comprehension in Videos
    Dhingra, Naina
    Jain, Shipra
    [J]. 2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 4103 - 4110
  • [22] Dynamic Graph Attention for Referring Expression Comprehension
    Yang, Sibei
    Li, Guanbin
    Yu, Yizhou
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4643 - 4652
  • [23] Referring Image Segmentation via Language-Driven Attention
    Chen, Ding-Jie
    Hsieh, He-Yen
    Liu, Tyng-Luh
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 13997 - 14003
  • [24] Dual Convolutional LSTM Network for Referring Image Segmentation
    Ye, Linwei
    Liu, Zhi
    Wang, Yang
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (12) : 3224 - 3235
  • [25] Structured Multimodal Fusion Network for Referring Image Segmentation
    Xue, Mingcheng
    Liu, Yu
    Xu, Kaiping
    Zhang, Haiyang
    Yu, Chengyang
    [J]. PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2022, 2022, : 36 - 47
  • [26] A CONTEXT-BASED NETWORK FOR REFERRING IMAGE SEGMENTATION
    Li, Xinyu
    Liu, Yu
    Xu, Kaiping
    Zhao, Zhehuan
    Liu, Sipei
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 1436 - 1440
  • [27] Bilateral Knowledge Interaction Network for Referring Image Segmentation
    Ding, Haixin
    Zhang, Shengchuan
    Wu, Qiong
    Yu, Songlin
    Hu, Jie
    Cao, Liujuan
    Ji, Rongrong
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 2966 - 2977
  • [28] Advancing Referring Expression Segmentation Beyond Single Image
    Wu, Yixuan
    Zhang, Zhao
    Xie, Chi
    Zhu, Feng
    Zhao, Rui
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 2628 - 2638
  • [29] Multi-level attention for referring expression comprehension
    Sun, Yanfeng
    Zhang, Yunru
    Jiang, Huajie
    Hu, Yongli
    Yin, Baocai
    [J]. PATTERN RECOGNITION LETTERS, 2023, 172 : 252 - 258
  • [30] Unambiguous Scene Text Segmentation With Referring Expression Comprehension
    Rong, Xuejian
    Yi, Chucai
    Tian, Yingli
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 (29) : 591 - 601