Cascade Grouped Attention Network for Referring Expression Segmentation

Cited by: 69

Authors:

Luo, Gen [1]

Zhou, Yiyi [1]

Ji, Rongrong [1]

Sun, Xiaoshuai [1]

Su, Jinsong [1]

Lin, Chia-Wen [2]

Tian, Qi [3]

Affiliations:

[1] Xiamen Univ, Media Analyt & Comp Lab, Dept Artificial Intelligence, Sch Informat, Xiamen 361005, Peoples R China

[2] Natl Tsing Hua Univ, Hsinchu, Taiwan

[3] Huawei Technol, Huawei Cloud BU, Shenzhen, Guangdong, Peoples R China

Source:

MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA | 2020

Funding:

National Natural Science Foundation of China;

Keywords:

Referring Expression Segmentation; Attention Network;

DOI:

10.1145/3394171.3414006

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Referring expression segmentation (RES) aims to segment the target instance in a given image according to a natural language expression. Its main challenge lies in how to quickly and accurately align the text expression to the referred visual instances. In this paper, we focus on addressing this issue by proposing a Cascade Grouped Attention Network (CGAN) with two innovative designs: Cascade Grouped Attention (CGA) and Instance-level Attention (ILA) loss. Specifically, CGA is used to perform step-wise reasoning over the entire image to perceive the differences between instances accurately yet efficiently, so as to identify the referent. ILA loss is further embedded into each step of CGA to directly supervise the attention modeling, which improves the alignments between the text expression and the visual instances. Through these two novel designs, CGAN can achieve the high efficiency of one-stage RES while possessing a strong reasoning ability comparable to the two-stage methods. To validate our model, we conduct extensive experiments on three RES benchmark datasets and achieve significant performance gains over existing one-stage and multi-stage models.


Pages: 1274-1282

Page count: 9
