Cascade Grouped Attention Network for Referring Expression Segmentation

Cited by: 69

Authors:

Luo, Gen [1]

Zhou, Yiyi [1]

Ji, Rongrong [1]

Sun, Xiaoshuai [1]

Su, Jinsong [1]

Lin, Chia-Wen [2]

Tian, Qi [3]

Affiliations:

[1] Xiamen Univ, Media Analyt & Comp Lab, Dept Artificial Intelligence, Sch Informat, Xiamen 361005, Peoples R China

[2] Natl Tsing Hua Univ, Hsinchu, Taiwan

[3] Huawei Technol, Huawei Cloud BU, Shenzhen, Guangdong, Peoples R China

Source:

MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA | 2020

Funding:

National Natural Science Foundation of China;

Keywords:

Referring Expression Segmentation; Attention Network;

DOI:

10.1145/3394171.3414006

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Referring expression segmentation (RES) aims to segment the target instance in a given image according to a natural language expression. Its main challenge lies in how to quickly and accurately align the text expression to the referred visual instances. In this paper, we focus on addressing this issue by proposing a Cascade Grouped Attention Network (CGAN) with two innovative designs: Cascade Grouped Attention (CGA) and Instance-level Attention (ILA) loss. Specifically, CGA is used to perform step-wise reasoning over the entire image to perceive the differences between instances accurately yet efficiently, so as to identify the referent. ILA loss is further embedded into each step of CGA to directly supervise the attention modeling, which improves the alignments between the text expression and the visual instances. Through these two novel designs, CGAN can achieve the high efficiency of one-stage RES while possessing a strong reasoning ability comparable to the two-stage methods. To validate our model, we conduct extensive experiments on three RES benchmark datasets and achieve significant performance gains over existing one-stage and multi-stage models.


Pages: 1274-1282

Number of pages: 9
