Global Selection and Local Attention Network for Referring Image Segmentation

Cited by: 0

Authors:

Ding, Haixin [1]

Zhang, Shengchuan [1]

Cao, Liujuan [1]

Affiliations:

[1] Xiamen Univ, Key Lab Multimedia Trusted Percept & Efficient Co, Minist Educ China, Xiamen 361005, Peoples R China

Source:

PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VII | 2024 / Vol. 14431

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China;

Keywords:

Referring image segmentation; vision-language; global-local; image segmentation;

DOI:

10.1007/978-981-99-8540-1_23

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Referring image segmentation (RIS) aims to segment the target object in an image based on a natural language expression. The challenge lies in comprehending the image and the referring expression simultaneously while establishing the alignment between the two modalities. The large-scale vision-language pre-trained model CLIP aligns the two modalities well. However, this alignment is based on the global image, whereas RIS requires aligning global text features with local visual features rather than global visual features. As a result, features extracted by CLIP cannot be directly applied to RIS. In this paper, we propose a novel framework called Global Selection and Local Attention Network (GLNet), which builds upon CLIP. GLNet comprises two modules: the Global Selection and Fusion Module (GSFM) and the Local Attention Module (LAM). GSFM uses text information to adaptively select and fuse low-level and middle-level visual features. LAM applies attention mechanisms to both local visual features and local text features to establish relationships between objects and text. Extensive experiments demonstrate the strong performance of the proposed method on referring image segmentation. On RefCOCO+, GLNet achieves performance gains of +2.38%, +2.78%, and +2.50% on the three splits compared to SADLR.


Pages: 284-295

Page count: 12

50 items in total

[21] Attention Guided Global Enhancement and Local Refinement Network for Semantic Segmentation
Li, Jiangyun
Zha, Sen
Chen, Chen
Ding, Meng
Zhang, Tianxiang
Yu, Hong
[J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 3211 - 3223
[22] Local aggregation and global attention network for hyperspectral image classification with spectral-induced aligned superpixel segmentation
Chen, Zhonghao
Wu, Guoyong
Gao, Hongmin
Ding, Yao
Hong, Danfeng
Zhang, Bing
[J]. EXPERT SYSTEMS WITH APPLICATIONS, 2023, 232
[23] SDPN: A Slight Dual-Path Network With Local-Global Attention Guided for Medical Image Segmentation
Wang, Jing
Li, Shuyi
Yu, Luyue
Qu, Aixi
Wang, Qing
Liu, Ju
Wu, Qiang
[J]. IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2023, 27 (06) : 2956 - 2967
[24] A Dual Global-Local Attention Network for Hyperspectral Band Selection
He, Ke
Sun, Weiwei
Yang, Gang
Meng, Xiangchao
Ren, Kai
Peng, Jiangtao
Du, Qian
[J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
[25] Bidirectional Relationship Inferring Network for Referring Image Localization and Segmentation
Feng, Guang
Hu, Zhiwei
Zhang, Lihe
Sun, Jiayu
Lu, Huchuan
[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (05) : 2246 - 2258
[26] Multi-Attention Network for Compressed Video Referring Object Segmentation
Chen, Weidong
Hong, Dexiang
Qi, Yuankai
Han, Zhenjun
Wang, Shuhui
Qing, Laiyun
Huang, Qingming
Li, Guorong
[J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4416 - 4425
[27] Cross-modal attention guided visual reasoning for referring image segmentation
Zhang, Wenjing
Hu, Mengnan
Tan, Quange
Zhou, Qianli
Wang, Rong
[J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (19) : 28853 - 28872
[28] Multi-Modal Mutual Attention and Iterative Interaction for Referring Image Segmentation
Liu, Chang
Ding, Henghui
Zhang, Yulun
Jiang, Xudong
[J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 3054 - 3065
[29] Cross-modal attention guided visual reasoning for referring image segmentation
Wenjing Zhang
Mengnan Hu
Quange Tan
Qianli Zhou
Rong Wang
[J]. Multimedia Tools and Applications, 2023, 82 : 28853 - 28872
[30] Semi-global shape-aware attention network for image segmentation and retrieval
Zhang, Pengju
Zhu, Jiagang
Zhang, Chaofan
Rong, Zheng
Wu, Yihong
[J]. NEUROCOMPUTING, 2022, 506 : 369 - 379
