Global Selection and Local Attention Network for Referring Image Segmentation

Cited by: 0

Authors:

Ding, Haixin [1]

Zhang, Shengchuan [1]

Cao, Liujuan [1]

Affiliations:

[1] Xiamen Univ, Key Lab Multimedia Trusted Percept & Efficient Co, Minist Educ China, Xiamen 361005, Peoples R China

Source:

PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VII | 2024 / Vol. 14431

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China;

Keywords:

Referring image segmentation; vision-language; global-local; image segmentation;

DOI:

10.1007/978-981-99-8540-1_23

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Referring image segmentation (RIS) aims to segment the target object described by a natural language expression. The challenge lies in comprehending the image and the referring expression simultaneously while aligning the two modalities. The large-scale vision-language pre-trained model CLIP aligns the modalities well, but the alignment in such models is based on the global image, whereas RIS requires aligning global text features with local visual features rather than global visual features. Consequently, features extracted by CLIP cannot be applied to RIS directly. In this paper, we propose a novel framework called Global Selection and Local Attention Network (GLNet), which builds upon CLIP. GLNet comprises two modules: the Global Selection and Fusion Module (GSFM) and the Local Attention Module (LAM). GSFM uses text information to adaptively select and fuse low-level and middle-level visual features. LAM applies attention mechanisms to both local visual features and local text features to establish relationships between objects and text. Extensive experiments demonstrate the strong performance of the proposed method on referring image segmentation. On RefCOCO+, GLNet achieves significant gains of +2.38%, +2.78%, and +2.50% on the three splits compared to SADLR.
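
The two ideas named in the abstract, text-guided selection and fusion of multi-level visual features, and attention between local visual and local text features, can be illustrated with a minimal sketch. This is not the GLNet implementation: the function names `select_and_fuse` and `cross_attention`, the use of one pooled vector per feature level, the softmax weighting, and the plain scaled dot-product attention are all assumptions made for illustration only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def select_and_fuse(level_feats, text_global):
    """Hypothetical text-guided fusion (an assumption, not the paper's GSFM).

    Each feature level is represented by one pooled vector. Levels are
    weighted by the softmax of their similarity to the global text vector,
    and the fused feature is the weighted sum of the level vectors.
    """
    weights = softmax([dot(f, text_global) for f in level_feats])
    dim = len(text_global)
    fused = [sum(w * f[i] for w, f in zip(weights, level_feats))
             for i in range(dim)]
    return fused, weights


def cross_attention(visual_tokens, word_tokens):
    """Scaled dot-product attention (an assumption, not the paper's LAM).

    Each local visual token is a query; word tokens act as both keys and
    values, so each output row is a text summary for one visual location.
    """
    scale = math.sqrt(len(word_tokens[0]))
    out = []
    for q in visual_tokens:
        attn = softmax([dot(q, k) / scale for k in word_tokens])
        out.append([sum(a * k[i] for a, k in zip(attn, word_tokens))
                    for i in range(len(q))])
    return out


# Toy usage with 2-dimensional features (values are made up).
fused, weights = select_and_fuse([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
attended = cross_attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
```

In the toy call, the level whose vector matches the text vector receives the larger fusion weight, and the visual token attends more strongly to the word token it resembles.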


Pages: 284-295

Number of pages: 12

50 related records in total

[1] Global and Local Interactive Perception Network for Referring Image Segmentation
Liu, Jing
Tan, Hongchen
Hu, Yongli
Sun, Yanfeng
Wang, Huasheng
Yin, Baocai
[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, : 1 - 14
[2] Structured Attention Network for Referring Image Segmentation
Lin, Liang
Yan, Pengxiang
Xu, Xiaoqian
Yang, Sibei
Zeng, Kun
Li, Guanbin
[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 1922 - 1932
[3] Local-global coordination with transformers for referring image segmentation
Liu, Fang
Kong, Yuqiu
Zhang, Lihe
Feng, Guang
Yin, Baocai
[J]. NEUROCOMPUTING, 2023, 522 : 39 - 52
[4] Multiscale deep feature selection fusion network for referring image segmentation
Xianwen Dai
Jiacheng Lin
Ke Nai
Qingpeng Li
Zhiyong Li
[J]. Multimedia Tools and Applications, 2024, 83 : 36287 - 36305
[5] Multiscale deep feature selection fusion network for referring image segmentation
Dai, Xianwen
Lin, Jiacheng
Nai, Ke
Li, Qingpeng
Li, Zhiyong
[J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (12) : 36287 - 36305
[6] Cross-Modal Self-Attention Network for Referring Image Segmentation
Ye, Linwei
Rochan, Mrigank
Liu, Zhi
Wang, Yang
[J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 10494 - 10503
[7] Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation
Feng, Guang
Hu, Zhiwei
Zhang, Lihe
Lu, Huchuan
[J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 15501 - 15510
[8] Cascade Grouped Attention Network for Referring Expression Segmentation
Luo, Gen
Zhou, Yiyi
Ji, Rongrong
Sun, Xiaoshuai
Su, Jinsong
Lin, Chia-Wen
Tian, Qi
[J]. MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 1274 - 1282
[9] Zero-shot Referring Image Segmentation with Global-Local Context Features
Yu, Seonghoon
Seo, Paul Hongsuck
Son, Jeany
[J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 19456 - 19465
[10] An anisotropic non-local attention network for image segmentation
Yuan, Feiniu
Zhu, Yaowen
Li, Kang
Fang, Zhijun
Shi, Jinting
[J]. MACHINE VISION AND APPLICATIONS, 2022, 33 (02)
