Global Selection and Local Attention Network for Referring Image Segmentation

Authors
Ding, Haixin [1]
Zhang, Shengchuan [1]
Cao, Liujuan [1]
Affiliations
[1] Xiamen Univ, Key Lab Multimedia Trusted Percept & Efficient Co, Minist Educ China, Xiamen 361005, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Referring image segmentation; vision-language; global-local; image segmentation
DOI
10.1007/978-981-99-8540-1_23
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Referring image segmentation (RIS) aims to segment a target object described by a natural language expression. The challenge lies in comprehending the image and the referring expression simultaneously while establishing an alignment between the two modalities. The large-scale pre-trained vision-language model CLIP aligns these modalities well. However, its alignment operates on the global image, whereas RIS requires aligning global text features with local visual features rather than global visual features. Features extracted by CLIP therefore cannot be applied directly to RIS. In this paper, we propose a novel framework called Global Selection and Local Attention Network (GLNet), which builds upon CLIP. GLNet comprises two modules: the Global Selection and Fusion Module (GSFM) and the Local Attention Module (LAM). GSFM uses text information to adaptively select and fuse low-level and middle-level visual features. LAM applies attention mechanisms to both local visual features and local text features to establish relationships between objects and text. Extensive experiments demonstrate the strong performance of the proposed method on referring image segmentation. On RefCOCO+, GLNet achieves gains of +2.38%, +2.78%, and +2.50% on the three splits compared to SADLR.
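The abstract does not specify how GSFM or LAM are implemented, so the following is only a generic illustration of the two ideas it names, not the authors' method. It assumes, hypothetically, that text-guided selection can be approximated by a softmax gate over two feature levels scored against a global text vector, and that local attention can be approximated by plain scaled dot-product attention from visual locations to word features. All function names, shapes, and toy values are invented for the sketch.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_pool(features):
    # Average a list of feature vectors into a single vector.
    n = len(features)
    return [sum(col) / n for col in zip(*features)]

def text_gated_fusion(low, mid, text):
    """Hypothetical stand-in for text-guided selection and fusion.

    low, mid: lists of per-location feature vectors with identical shapes.
    text: one global text vector of the same dimension.
    Each level is scored by the similarity of its pooled feature to the
    text vector, and the levels are blended with softmax weights.
    """
    scores = [dot(mean_pool(low), text), dot(mean_pool(mid), text)]
    w_low, w_mid = softmax(scores)
    fused = [[w_low * a + w_mid * b for a, b in zip(lo, mi)]
             for lo, mi in zip(low, mid)]
    return fused, (w_low, w_mid)

def cross_attention(visual, words):
    """Scaled dot-product attention: each visual location attends over words."""
    scale = math.sqrt(len(words[0]))
    out, weights = [], []
    for q in visual:
        attn = softmax([dot(q, k) / scale for k in words])
        weights.append(attn)
        out.append([sum(a * w[d] for a, w in zip(attn, words))
                    for d in range(len(q))])
    return out, weights

# Toy inputs (assumed values): 2 locations, 2 words, feature dimension 2.
low = [[1.0, 0.0], [0.0, 1.0]]
mid = [[0.5, 0.5], [0.5, 0.5]]
text = [1.0, 0.0]
words = [[1.0, 0.0], [0.0, 1.0]]

fused, gate = text_gated_fusion(low, mid, text)
attended, attn = cross_attention(fused, words)
```

In this toy setup each visual location places more attention on the word whose feature it most resembles, which is the kind of object-to-text relationship the abstract attributes to LAM. A real model would use learned projections and CLIP features rather than raw vectors.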
Pages: 284-295
Number of pages: 12