Global Selection and Local Attention Network for Referring Image Segmentation

Authors
Ding, Haixin [1]
Zhang, Shengchuan [1]
Cao, Liujuan [1]
Affiliations
[1] Xiamen Univ, Key Lab Multimedia Trusted Percept & Efficient Co, Minist Educ China, Xiamen 361005, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Referring image segmentation; vision-language; global-local; image segmentation
DOI
10.1007/978-981-99-8540-1_23
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Referring image segmentation (RIS) aims to segment the target object described by a natural language expression. The challenge lies in comprehending both the image and the referring expression while establishing the alignment between the two modalities. Large-scale vision-language pre-trained models such as CLIP align the two modalities well. However, the alignment in these models is based on the global image, whereas RIS requires aligning global text features with local visual features rather than global visual features. Features extracted by CLIP therefore cannot be applied directly to RIS. In this paper, we propose a novel framework called Global Selection and Local Attention Network (GLNet), which builds upon CLIP. GLNet comprises two modules: the Global Selection and Fusion Module (GSFM) and the Local Attention Module (LAM). GSFM uses text information to adaptively select and fuse low-level and middle-level visual features. LAM applies attention mechanisms to both local visual features and local text features to establish relationships between objects and text. Extensive experiments demonstrate the exceptional performance of the proposed method in referring image segmentation. On RefCOCO+, GLNet achieves significant performance gains of +2.38%, +2.78%, and +2.50% on the three splits compared to SADLR.
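The two ideas named in the abstract, text-guided selection and fusion of visual features and attention between local visual and local text features, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no equations, so the function names (`select_and_fuse`, `cross_attention`), the per-location softmax weighting, and the scaled dot-product attention are all assumptions chosen for illustration, and plain Python lists stand in for feature maps.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def select_and_fuse(text_global, low_feats, mid_feats):
    # Hypothetical stand-in for text-guided selection and fusion:
    # at each location, weight the low-level and middle-level vectors
    # by their similarity to the global text vector, then sum them.
    fused = []
    for low, mid in zip(low_feats, mid_feats):
        w_low, w_mid = softmax([dot(text_global, low), dot(text_global, mid)])
        fused.append([w_low * l + w_mid * m for l, m in zip(low, mid)])
    return fused

def cross_attention(visual_feats, word_feats):
    # Hypothetical stand-in for local attention: each visual location
    # attends over the word features with scaled dot-product attention.
    dim = len(word_feats[0])
    scale = math.sqrt(dim)
    attended = []
    for v in visual_feats:
        weights = softmax([dot(v, w) / scale for w in word_feats])
        attended.append(
            [sum(a * w[d] for a, w in zip(weights, word_feats)) for d in range(dim)]
        )
    return attended

# Toy inputs (assumed values): two locations, two-dimensional features.
text_global = [1.0, 0.0]
low = [[1.0, 0.0], [0.0, 1.0]]
mid = [[0.0, 1.0], [1.0, 0.0]]
words = [[1.0, 0.0], [0.0, 1.0]]

fused = select_and_fuse(text_global, low, mid)
attended = cross_attention(fused, words)
```

In this toy setup, the level whose vector is more similar to the text vector receives the larger fusion weight at each location, and each attended vector is a convex combination of the word features.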
Pages: 284-295
Page count: 12