Prompt-guided bidirectional deep fusion network for referring image segmentation

Cited by: 0
Authors
[1] Wu, Junxian
[2] Zhang, Yujia
[3] Kampffmeyer, Michael
[4] Zhao, Xiaoguang
Keywords
Image segmentation
DOI
10.1016/j.neucom.2024.128899
Abstract
Referring image segmentation involves accurately segmenting objects based on natural language descriptions. This poses challenges due to the intricate and varied nature of language expressions, as well as the requirement to identify relevant image regions among multiple objects. Current models predominantly employ language-aware early fusion techniques, which may lead to misinterpretations of language expressions due to the lack of explicit visual guidance of the language encoder. Additionally, early fusion methods are unable to adequately leverage high-level contexts. To address these limitations, this paper introduces the Prompt-guided Bidirectional Deep Fusion Network (PBDF-Net) to enhance the fusion of language and vision modalities. In contrast to traditional unidirectional early fusion approaches, our approach employs a prompt-guided bidirectional encoder fusion (PBEF) module to promote mutual cross-modal fusion across multiple stages of the vision and language encoders. Furthermore, PBDF-Net incorporates a prompt-guided cross-modal interaction (PCI) module during the late fusion stage, facilitating a more profound integration of contextual information from both modalities, resulting in more accurate target segmentation. Comprehensive experiments conducted on the RefCOCO, RefCOCO+, G-Ref and ReferIt datasets substantiate the efficacy of our proposed method, demonstrating significant advancements in performance compared to existing approaches. © 2024 Elsevier B.V.
Related papers
50 in total
  • [41] Bi-directional Relationship Inferring Network for Referring Image Segmentation
    Hu, Zhiwei
    Feng, Guang
    Sun, Jiayu
    Zhang, Lihe
    Lu, Huchuan
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 4423 - 4432
  • [42] Key-Word-Aware Network for Referring Expression Image Segmentation
    Shi, Hengcan
    Li, Hongliang
    Meng, Fanman
    Wu, Qingbo
    COMPUTER VISION - ECCV 2018, PT VI, 2018, 11210 : 38 - 54
  • [43] Semantic Segmentation Guided Pixel Fusion for Image Retargeting
    Yan, Bo
    Niu, Xuejing
    Bare, Bahetiyaer
    Tan, Weimin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (03) : 676 - 687
  • [44] Guided Filter Network for Semantic Image Segmentation
    Zhang, Xiang
    Zhao, Wanqing
    Zhang, Wei
    Peng, Jinye
    Fan, Jianping
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 2695 - 2709
  • [45] Attention Guided Network for Retinal Image Segmentation
    Zhang, Shihao
    Fu, Huazhu
    Yan, Yuguang
    Zhang, Yubing
    Wu, Qingyao
    Yang, Ming
    Tan, Mingkui
    Xu, Yanwu
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2019, PT I, 2019, 11764 : 797 - 805
  • [46] Dense feature pyramid fusion deep network for building segmentation in remote sensing image
    Tian Qinglin
    Zhao Yingjun
    Qin Kai
    Li Yao
    Chen Xuejiao
    SEVENTH SYMPOSIUM ON NOVEL PHOTOELECTRONIC DETECTION TECHNOLOGY AND APPLICATIONS, 2021, 11763
  • [47] Attention-Based Deep Fusion Network for Retinal Lesion Segmentation in Fundus Image
    Dayana, A. Mary
    Emmanuel, W. R. Sam
    ADVANCES IN COMPUTING AND DATA SCIENCES, PT I, 2021, 1440 : 401 - 409
  • [48] Image Semantic Segmentation Method Based on Deep Fusion Network and Conditional Random Field
    Wang, Shuo
    Yang, Yi
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022
  • [49] Lightweight medical image segmentation network with multi-scale feature-guided fusion
    Zhu, Zhiqin
    Yu, Kun
    Qi, Guanqiu
    Cong, Baisen
    Li, Yuanyuan
    Li, Zexin
    Gao, Xinbo
    Computers in Biology and Medicine, 2024, 182
  • [50] Collaborative Attention Guided Multi-Scale Feature Fusion Network for Medical Image Segmentation
    Xu, Zhenghua
    Tian, Biao
    Liu, Shijie
    Wang, Xiangtao
    Yuan, Di
    Gu, Junhua
    Chen, Junyang
    Lukasiewicz, Thomas
    Leung, Victor C. M.
    IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2024, 11 (02) : 1857 - 1871