Prompt-guided bidirectional deep fusion network for referring image segmentation

Cited by: 0
Authors
[1] Wu, Junxian
[2] Zhang, Yujia
[3] Kampffmeyer, Michael
[4] Zhao, Xiaoguang
Keywords
Image segmentation
DOI
10.1016/j.neucom.2024.128899
Abstract
Referring image segmentation involves accurately segmenting objects based on natural language descriptions. This poses challenges due to the intricate and varied nature of language expressions, as well as the requirement to identify relevant image regions among multiple objects. Current models predominantly employ language-aware early fusion techniques, which may lead to misinterpretations of language expressions due to the lack of explicit visual guidance of the language encoder. Additionally, early fusion methods are unable to adequately leverage high-level contexts. To address these limitations, this paper introduces the Prompt-guided Bidirectional Deep Fusion Network (PBDF-Net) to enhance the fusion of language and vision modalities. In contrast to traditional unidirectional early fusion approaches, our approach employs a prompt-guided bidirectional encoder fusion (PBEF) module to promote mutual cross-modal fusion across multiple stages of the vision and language encoders. Furthermore, PBDF-Net incorporates a prompt-guided cross-modal interaction (PCI) module during the late fusion stage, facilitating a more profound integration of contextual information from both modalities, resulting in more accurate target segmentation. Comprehensive experiments conducted on the RefCOCO, RefCOCO+, G-Ref and ReferIt datasets substantiate the efficacy of our proposed method, demonstrating significant advancements in performance compared to existing approaches. © 2024 Elsevier B.V.
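The abstract contrasts unidirectional early fusion with fusion that updates both encoders. As a rough illustration only, the sketch below shows one generic way to make cross-modal fusion bidirectional: each modality attends over the other, and both are updated from the same pre-fusion inputs. It is not the paper's PBEF or PCI module. The function names, the toy features, the residual update, and the omission of learned projections and prompts are all assumptions made for this example.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, keys_values):
    """Scaled dot-product attention: each query attends over the other modality.

    Hypothetical helper; no learned projections, so keys double as values.
    """
    dim = len(queries[0])
    scale = math.sqrt(dim)
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys_values]
        weights = softmax(scores)
        attended.append(
            [sum(w * kv[d] for w, kv in zip(weights, keys_values)) for d in range(dim)]
        )
    return attended

def bidirectional_fuse(vision, language):
    """Update both modalities from the same pre-fusion inputs (residual add)."""
    v_ctx = cross_attend(vision, language)   # vision features query the words
    l_ctx = cross_attend(language, vision)   # word features query the patches
    vision_out = [[a + b for a, b in zip(v, c)] for v, c in zip(vision, v_ctx)]
    language_out = [[a + b for a, b in zip(l, c)] for l, c in zip(language, l_ctx)]
    return vision_out, language_out

# Toy inputs (assumed): 3 patch features and 2 word features, both of dimension 2.
vision = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
language = [[1.0, 0.0], [0.0, 2.0]]
vision_out, language_out = bidirectional_fuse(vision, language)
```

In a unidirectional scheme only `vision_out` would be computed and the language features would pass through unchanged. Here the language features also receive visual context, which is the kind of explicit visual guidance for the language encoder that the abstract says early fusion lacks.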