Prompt-guided bidirectional deep fusion network for referring image segmentation

Cited by: 0
Authors
[1] Wu, Junxian
[2] Zhang, Yujia
[3] Kampffmeyer, Michael
[4] Zhao, Xiaoguang
Keywords
Image segmentation
DOI
10.1016/j.neucom.2024.128899
Abstract
Referring image segmentation involves accurately segmenting objects based on natural language descriptions. This poses challenges due to the intricate and varied nature of language expressions, as well as the requirement to identify relevant image regions among multiple objects. Current models predominantly employ language-aware early fusion techniques, which may lead to misinterpretations of language expressions due to the lack of explicit visual guidance of the language encoder. Additionally, early fusion methods are unable to adequately leverage high-level contexts. To address these limitations, this paper introduces the Prompt-guided Bidirectional Deep Fusion Network (PBDF-Net) to enhance the fusion of language and vision modalities. In contrast to traditional unidirectional early fusion approaches, our approach employs a prompt-guided bidirectional encoder fusion (PBEF) module to promote mutual cross-modal fusion across multiple stages of the vision and language encoders. Furthermore, PBDF-Net incorporates a prompt-guided cross-modal interaction (PCI) module during the late fusion stage, facilitating a more profound integration of contextual information from both modalities, resulting in more accurate target segmentation. Comprehensive experiments conducted on the RefCOCO, RefCOCO+, G-Ref and ReferIt datasets substantiate the efficacy of our proposed method, demonstrating significant advancements in performance compared to existing approaches. © 2024 Elsevier B.V.
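The idea of mutual (bidirectional) cross-modal fusion described in the abstract can be illustrated with a small sketch. This is not the PBEF or PCI module from the paper: the abstract gives no implementation details, so the sketch assumes plain scaled dot-product cross-attention with a residual connection, no learned projections, and no prompt tokens. The names `cross_attend` and `bidirectional_fuse` and the toy feature values are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Each query token gathers a softmax-weighted average of the context tokens.

    Assumption: scaled dot-product attention without learned projections.
    """
    d = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(qi * ci for qi, ci in zip(q, c)) / math.sqrt(d) for c in context]
        weights = softmax(scores)
        attended.append(
            [sum(w * c[j] for w, c in zip(weights, context)) for j in range(d)]
        )
    return attended


def bidirectional_fuse(vision, language):
    """Update both modalities from each other (hypothetical helper).

    Vision tokens attend to language tokens and language tokens attend to
    vision tokens; both updates read the original, un-fused features and are
    added back as residuals.
    """
    v_from_l = cross_attend(vision, language)
    l_from_v = cross_attend(language, vision)
    fused_v = [[a + b for a, b in zip(v, u)] for v, u in zip(vision, v_from_l)]
    fused_l = [[a + b for a, b in zip(l, u)] for l, u in zip(language, l_from_v)]
    return fused_v, fused_l


# Toy features: 3 visual tokens and 2 word tokens, each of dimension 2.
vision = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
language = [[1.0, 0.0], [0.0, 2.0]]
fused_v, fused_l = bidirectional_fuse(vision, language)
```

A unidirectional early-fusion design would compute only `v_from_l` and leave the language features unchanged; the bidirectional variant also lets visual context reshape the word features, which is the property the abstract argues for. In a multi-stage encoder, a step like this would presumably be repeated at several depths.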