Prompt-guided bidirectional deep fusion network for referring image segmentation

Cited by: 0

Authors:

Affiliations:

[1] Wu, Junxian

[2] Zhang, Yujia

[3] Kampffmeyer, Michael

[4] Zhao, Xiaoguang

Source:

Zhang, Yujia (zhangyujia2014@ia.ac.cn) | 2025 / Vol. 616

Keywords:

Image segmentation;

DOI:

10.1016/j.neucom.2024.128899

Chinese Library Classification number:

Subject classification code:

Abstract:

Referring image segmentation involves accurately segmenting objects based on natural language descriptions. This poses challenges due to the intricate and varied nature of language expressions, as well as the requirement to identify relevant image regions among multiple objects. Current models predominantly employ language-aware early fusion techniques, which may lead to misinterpretations of language expressions due to the lack of explicit visual guidance of the language encoder. Additionally, early fusion methods are unable to adequately leverage high-level contexts. To address these limitations, this paper introduces the Prompt-guided Bidirectional Deep Fusion Network (PBDF-Net) to enhance the fusion of language and vision modalities. In contrast to traditional unidirectional early fusion approaches, our approach employs a prompt-guided bidirectional encoder fusion (PBEF) module to promote mutual cross-modal fusion across multiple stages of the vision and language encoders. Furthermore, PBDF-Net incorporates a prompt-guided cross-modal interaction (PCI) module during the late fusion stage, facilitating a more profound integration of contextual information from both modalities, resulting in more accurate target segmentation. Comprehensive experiments conducted on the RefCOCO, RefCOCO+, G-Ref and ReferIt datasets substantiate the efficacy of our proposed method, demonstrating significant advancements in performance compared to existing approaches. © 2024 Elsevier B.V.
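
The bidirectional fusion idea described in the abstract can be illustrated with a minimal sketch. This is not the PBDF-Net implementation: the abstract gives no equations, so the sketch assumes plain scaled dot-product cross-attention with residual addition, omits the prompts and all learned projections, and uses hypothetical function names (cross_attend, bidirectional_fuse). It only shows what "mutual" fusion means: vision tokens are updated with language context and language tokens are updated with visual context in the same step, instead of language flowing into vision only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Scaled dot-product cross-attention without learned projections.

    Each query vector receives a softmax-weighted average of the
    context vectors. Assumption: both inputs share the same feature size.
    """
    d = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, c)) / math.sqrt(d) for c in context]
        weights = softmax(scores)
        attended.append(
            [sum(w * c[j] for w, c in zip(weights, context)) for j in range(d)]
        )
    return attended


def bidirectional_fuse(vision, language):
    """Hypothetical single fusion stage (illustration only).

    Vision tokens attend to language tokens and language tokens attend
    to vision tokens; each result is added back as a residual.
    """
    lang_ctx = cross_attend(vision, language)   # language -> vision direction
    vis_ctx = cross_attend(language, vision)    # vision -> language direction
    fused_vision = [[a + b for a, b in zip(v, c)] for v, c in zip(vision, lang_ctx)]
    fused_language = [[a + b for a, b in zip(l, c)] for l, c in zip(language, vis_ctx)]
    return fused_vision, fused_language


# Toy input: three "pixel" tokens and one "word" token, feature size 2.
vision_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
language_tokens = [[0.5, -0.5]]
fused_v, fused_l = bidirectional_fuse(vision_tokens, language_tokens)
```

Both outputs keep the shape of their inputs, so a stage like this could in principle be repeated at several encoder depths, which is the sense in which the abstract speaks of fusion "across multiple stages". A unidirectional early-fusion baseline would correspond to computing only fused_vision and leaving the language tokens unchanged.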

50 items in total

[21] Bilateral Knowledge Interaction Network for Referring Image Segmentation
Ding, Haixin
Zhang, Shengchuan
Wu, Qiong
Yu, Songlin
Hu, Jie
Cao, Liujuan
Ji, Rongrong
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 2966 - 2977
[22] BiFNet: Bidirectional Fusion Network for Road Segmentation
Li, Haoran
Chen, Yaran
Zhang, Qichao
Zhao, Dongbin
IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (09) : 8617 - 8628
[23] Saliency guided deep network for weakly-supervised image segmentation
Sun, Fengdong
Li, Wenhui
PATTERN RECOGNITION LETTERS, 2019, 120 : 62 - 68
[24] Prompt-Guided Zero-Shot Anomaly Action Recognition using Pretrained Deep Skeleton Features
Sato, Fumiaki
Hachiuma, Ryo
Sekii, Taiki
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 6471 - 6480
[25] Text-Guided Image Manipulation via Generative Adversarial Network With Referring Image Segmentation-Based Guidance
Watanabe, Yuto
Togo, Ren
Maeda, Keisuke
Ogawa, Takahiro
Haseyama, Miki
IEEE ACCESS, 2023, 11 : 42534 - 42545
[26] Calibration & Reconstruction: Deep Integrated Language for Referring Image Segmentation
Yan, Yichen
He, Xingjian
Chen, Sihan
Liu, Jing
PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024, : 451 - 459
[27] Optimizing waste handling with interactive AI: Prompt-guided segmentation of construction and demolition waste using computer vision
Sirimewan, Diani
Kunananthaseelan, Nilakshan
Raman, Sudharshan
Garcia, Reyes
Arashpour, Mehrdad
WASTE MANAGEMENT, 2024, 190 : 149 - 160
[28] DEEP PRIOR GUIDED NETWORK FOR HIGH-QUALITY IMAGE FUSION
Yin, Jia-Li
Chen, Bo-Hao
Peng, Yan-Tsung
Tsai, Chung-Chi
2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2020,
[29] Global and Local Interactive Perception Network for Referring Image Segmentation
Liu, Jing
Tan, Hongchen
Hu, Yongli
Sun, Yanfeng
Wang, Huasheng
Yin, Baocai
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 35 (12) : 1 - 14
[30] Global Selection and Local Attention Network for Referring Image Segmentation
Ding, Haixin
Zhang, Shengchuan
Cao, Liujuan
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VII, 2024, 14431 : 284 - 295