Dual Context Perception Transformer for Referring Image Segmentation

Times cited: 0

Authors:

Kong, Yuqiu [1]

Liu, Junhua [1]

Yao, Cuili [1]

Affiliations:

[1] Dalian Univ Technol, Dalian 116024, Peoples R China

Source:

PATTERN RECOGNITION AND COMPUTER VISION, PT V, PRCV 2024 | 2025 / Vol. 15035

Funding:

National Natural Science Foundation of China;

Keywords:

Referring image segmentation; Vision-linguistic alignment; Multi-modal fusion;

DOI:

10.1007/978-981-97-8620-6_15

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Referring image segmentation segments target objects in an image according to language expressions. Existing methods mainly focus on integrating multi-modal features with attention mechanisms. However, most methods tend to favor the features of a single modality during the fusion stage and fall short in exploring cross-modal contextual information, which is critical for accurately localizing target regions. To this end, we propose a novel architecture named Dual Context Perception Transformer (DCPformer), which considers both visual and linguistic contextual information during the fusion and reasoning stages. Specifically, a Cross-modal Context-aware Perception Module (CCPM) is designed to model cross-modal alignment in a unified visual-linguistic representation space. Furthermore, we propose an Information Feedback Module (IFM) that generates a rectification mask based on deep-scale features and filters out signals unrelated to the target object in features of shallower scales. Extensive experiments show that the proposed DCPformer achieves state-of-the-art performance compared with existing methods on three challenging benchmarks.
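
The abstract gives no implementation details for the IFM, so the following is only a toy sketch of the general idea it describes: a mask derived from a coarse (deep-scale) map gates a finer (shallow-scale) map. The sigmoid gating, the nearest-neighbor upsampling, and all names (`rectification_mask`, `apply_mask`, `deep_scores`) are assumptions made for illustration, not the authors' method.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rectification_mask(deep, scale):
    """Turn a coarse H x W grid of scores into an (H*scale) x (W*scale) mask in (0, 1).

    Assumption: sigmoid gating plus nearest-neighbor upsampling; the paper's
    actual mask generation is not specified in the abstract.
    """
    mask = []
    for row in deep:
        # Repeat each gated value `scale` times along the width.
        up_row = [sigmoid(v) for v in row for _ in range(scale)]
        # Repeat the widened row `scale` times along the height.
        mask.extend([list(up_row) for _ in range(scale)])
    return mask


def apply_mask(shallow, mask):
    """Element-wise gating of a shallow-scale feature map by the mask."""
    return [[f * m for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(shallow, mask)]


# Hypothetical inputs: the deep map is confident only about its top-left cell.
deep_scores = [[6.0, -6.0],
               [-6.0, -6.0]]
shallow = [[1.0] * 4 for _ in range(4)]

mask = rectification_mask(deep_scores, scale=2)
filtered = apply_mask(shallow, mask)
```

In this toy case the shallow responses survive almost unchanged in the top-left 2 x 2 block and are suppressed toward zero elsewhere, which is the filtering behavior the abstract attributes to the rectification mask.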


Pages: 216-230

Number of pages: 15

