Dual Context Perception Transformer for Referring Image Segmentation

Times cited: 0
Authors
Kong, Yuqiu [1]
Liu, Junhua [1]
Yao, Cuili [1]
Affiliation
[1] Dalian Univ Technol, Dalian 116024, Peoples R China
Source
PATTERN RECOGNITION AND COMPUTER VISION, PT V, PRCV 2024 | 2025 / Vol. 15035
Funding
National Natural Science Foundation of China
Keywords
Referring image segmentation; Vision-linguistic alignment; Multi-modal fusion
DOI
10.1007/978-981-97-8620-6_15
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Referring image segmentation segments the target objects in an image according to natural-language expressions. Existing methods mainly focus on integrating multi-modal features with attention mechanisms. However, most of them tend to favor the features of a single modality during the fusion stage and fall short in exploring cross-modal contextual information, which is critical for localizing accurate target regions. To this end, we propose a novel architecture named Dual Context Perception Transformer (DCPformer), which considers both visual and linguistic contextual information during the fusion and reasoning stages. Specifically, a Cross-modal Context-aware Perception Module (CCPM) is designed to model cross-modal alignment in a unified visual-linguistic representation space. Furthermore, we propose an Information Feedback Module (IFM) that generates a rectification mask from deep-scale features and uses it to filter out signals unrelated to the target object from shallower-scale features. Extensive experiments show that the proposed DCPformer achieves state-of-the-art performance against existing methods on three challenging benchmarks.
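The abstract describes the IFM only at a high level. The sketch below illustrates the general idea of mask-based feature rectification and is not the authors' implementation. It assumes that the rectification mask is a sigmoid of a scaled dot product between each deep-scale location and a sentence embedding, that the mask is brought to the shallower resolution by nearest-neighbour upsampling, and that "filtering" means element-wise multiplication. The function names (`rectification_mask`, `upsample_nearest`, `filter_shallow`) and the toy inputs are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Grid = List[List[Vector]]  # H x W grid of C-dim feature vectors
Mask = List[List[float]]   # H x W grid of scalars in (0, 1)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def rectification_mask(deep_feats: Grid, text_emb: Vector) -> Mask:
    """Score each deep-scale location against a sentence embedding.

    Hypothetical scoring: sigmoid of the dot product scaled by 1/sqrt(C).
    The abstract does not specify the actual scoring function.
    """
    scale = 1.0 / math.sqrt(len(text_emb))
    return [[sigmoid(scale * sum(f * t for f, t in zip(vec, text_emb)))
             for vec in row]
            for row in deep_feats]


def upsample_nearest(mask: Mask, factor: int) -> Mask:
    """Nearest-neighbour upsampling so the mask matches a shallower scale."""
    height, width = len(mask), len(mask[0])
    return [[mask[i // factor][j // factor] for j in range(width * factor)]
            for i in range(height * factor)]


def filter_shallow(shallow_feats: Grid, mask: Mask) -> Grid:
    """Scale every shallow-scale feature vector by its mask value,
    suppressing locations judged irrelevant to the expression."""
    return [[[m * f for f in vec] for vec, m in zip(row, mask_row)]
            for row, mask_row in zip(shallow_feats, mask)]


# Toy example: a 1x2 deep map, a 2x4 shallow map, C = 2 channels.
deep = [[[2.0, 0.0], [-2.0, 0.0]]]
text = [1.0, 0.0]
shallow = [[[1.0, 1.0] for _ in range(4)] for _ in range(2)]

mask = upsample_nearest(rectification_mask(deep, text), factor=2)
filtered = filter_shallow(shallow, mask)
print([round(m, 3) for m in mask[0]])
```

In the toy input, the left deep-scale location aligns with the text embedding and the right one opposes it, so the left half of the shallow map is largely kept while the right half is attenuated. A soft (sigmoid) mask is used here so that filtering stays differentiable; the published module may use a different gating scheme.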
Pages: 216-230
Number of pages: 15
Related papers
50 in total
  • [41] Image segmentation with context
    Eriksson, Anders P.
    Olsson, Carl
    Kahl, Fredrik
    IMAGE ANALYSIS, PROCEEDINGS, 2007, 4522 : 283 - +
  • [42] PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
    Liu, Jiang
    Ding, Hui
    Cai, Zhaowei
    Zhang, Yuting
    Satzoda, Ravi Kumar
    Mahadevan, Vijay
    Manmatha, R.
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 18653 - 18663
  • [43] CRIS: CLIP-Driven Referring Image Segmentation
    Wang, Zhaoqing
    Lu, Yu
    Li, Qiang
    Tao, Xunqiang
    Guo, Yandong
    Gong, Mingming
    Liu, Tongliang
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 11676 - 11685
  • [44] Attentive Excitation and Aggregation for Bilingual Referring Image Segmentation
    Zhou, Qianli
    Hui, Tianrui
    Wang, Rong
    Hu, Haimiao
    Liu, Si
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2021, 12 (02)
  • [45] A survey of methods for addressing the challenges of referring image segmentation
    Ji, Lixia
    Du, Yunlong
    Dang, Yiping
    Gao, Wenzhao
    Zhang, Han
    NEUROCOMPUTING, 2024, 583
  • [46] Structured Multimodal Fusion Network for Referring Image Segmentation
    Xue, Mingcheng
    Liu, Yu
    Xu, Kaiping
    Zhang, Haiyang
    Yu, Chengyang
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2022, 2022, : 36 - 47
  • [47] VLT: Vision-Language Transformer and Query Generation for Referring Segmentation
    Ding, Henghui
    Liu, Chang
    Wang, Suchen
    Jiang, Xudong
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (06) : 7900 - 7916
  • [48] SAFARI: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation
    Nag, Sayan
    Goswami, Koustava
    Karanam, Srikrishna
    COMPUTER VISION-ECCV 2024, PT XLIV, 2025, 15102 : 485 - 503
  • [49] Locate then Segment: A Strong Pipeline for Referring Image Segmentation
    Jing, Ya
    Kong, Tao
    Wang, Wei
    Wang, Liang
    Li, Lei
    Tan, Tieniu
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 9853 - 9862
  • [50] Expression Prompt Collaboration Transformer for universal referring video object segmentation
    Chen, Jiajun
    Lin, Jiacheng
    Zhong, Guojin
    Fu, Haolong
    Nai, Ke
    Yang, Kailun
    Li, Zhiyong
    KNOWLEDGE-BASED SYSTEMS, 2025, 311