Vision-Aware Language Reasoning for Referring Image Segmentation

Cited by: 0
Authors
Xu, Fayou [1]
Luo, Bing [1]
Zhang, Chao [2]
Xu, Li [3]
Pu, Mingxing [1]
Li, Bo [1]
Affiliations
[1] Xihua University, School of Computer and Software Engineering, Chengdu 610039, People's Republic of China
[2] Sichuan Police College, Key Laboratory of Intelligent Policing, Luzhou 646000, People's Republic of China
[3] Xihua University, School of Science, Chengdu 610039, People's Republic of China
Keywords
Referring image segmentation; Vision and language; Explainable language-structure reasoning
DOI
10.1007/s11063-023-11377-z
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Referring image segmentation is a joint multimodal task that aims to segment the object described by a natural-language expression from its paired image. However, the diversity of language annotations tends to cause semantic ambiguity, which makes the semantic representation produced by the language feature encoder imprecise. Existing methods do not correct the language encoding module, so semantic errors in the language features cannot be remedied in subsequent processing, resulting in semantic deviation. To this end, we propose a vision-aware language reasoning model. Intuitively, the segmentation result can be used to guide the reconstruction of language features, a process that can be expressed as tree-structured recursion. Specifically, we design a language reasoning encoding module and a mask loopback optimization module to optimize the language encoding tree, and the feature weights of the tree nodes are learned through backpropagation. In addition, traditional attention modules that match local words with visual regions tend to introduce noisy regions. To overcome this, we use global language prior information to compute the importance of different words and use these importances to further weight the visual region features, which we implement as a language-aware vision attention module. Experimental results on four benchmark datasets show that the proposed method improves performance.
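The language-aware vision attention step described in the abstract (a global language prior scores the importance of each word, and the resulting language feature then weights the visual region features) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the mean-pooled global prior, dot-product similarities, softmax normalization, and the function name `language_aware_vision_attention` are all hypothetical choices made for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def language_aware_vision_attention(
    words: List[Vector], regions: List[Vector]
) -> Tuple[List[float], List[Vector]]:
    """Hypothetical sketch: reweight visual region features using word
    importances derived from a global language prior.

    Returns the per-region weights and the weighted region features.
    """
    dim = len(words[0])
    # Global language prior: assumed here to be the mean of the word features.
    global_prior = [sum(w[k] for w in words) / len(words) for k in range(dim)]
    # Word importance: softmax of each word's similarity to the global prior.
    word_weights = softmax([dot(w, global_prior) for w in words])
    # Importance-weighted sentence feature.
    lang = [sum(a * w[k] for a, w in zip(word_weights, words)) for k in range(dim)]
    # Region importance: softmax of each region's similarity to that feature.
    region_weights = softmax([dot(v, lang) for v in regions])
    # Scale every region feature by its importance weight.
    weighted = [[a * x for x in v] for a, v in zip(region_weights, regions)]
    return region_weights, weighted


if __name__ == "__main__":
    # Two words point along the first axis, one along the second.
    words = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    regions = [[1.0, 0.0], [0.0, 1.0]]
    weights, features = language_aware_vision_attention(words, regions)
    print(weights)  # region 0 (aligned with the dominant words) gets more weight
```

Because the region weights come from a sentence-level feature rather than from independent word-to-region matches, a single off-topic word has limited influence on which regions are emphasized, which is the intuition the abstract gives for using a global prior.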
Pages: 11313-11331
Number of pages: 19