Vision-Aware Language Reasoning for Referring Image Segmentation

Cited by: 0

Authors:

Xu, Fayou [1]

Luo, Bing [1]

Zhang, Chao [2]

Xu, Li [3]

Pu, Mingxing [1]

Li, Bo [1]

Affiliations:

[1] Xihua Univ, Sch Comp & Software Engn, Chengdu 610039, Peoples R China

[2] Sichuan Police Coll, Key Lab Intelligent Policing, Luzhou 646000, Peoples R China

[3] Xihua Univ, Sch Sci, Chengdu 610039, Peoples R China

Source:

NEURAL PROCESSING LETTERS | 2023 / Volume 55 / Issue 08

Keywords:

Referring image segmentation; Vision and language; Explainable language-structure reasoning;

DOI:

10.1007/s11063-023-11377-z

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Referring image segmentation is a multimodal joint task that aims to segment linguistically indicated objects from images, given paired expressions and images. However, the diversity of language annotations tends to result in semantic ambiguity, which makes the semantic representation produced by language feature encoding imprecise. Existing methods ignore the correction of the language encoding module, so the semantic error of language features cannot be reduced in the subsequent process, resulting in semantic deviation. To this end, we propose a vision-aware language reasoning model. Intuitively, the segmentation result can be used to guide the reconstruction of language features, which can be expressed as a tree-structured recursive process. Specifically, we design a language reasoning encoding module and a mask loopback optimization module to optimize the language encoding tree. The feature weights of tree nodes are learned through backpropagation. To overcome the problem that matching local language words with visual regions easily introduces noise regions in the traditional attention module, we use global language prior information to calculate the importance of different words and further weight the visual region features, which is embodied as a language-aware vision attention module. Our experimental results on four benchmark datasets show that the proposed method achieves performance improvement.


Pages: 11313 - 11331

Page count: 19

50 records in total

[21] Cross-modal attention guided visual reasoning for referring image segmentation
Zhang, Wenjing
Hu, Mengnan
Tan, Quange
Zhou, Qianli
Wang, Rong
Multimedia Tools and Applications, 2023, 82 : 28853 - 28872
[23] Language-Aware Spatial-Temporal Collaboration for Referring Video Segmentation
Hui, Tianrui
Liu, Si
Ding, Zihan
Huang, Shaofei
Li, Guanbin
Wang, Wenguan
Liu, Luoqi
Han, Jizhong
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (07) : 8646 - 8659
[24] SATR: Semantics-Aware Triadic Refinement network for referring image segmentation
Xie, Jialong
Liu, Jin
Wang, Guoxiang
Zhou, Fengyu
KNOWLEDGE-BASED SYSTEMS, 2024, 284
[25] Cross-modal transformer with language query for referring image segmentation
Zhang, Wenjing
Tan, Quange
Li, Pengxin
Zhang, Qi
Wang, Rong
NEUROCOMPUTING, 2023, 536 : 191 - 205
[26] Vision-aware target recognition toward autonomous robot by Kinect sensors
Chang, Qiuxiang
Xiong, Zhenkai
SIGNAL PROCESSING-IMAGE COMMUNICATION, 2020, 84 (84)
[27] Text Augmented Spatial-aware Zero-shot Referring Image Segmentation
Suo, Yucheng
Zhu, Linchao
Yang, Yi
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023 : 1032 - 1043
[28] Mask prior generation with language queries guided networks for referring image segmentation
Zhou, Jinhao
Xiao, Guoqiang
Lew, Michael S.
Wu, Song
COMPUTER VISION AND IMAGE UNDERSTANDING, 2025, 253
[29] Vision-aware air-ground cooperative target localization for UAV and UGV
Liu, Daqian
Bao, Weidong
Zhu, Xiaomin
Fei, Bowen
Xiao, Zhenliang
Men, Tong
AEROSPACE SCIENCE AND TECHNOLOGY, 2022, 124
[30] Hierarchical collaboration for referring image segmentation
Zhang, Wei
Cheng, Zesen
Chen, Jie
Gao, Wen
NEUROCOMPUTING, 2025, 613