Vision-Aware Language Reasoning for Referring Image Segmentation

Times cited: 0

Authors:

Xu, Fayou [1]

Luo, Bing [1]

Zhang, Chao [2]

Xu, Li [3]

Pu, Mingxing [1]

Li, Bo [1]

Affiliations:

[1] Xihua Univ, Sch Comp & Software Engn, Chengdu 610039, Peoples R China

[2] Sichuan Police Coll, Key Lab Intelligent Policing, Luzhou 646000, Peoples R China

[3] Xihua Univ, Sch Sci, Chengdu 610039, Peoples R China

Source:

NEURAL PROCESSING LETTERS | 2023 / Vol. 55 / No. 08

Keywords:

Referring image segmentation; Vision and language; Explainable language-structure reasoning

DOI:

10.1007/s11063-023-11377-z

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Referring image segmentation is a multimodal joint task. Given a paired image and natural-language expression, it aims to segment the object that the expression refers to. However, the diversity of language annotations tends to cause semantic ambiguity, which makes the semantic representation produced by the language feature encoder imprecise. Existing methods do not correct the language encoding module, so semantic errors in the language features cannot be remedied in subsequent processing, and this results in semantic deviation. To this end, we propose a vision-aware language reasoning model. Intuitively, the segmentation result can guide the reconstruction of language features, and this reconstruction can be expressed as a tree-structured recursive process. Specifically, we design a language reasoning encoding module and a mask loopback optimization module to optimize the language encoding tree, whose node feature weights are learned through backpropagation. In addition, traditional attention modules that match local words against visual regions easily introduce noisy regions. To overcome this problem, we use global language prior information to compute the importance of different words. These importance scores are then used to further weight the visual region features, which we implement as a language-aware vision attention module. Experimental results on four benchmark datasets show that the proposed method improves segmentation performance.
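The language-aware vision attention described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The function names, the scaled dot-product scoring of each word against a global sentence vector, and the sigmoid gate on each visual region are all assumptions chosen for illustration. The sketch only mirrors the data flow the abstract states: a global language feature scores each word, the importance-weighted words form a language context, and that context weights the visual region features.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    """Overflow-safe logistic function, output in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def word_importance(sentence: Vector, words: List[Vector]) -> List[float]:
    """Hypothetical: score each word against the global sentence feature
    (the 'global language prior') with a scaled dot product, then normalise
    the scores with softmax so the word weights sum to 1."""
    scale = math.sqrt(len(sentence))
    return softmax([dot(sentence, w) / scale for w in words])


def language_aware_vision_attention(
    sentence: Vector, words: List[Vector], regions: List[Vector]
) -> List[Vector]:
    """Hypothetical sketch: re-weight visual region features using a language
    context vector built from importance-weighted word features."""
    alpha = word_importance(sentence, words)
    dim = len(sentence)
    # Importance-weighted sum of word features (the language context).
    context = [sum(a * w[d] for a, w in zip(alpha, words)) for d in range(dim)]
    scale = math.sqrt(dim)
    weighted = []
    for v in regions:
        # Assumed sigmoid gate: relevance of this region to the language context.
        gate = sigmoid(dot(context, v) / scale)
        weighted.append([gate * x for x in v])
    return weighted
```

Consider `sentence = [1.0, 0.0]`, `words = [[1.0, 0.0], [0.0, 1.0]]` and `regions = [[2.0, 0.0], [-2.0, 0.0]]`. The first word receives the larger importance weight. The first region aligns with the resulting context, so it is attenuated less than the second. The gate lies in (0, 1), which means regions can only be attenuated, never amplified. This is a design choice of the sketch, and the paper's actual weighting function may differ.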


Pages: 11313-11331

Number of pages: 19
