Toward Complex-query Referring Image Segmentation: A Novel Benchmark

Cited by: 0
Authors
Ji, Wei [1 ]
Li, Li [1 ]
Fei, Hao [1 ]
Liu, Xiangyan [1 ]
Yang, Xun [2 ]
Li, Juncheng [1 ]
Zimmermann, Roger [1 ]
Affiliations
[1] National University of Singapore, Singapore
[2] University of Science and Technology of China, Hefei, China
Keywords
Referring Image Understanding; Dual-modality; Graph Alignment; Language
DOI
10.1145/3701733
Chinese Library Classification (CLC): TP [Automation Technology; Computer Technology]
Discipline Code: 0812
Abstract
Referring Image Segmentation (RIS) has been extensively studied over the past decade, leading to the development of advanced algorithms. However, little research has investigated how existing algorithms should be benchmarked with complex language queries, which include more informative descriptions of surrounding objects and backgrounds (e.g., "the black car" vs. "the black car parked on the road beside the bus"). Given the significant improvement in the semantic understanding capability of large pre-trained models, it is crucial to take RIS a step further by incorporating complex language that resembles real-world applications. To close this gap, building upon the existing RefCOCO and Visual Genome datasets, we propose a new RIS benchmark with complex queries, namely RIS-CQ. The RIS-CQ dataset is high-quality and large-scale; it challenges existing RIS methods with enriched, specific, and informative queries, and enables a more realistic scenario for RIS research. In addition, we present a niche-targeting method, the Dual-Modality Graph Alignment (DuMoGA) model, to better tackle RIS-CQ; it outperforms a series of existing RIS methods. To provide a valuable foundation for future advancements in RIS with complex queries, we release the datasets, the pre-processing and synthesis scripts, and the algorithm implementations at https://github.com/lili0415/DuMoGa.
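The abstract describes DuMoGA only at a high level, as aligning a graph derived from the complex query with a graph derived from the image. As a rough, illustrative sketch only, the following PyTorch snippet shows one common way such cross-modal node alignment can be realized: project both node sets into a shared space, compute a soft node-to-node alignment matrix, and fuse the aligned textual context back into the visual nodes. All class names, feature dimensions, and the fusion scheme here are assumptions for exposition and are not the released DuMoGA implementation.

# Minimal, hypothetical sketch of cross-modal graph alignment between
# visual region nodes and textual phrase nodes. Shapes and the fusion
# scheme are illustrative assumptions, not the authors' DuMoGA code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyGraphAlignment(nn.Module):
    def __init__(self, vis_dim=256, txt_dim=256, hid_dim=256):
        super().__init__()
        # Project both modalities into a shared space before matching.
        self.vis_proj = nn.Linear(vis_dim, hid_dim)
        self.txt_proj = nn.Linear(txt_dim, hid_dim)

    def forward(self, vis_nodes, txt_nodes):
        # vis_nodes: (B, Nv, vis_dim) node features from the image graph
        # txt_nodes: (B, Nt, txt_dim) node features from the query graph
        v = self.vis_proj(vis_nodes)                  # (B, Nv, H)
        t = self.txt_proj(txt_nodes)                  # (B, Nt, H)
        # Soft alignment: similarity of each visual node to each textual
        # node, normalized over the textual nodes.
        sim = torch.einsum("bvh,bth->bvt", v, t) / v.size(-1) ** 0.5
        attn = F.softmax(sim, dim=-1)                 # (B, Nv, Nt)
        # Each visual node aggregates the textual context it aligns with.
        aligned_txt = torch.einsum("bvt,bth->bvh", attn, t)
        fused = v + aligned_txt                       # simple residual fusion
        return fused, attn


if __name__ == "__main__":
    model = ToyGraphAlignment()
    vis = torch.randn(2, 36, 256)   # e.g., 36 detected regions per image
    txt = torch.randn(2, 12, 256)   # e.g., 12 parsed phrase nodes per query
    fused, attn = model(vis, txt)
    print(fused.shape, attn.shape)  # (2, 36, 256) and (2, 36, 12)

In a segmentation pipeline such fused features would typically feed a mask decoder; the attention matrix gives an interpretable view of which query phrases each image region is grounded to.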
Pages: 18