Toward Complex-query Referring Image Segmentation: A Novel Benchmark

Cited by: 0

Authors:

Ji, Wei [1]

Li, Li [1]

Fei, Hao [1]

Liu, Xiangyan [1]

Yang, Xun [2]

Li, Juncheng [1]

Zimmermann, Roger [1]

Affiliations:

[1] Natl Univ Singapore, Singapore, Singapore

[2] Univ Sci & Technol China, Hefei, Peoples R China

Source:

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS | 2024, Vol. 21, No. 1

Keywords:

Referring Image Understanding; Dual-modality; Graph Alignment; LANGUAGE;

DOI:

10.1145/3701733

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812;

Abstract:

Referring Image Segmentation (RIS) has been extensively studied over the past decade, leading to the development of advanced algorithms. However, little research has investigated how existing algorithms should be benchmarked with complex language queries, which include more informative descriptions of surrounding objects and backgrounds (e.g., "the black car" vs. "the black car is parking on the road and beside the bus"). Given the significant improvement in the semantic understanding capability of large pre-trained models, it is crucial to take a step further in RIS by incorporating complex language that resembles real-world applications. To close this gap, building upon the existing RefCOCO and Visual Genome datasets, we propose a new RIS benchmark with complex queries, namely RIS-CQ. The RIS-CQ dataset is high-quality and large-scale; it challenges existing RIS methods with enriched, specific, and informative queries and enables a more realistic scenario for RIS research. In addition, we present a niche-targeting method to better tackle RIS-CQ, called the Dual-Modality Graph Alignment (DuMoGA) model, which outperforms a series of RIS methods. To provide a valuable foundation for future advancements in the field of RIS with complex queries, we release the datasets, pre-processing and synthetic scripts, and the algorithm implementations at https://github.com/lili0415/DuMoGa.


Pages: 18

