Toward Complex-query Referring Image Segmentation: A Novel Benchmark

Cited by: 0
Authors
Ji, Wei [1 ]
Li, Li [1 ]
Fei, Hao [1 ]
Liu, Xiangyan [1 ]
Yang, Xun [2 ]
Li, Juncheng [1 ]
Zimmermann, Roger [1 ]
Affiliations
[1] National University of Singapore, Singapore
[2] University of Science and Technology of China, Hefei, China
Keywords
Referring Image Understanding; Dual-modality; Graph Alignment; Language
DOI
10.1145/3701733
Chinese Library Classification (CLC): TP [Automation Technology; Computer Technology]
Discipline Code: 0812
Abstract
Referring Image Segmentation (RIS) has been extensively studied over the past decade, leading to the development of advanced algorithms. However, little research has investigated how existing algorithms should be benchmarked with complex language queries, which include more informative descriptions of surrounding objects and backgrounds (e.g., "the black car" vs. "the black car parked on the road beside the bus"). Given the significant improvement in the semantic understanding capability of large pre-trained models, it is crucial to take RIS a step further by incorporating complex language that resembles real-world applications. To close this gap, building upon the existing RefCOCO and Visual Genome datasets, we propose a new RIS benchmark with complex queries, namely RIS-CQ. The RIS-CQ dataset is high-quality and large-scale; it challenges existing RIS methods with enriched, specific, and informative queries, and enables a more realistic scenario for RIS research. In addition, we present a niche-targeting method, the Dual-Modality Graph Alignment (DuMoGA) model, to better tackle RIS-CQ; it outperforms a series of existing RIS methods. To provide a valuable foundation for future advancements in RIS with complex queries, we release the datasets, the pre-processing and synthesis scripts, and the algorithm implementations at https://github.com/lili0415/DuMoGa.
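The abstract describes DuMoGA only at a high level, as aligning a graph derived from the complex query with a graph derived from the image. As a rough, illustrative sketch only, the following PyTorch snippet shows one common way such cross-modal node alignment can be realized: project both node sets into a shared space, compute a soft node-to-node alignment matrix, and fuse the aligned textual context back into the visual nodes. All class names, feature dimensions, and the fusion scheme here are assumptions for exposition and are not the released DuMoGA implementation.

# Minimal, hypothetical sketch of cross-modal graph alignment between
# visual region nodes and textual phrase nodes. Shapes and the fusion
# scheme are illustrative assumptions, not the authors' DuMoGA code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyGraphAlignment(nn.Module):
    def __init__(self, vis_dim=256, txt_dim=256, hid_dim=256):
        super().__init__()
        # Project both modalities into a shared space before matching.
        self.vis_proj = nn.Linear(vis_dim, hid_dim)
        self.txt_proj = nn.Linear(txt_dim, hid_dim)

    def forward(self, vis_nodes, txt_nodes):
        # vis_nodes: (B, Nv, vis_dim) node features from the image graph
        # txt_nodes: (B, Nt, txt_dim) node features from the query graph
        v = self.vis_proj(vis_nodes)                  # (B, Nv, H)
        t = self.txt_proj(txt_nodes)                  # (B, Nt, H)
        # Soft alignment: similarity of each visual node to each textual
        # node, normalized over the textual nodes.
        sim = torch.einsum("bvh,bth->bvt", v, t) / v.size(-1) ** 0.5
        attn = F.softmax(sim, dim=-1)                 # (B, Nv, Nt)
        # Each visual node aggregates the textual context it aligns with.
        aligned_txt = torch.einsum("bvt,bth->bvh", attn, t)
        fused = v + aligned_txt                       # simple residual fusion
        return fused, attn


if __name__ == "__main__":
    model = ToyGraphAlignment()
    vis = torch.randn(2, 36, 256)   # e.g., 36 detected regions per image
    txt = torch.randn(2, 12, 256)   # e.g., 12 parsed phrase nodes per query
    fused, attn = model(vis, txt)
    print(fused.shape, attn.shape)  # (2, 36, 256) and (2, 36, 12)

In a segmentation pipeline such fused features would typically feed a mask decoder; the attention matrix gives an interpretable view of which query phrases each image region is grounded to.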
Pages: 18