Progressive Learning for Image Retrieval with Hybrid-Modality Queries

Cited by: 16
Authors
Zhao, Yida [1]
Song, Yuqing [1]
Jin, Qin [1]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Image Retrieval; Progressive Learning; Visual-Linguistic Query Composing; LANGUAGE; VISION
DOI
10.1145/3477495.3532047
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Image retrieval with hybrid-modality queries, also known as composing text and image for image retrieval (CTI-IR), is a retrieval task where the search intention is expressed in a more complex query format, involving both vision and text modalities. For example, a target product image is searched using a reference product image along with text about changing certain attributes of the reference image as the query. It is a more challenging image retrieval task that requires both semantic space learning and cross-modal fusion. Previous approaches that attempt to deal with both aspects achieve unsatisfactory performance. In this paper, we decompose the CTI-IR task into a three-stage learning problem to progressively learn the complex knowledge for image retrieval with hybrid-modality queries. We first leverage the semantic embedding space for open-domain image-text retrieval, and then transfer the learned knowledge to the fashion domain with fashion-related pre-training tasks. Finally, we enhance the pre-trained model from single-query to hybrid-modality query for the CTI-IR task. Furthermore, as the contribution of each modality in the hybrid-modality query varies across retrieval scenarios, we propose a self-supervised adaptive weighting strategy to dynamically determine the importance of image and text in the hybrid-modality query for better retrieval. Extensive experiments show that our proposed model significantly outperforms state-of-the-art methods in mean Recall@K by 24.9% and 9.5% on the Fashion-IQ and Shoes benchmark datasets, respectively.
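The abstract reports results as the mean of Recall@K and describes weighting the image and text parts of a query. The following is a minimal sketch of both ideas, not the authors' implementation: the function names, the fixed per-query `image_weight` (the paper learns this weight in a self-supervised way; here it is simply supplied by hand), the toy scores, and the choice of K values are all assumptions made for illustration.

```python
from typing import List, Sequence


def fuse_scores(image_scores: Sequence[float],
                text_scores: Sequence[float],
                image_weight: float) -> List[float]:
    """Blend per-candidate scores: w * image + (1 - w) * text.

    image_weight is a hand-set, hypothetical stand-in for a learned weight.
    """
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must lie in [0, 1]")
    if len(image_scores) != len(text_scores):
        raise ValueError("score lists must have the same length")
    return [image_weight * s_img + (1.0 - image_weight) * s_txt
            for s_img, s_txt in zip(image_scores, text_scores)]


def recall_at_k(score_rows: Sequence[Sequence[float]],
                target_indices: Sequence[int],
                k: int) -> float:
    """Fraction of queries whose target candidate ranks in the top k."""
    hits = 0
    for scores, target in zip(score_rows, target_indices):
        # Candidate indices sorted from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(target_indices)


def mean_recall(score_rows: Sequence[Sequence[float]],
                target_indices: Sequence[int],
                ks: Sequence[int]) -> float:
    """Average of Recall@K over the given K values."""
    return sum(recall_at_k(score_rows, target_indices, k) for k in ks) / len(ks)


# Toy data (made up): two queries, four candidate images each.
rows = [
    fuse_scores([0.9, 0.2, 0.4, 0.1], [0.1, 0.8, 0.3, 0.2], image_weight=0.3),
    fuse_scores([0.5, 0.6, 0.7, 0.1], [0.2, 0.1, 0.3, 0.9], image_weight=0.8),
]
targets = [1, 0]  # index of the correct candidate for each query

print(recall_at_k(rows, targets, k=1))
print(mean_recall(rows, targets, ks=(1, 2, 3)))
```

In this toy setup the first query leans on the text scores and the second on the image scores, which mirrors the abstract's point that the contribution of each modality varies by query.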
Pages: 1012-1021
Number of pages: 10