Progressive Learning for Image Retrieval with Hybrid-Modality Queries

Cited by: 16
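Authors
Zhao, Yida [1]
Song, Yuqing [1]
Jin, Qin [1]
Affiliation
[1] Renmin University of China, School of Information, Beijing, People's Republic of China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;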
Keywords
Image Retrieval; Progressive Learning; Visual-Linguistic Query Composing; LANGUAGE; VISION;
DOI
10.1145/3477495.3532047
CLC number (Chinese Library Classification)
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
Image retrieval with hybrid-modality queries, also known as composing text and image for image retrieval (CTI-IR), is a retrieval task in which the search intention is expressed through a more complex query involving both the vision and text modalities. For example, a target product image is retrieved using a query made of a reference product image together with text describing how certain attributes of the reference image should change. This task is more challenging than conventional image retrieval because it requires both semantic space learning and cross-modal fusion. Previous approaches that attempt to address both aspects achieve unsatisfactory performance. In this paper, we decompose the CTI-IR task into a three-stage learning problem to progressively learn the complex knowledge needed for image retrieval with hybrid-modality queries. We first leverage the semantic embedding space for open-domain image-text retrieval, and then transfer the learned knowledge to the fashion domain through fashion-related pre-training tasks. Finally, we extend the pre-trained model from single-modality queries to hybrid-modality queries for the CTI-IR task. Furthermore, because the contribution of each modality in a hybrid-modality query varies across retrieval scenarios, we propose a self-supervised adaptive weighting strategy that dynamically determines the importance of the image and the text in the query for better retrieval. Extensive experiments show that our proposed model significantly outperforms state-of-the-art methods in mean Recall@K, by 24.9% on the Fashion-IQ benchmark dataset and by 9.5% on the Shoes benchmark dataset.
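The abstract does not say how the adaptive weights are computed or trained, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' model. It assumes a hypothetical sigmoid gate (the `gate` and `bias` parameters are illustrative, not from the paper) over the concatenated image and text embeddings. The gate yields an image weight `w`, the text receives `1 - w`, and the fused query ranks a gallery by cosine similarity. Mean Recall@K is then the average of Recall@K over the chosen cutoffs. The self-supervised training of the weights is omitted entirely, and the toy cutoffs `ks=[1, 2]` are chosen only for a three-image gallery; the abstract does not state which K values the benchmarks use.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against a zero vector
    return [x / norm for x in v]


def gate_weight(img: Sequence[float], txt: Sequence[float],
                gate: Sequence[float], bias: float = 0.0) -> float:
    """HYPOTHETICAL gate: sigmoid of a linear score over [img; txt].

    Returns the image weight in (0, 1); the text receives 1 - weight.
    """
    feats = list(img) + list(txt)
    score = sum(g * f for g, f in zip(gate, feats)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def compose_query(img, txt, gate, bias=0.0) -> List[float]:
    """Fuse the two modality embeddings with a query-dependent weight."""
    w = gate_weight(img, txt, gate, bias)
    img_n, txt_n = l2_normalize(img), l2_normalize(txt)
    fused = [w * a + (1.0 - w) * b for a, b in zip(img_n, txt_n)]
    return l2_normalize(fused)


def rank_gallery(query: Sequence[float], gallery) -> List[int]:
    """Gallery indices sorted by descending cosine similarity to the query."""
    items = [l2_normalize(g) for g in gallery]
    scores = [sum(q * x for q, x in zip(query, item)) for item in items]
    return sorted(range(len(items)), key=lambda i: scores[i], reverse=True)


def recall_at_k(rankings, targets, k: int) -> float:
    """Fraction of queries whose ground-truth target appears in the top k."""
    hits = sum(1 for ranking, t in zip(rankings, targets) if t in ranking[:k])
    return hits / len(targets)


def mean_recall(rankings, targets, ks) -> float:
    """Average of Recall@K over the chosen cutoffs."""
    return sum(recall_at_k(rankings, targets, k) for k in ks) / len(ks)


# Toy usage: 2-D embeddings and a 3-image gallery. Two hand-picked gate
# settings show how shifting weight toward the image changes the ranking.
gallery = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
img, txt = [1.0, 0.0], [0.0, 1.0]
q_balanced = compose_query(img, txt, gate=[0.0, 0.0, 0.0, 0.0])     # weight 0.5
q_image_heavy = compose_query(img, txt, gate=[5.0, 0.0, 0.0, 0.0])  # weight ~0.993
rankings = [rank_gallery(q_balanced, gallery), rank_gallery(q_image_heavy, gallery)]
targets = [2, 2]
print(mean_recall(rankings, targets, ks=[1, 2]))  # 0.75
```

With the balanced weight the fused query lands closest to the in-between gallery item, while the image-heavy weight pulls the reference image itself to the top. This is the kind of scenario-dependent trade-off an adaptive weighting scheme is meant to resolve per query.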
Pages: 1012-1021
Number of pages: 10
Related papers
50 in total
  • [41] Ranking and Retrieval of Image Sequences from Multiple Paragraph Queries
    Kim, Gunhee
    Moon, Seungwhan
    Sigal, Leonid
    2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2015, : 1993 - 2001
  • [42] EXACT AND PROGRESSIVE IMAGE RETRIEVAL WITH THE HiPeR FRAMEWORK
    Bouteldja, Nouha
    Gouet-Brunet, Valerie
    2008 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-4, 2008, : 1257 - 1260
  • [43] STRICT: An image retrieval platform for queries based on regional content
    Omhover, JR
    Detyniecki, M
    IMAGE AND VIDEO RETRIEVAL, PROCEEDINGS, 2004, 3115 : 473 - 482
  • [44] A Probabilistic Approach for Image Retrieval Using Descriptive Textual Queries
    Verma, Yashaswi
    Jawahar, C. V.
    MM'15: PROCEEDINGS OF THE 2015 ACM MULTIMEDIA CONFERENCE, 2015, : 1091 - 1094
  • [45] Context-sensitive queries for image retrieval in digital libraries
    Boccignone, G.
    Chianese, A.
    Moscato, V.
    Picariello, A.
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2008, 31 : 53 - 84
  • [46] Semantic image retrieval for complex queries using a knowledge parser
    Chen, Hua
    Trouve, Antoine
    Murakami, Kazuaki J.
    Fukuda, Akira
    MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (09) : 10733 - 10751
  • [47] Large-Scale Video Retrieval Using Image Queries
    Araujo, Andre
    Girod, Bernd
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2018, 28 (06) : 1406 - 1420
  • [48] Dynamic Queries with Relevance Feedback for Content Based Image Retrieval
    Birinci, Murat
    Guldogan, Esin
    Gabbouj, Moncef
    HUMAN-COMPUTER INTERACTION: DESIGN AND DEVELOPMENT APPROACHES, PT I, 2011, 6761 : 547 - 554
  • [50] An indexing and retrieval mechanism for complex similarity queries in image databases
    Cha, GH
    Chung, CW
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 1999, 10 (03) : 268 - 290