Progressive Learning for Image Retrieval with Hybrid-Modality Queries

Cited by: 16
Authors
Zhao, Yida [1]
Song, Yuqing [1]
Jin, Qin [1]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Image Retrieval; Progressive Learning; Visual-Linguistic Query Composing; LANGUAGE; VISION
DOI
10.1145/3477495.3532047
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Image retrieval with hybrid-modality queries, also known as composing text and image for image retrieval (CTI-IR), is a retrieval task where the search intention is expressed in a more complex query format, involving both vision and text modalities. For example, a target product image is searched using a reference product image along with text about changing certain attributes of the reference image as the query. It is a more challenging image retrieval task that requires both semantic space learning and cross-modal fusion. Previous approaches that attempt to deal with both aspects achieve unsatisfactory performance. In this paper, we decompose the CTI-IR task into a three-stage learning problem to progressively learn the complex knowledge for image retrieval with hybrid-modality queries. We first leverage the semantic embedding space for open-domain image-text retrieval, and then transfer the learned knowledge to the fashion-domain with fashion-related pre-training tasks. Finally, we enhance the pre-trained model from single-query to hybrid-modality query for the CTI-IR task. Furthermore, as the contribution of individual modality in the hybrid-modality query varies for different retrieval scenarios, we propose a self-supervised adaptive weighting strategy to dynamically determine the importance of image and text in the hybrid-modality query for better retrieval. Extensive experiments show that our proposed model significantly outperforms state-of-the-art methods in the mean of Recall@K by 24.9% and 9.5% on the Fashion-IQ and Shoes benchmark datasets respectively.
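The abstract reports results as the mean of Recall@K and describes weighting the image and text parts of a query. The following is a minimal sketch of those two ideas, not the authors' implementation: the function names (`fuse_scores`, `rank_candidates`, `recall_at_k`, `mean_recall`), the convex score combination, and the fixed `image_weight` are all assumptions made for illustration, whereas the paper determines the weighting adaptively with a self-supervised strategy.

```python
from typing import List, Sequence


def fuse_scores(image_scores: Sequence[float],
                text_scores: Sequence[float],
                image_weight: float) -> List[float]:
    """Combine per-candidate scores from the two query modalities.

    Assumption: a fixed convex combination stands in for the paper's
    adaptive weighting, which is learned rather than set by hand.
    """
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must lie in [0, 1]")
    if len(image_scores) != len(text_scores):
        raise ValueError("score lists must have the same length")
    return [image_weight * i + (1.0 - image_weight) * t
            for i, t in zip(image_scores, text_scores)]


def rank_candidates(scores: Sequence[float]) -> List[int]:
    """Return candidate indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda idx: scores[idx], reverse=True)


def recall_at_k(rankings: Sequence[Sequence[int]],
                targets: Sequence[int],
                k: int) -> float:
    """Fraction of queries whose target appears among the top-k candidates."""
    if len(rankings) != len(targets) or not targets:
        raise ValueError("need one non-empty ranking per target")
    hits = sum(1 for ranking, target in zip(rankings, targets)
               if target in ranking[:k])
    return hits / len(targets)


def mean_recall(rankings: Sequence[Sequence[int]],
                targets: Sequence[int],
                ks: Sequence[int]) -> float:
    """Average Recall@K over several cutoffs K."""
    return sum(recall_at_k(rankings, targets, k) for k in ks) / len(ks)


if __name__ == "__main__":
    # Toy example: one query scored against three candidate images.
    fused = fuse_scores([0.9, 0.2, 0.4], [0.1, 0.8, 0.5], image_weight=0.25)
    ranking = rank_candidates(fused)
    print(ranking, recall_at_k([ranking], [2], k=2))
```

A lower `image_weight` lets the modification text dominate the ranking, while a higher one keeps results close to the reference image; which cutoffs K are averaged depends on the benchmark and is not stated in the abstract.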
Pages: 1012-1021
Number of pages: 10