Progressive Learning for Image Retrieval with Hybrid-Modality Queries

Cited by: 16
Authors
Zhao, Yida [1]
Song, Yuqing [1]
Jin, Qin [1]
Affiliations
[1] Renmin University of China, School of Information, Beijing, People's Republic of China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Image Retrieval; Progressive Learning; Visual-Linguistic Query Composing; LANGUAGE; VISION;
DOI
10.1145/3477495.3532047
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Image retrieval with hybrid-modality queries, also known as composing text and image for image retrieval (CTI-IR), is a retrieval task where the search intention is expressed in a more complex query format, involving both vision and text modalities. For example, a target product image is searched using a reference product image along with text about changing certain attributes of the reference image as the query. It is a more challenging image retrieval task that requires both semantic space learning and cross-modal fusion. Previous approaches that attempt to deal with both aspects achieve unsatisfactory performance. In this paper, we decompose the CTI-IR task into a three-stage learning problem to progressively learn the complex knowledge for image retrieval with hybrid-modality queries. We first leverage the semantic embedding space for open-domain image-text retrieval, and then transfer the learned knowledge to the fashion domain with fashion-related pre-training tasks. Finally, we enhance the pre-trained model from single-query to hybrid-modality query for the CTI-IR task. Furthermore, as the contribution of each modality in the hybrid-modality query varies across retrieval scenarios, we propose a self-supervised adaptive weighting strategy to dynamically determine the importance of image and text in the hybrid-modality query for better retrieval. Extensive experiments show that our proposed model significantly outperforms state-of-the-art methods in the mean of Recall@K by 24.9% and 9.5% on the Fashion-IQ and Shoes benchmark datasets, respectively.
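To make the weighting idea in the abstract concrete, here is a minimal sketch of score-level fusion followed by Recall@K. It is not the authors' model: the paper learns the modality weights with a self-supervised strategy, whereas this sketch takes the image weight `w_img` as a fixed input. The function names, the cosine-similarity scoring, and the toy vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(img_query, txt_query, candidates, w_img):
    """Return candidate indices sorted by a weighted sum of two similarities.

    w_img is a hypothetical fixed weight in [0, 1]; the text side gets 1 - w_img.
    """
    scores = [
        w_img * cosine(img_query, c) + (1.0 - w_img) * cosine(txt_query, c)
        for c in candidates
    ]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def recall_at_k(rankings, targets, k):
    """Fraction of queries whose target index appears in the top-k ranking."""
    hits = sum(1 for ranking, target in zip(rankings, targets) if target in ranking[:k])
    return hits / len(targets)


# Toy data: three candidate embeddings in a shared 2-D space (illustrative only).
candidates = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
img_query = [1.0, 0.0]  # stands in for the reference-image embedding
txt_query = [0.0, 1.0]  # stands in for the modification-text embedding

image_heavy = rank_candidates(img_query, txt_query, candidates, w_img=0.9)
text_heavy = rank_candidates(img_query, txt_query, candidates, w_img=0.1)
balanced = rank_candidates(img_query, txt_query, candidates, w_img=0.5)
```

In this toy setup the top-ranked candidate changes with the weight, which is the situation the abstract describes: the right balance between image and text depends on the query. The mean of Recall@K reported in the abstract would correspond to averaging `recall_at_k` over several values of `k`.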
Pages: 1012 - 1021
Page count: 10
Related papers
50 in total
  • [31] Web Image Retrieval for Abstract Queries Using Text and Image Information
    Shimada, Kazutaka
    Ishikawa, Suguru
    Endo, Tsutomu
    INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS, 2009, 5839 : 300 - 309
  • [32] Texture similarity queries and relevance feedback for image retrieval
    Patrice, B
    Konik, H
    15TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 4, PROCEEDINGS: APPLICATIONS, ROBOTICS SYSTEMS AND ARCHITECTURES, 2000, : 55 - 58
  • [33] Improving Image Retrieval Effectiveness via Multiple Queries
    Xiangyu Jin
    James C. French
    Multimedia Tools and Applications, 2005, 26 : 221 - 245
  • [34] Improving image retrieval effectiveness via multiple queries
    Jin, XY
    French, JC
    MULTIMEDIA TOOLS AND APPLICATIONS, 2005, 26 (02) : 221 - 245
  • [35] A retrieval mechanism for complex similarity queries in image databases
    Han, S.
    Chen, C.
    Lu, Z.
    Huazhong Ligong Daxue Xuebao/Journal Huazhong (Central China) University of Science and Technology, 2001, 29 (03) : 36 - 38
  • [36] Image Retrieval for Complex Queries Using Knowledge Embedding
    Chaudhary, Chandramani
    Goyal, Poonam
    Goyal, Navneet
    Chen, Yi-Ping Phoebe
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2020, 16 (01)
  • [37] Learning Image Information for eCommerce Queries
    Porwal, Utkarsh
    ADCS 2019: PROCEEDINGS OF THE 24TH AUSTRALASIAN DOCUMENT COMPUTING SYMPOSIUM, 2019,
  • [38] Learning to disentangle and fuse for fine-grained multi-modality ship image retrieval
    Xiong, Wei
    Xiong, Zhenyu
    Xu, Pingliang
    Cui, Yaqi
    Li, Haoran
    Huang, Linzhou
    Yang, Ruining
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
  • [39] Structured hashing with deep learning for modality, organ, and disease content sensitive medical image retrieval
    Manna, Asim
    Dewan, Dipayan
    Sheet, Debdoot
    SCIENTIFIC REPORTS, 2025, 15 (01):
  • [40] Text-Based Image Retrieval using Progressive Multi-Instance Learning
    Li, Wen
    Duan, Lixin
    Xu, Dong
    Tsang, Ivor Wai-Hung
    2011 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2011, : 2049 - 2055