The State of the Art for Cross-Modal Retrieval: A Survey

Cited by: 1
Authors
Zhou, Kun [1, 2]
Hassan, Fadratul Hafinaz [1]
Hoon, Gan Keng [1]
Affiliations
[1] Univ Sains Malaysia, Sch Comp Sci, Gelugor 11600, Pulau Pinang, Malaysia
[2] Zhejiang Business Technol Inst, Ningbo 315000, Zhejiang, Peoples R China
Source
IEEE Access | 2023 / Vol. 11
Keywords
Deep learning; cross-modal retrieval; representation learning; multi-modal learning; canonical correlation analysis; representation; networks
DOI
10.1109/ACCESS.2023.3338548
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Cross-modal retrieval, which aims to search for semantically relevant data across different modalities, has received increasing attention in recent years. Deep learning, with its ability to extract high-level representations from multi-modal data, has become a popular approach to cross-modal retrieval. In this paper, we present a comprehensive survey of deep learning techniques for cross-modal retrieval, covering 37 papers published in recent years. The review is organized into four main sections: traditional subspace learning methods; deep learning and machine learning-based approaches; techniques based on large multi-modal models; and an analysis of the datasets used in cross-modal retrieval. We compare and analyze the performance of different deep learning methods on benchmark datasets. The results show that, although many innovative methods have been proposed, several problems remain to be solved, such as multi-modal feature alignment, multi-modal feature fusion, and subspace learning, as well as the need for specialized datasets.
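A minimal sketch of the retrieval setting the abstract describes may help. It assumes that images and texts have already been projected into a shared subspace by some learned model. Under that assumption, an image query is answered by ranking text candidates by cosine similarity, and the ranking is scored with Recall@K, a metric commonly reported on cross-modal benchmarks. The embedding values are hypothetical toy numbers, not results from any surveyed method. The function names `cosine`, `rank_gallery`, and `recall_at_k` are illustrative and are not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors in the (assumed) shared embedding space."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Guard against zero vectors, which have no defined direction.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_gallery(query: Vector, gallery: List[Vector]) -> List[int]:
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine(query, item) for item in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


def recall_at_k(queries: List[Vector], gallery: List[Vector], k: int) -> float:
    """Fraction of queries whose ground-truth match appears in the top-k results.

    Assumption: query i and gallery item i form the only correct pair.
    """
    hits = sum(1 for i, q in enumerate(queries) if i in rank_gallery(q, gallery)[:k])
    return hits / len(queries)


# Hypothetical toy embeddings: three images and their three captions in a 3-D shared space.
# Caption 2 is deliberately placed near image 0, so two of the three queries miss at K=1.
image_emb = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
text_emb = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.0], [0.9, 0.1, 0.0]]

print(round(recall_at_k(image_emb, text_emb, k=1), 3))  # → 0.333
print(round(recall_at_k(image_emb, text_emb, k=2), 3))  # → 1.0
```

With these toy vectors, only image 1 retrieves its own caption first, while every query finds its caption within the top two. Subspace learning and deep methods of the kind the survey reviews try to raise Recall@K by pulling paired items closer together in the shared space.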
Pages: 138568-138589
Page count: 22
Related papers (50 in total)
  • [1] Survey on Video-Text Cross-Modal Retrieval
    Chen, Lei
    Xi, Yimeng
    Liu, Libo
    [J]. Computer Engineering and Applications, 2024, 60 (04) : 1 - 20
  • [2] Adversarial Cross-Modal Retrieval
    Wang, Bokun
    Yang, Yang
    Xu, Xing
    Hanjalic, Alan
    Shen, Heng Tao
    [J]. PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017: 154 - 162
  • [3] HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval
    Zhang, Chengyuan
    Song, Jiayu
    Zhu, Xiaofeng
    Zhu, Lei
    Zhang, Shichao
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (01)
  • [4] A semi-supervised cross-modal memory bank for cross-modal retrieval
    Huang, Yingying
    Hu, Bingliang
    Zhang, Yipeng
    Gao, Chi
    Wang, Quan
    [J]. NEUROCOMPUTING, 2024, 579
  • [5] Cross-Modal Center Loss for 3D Cross-Modal Retrieval
    Jing, Longlong
    Vahdani, Elahe
    Tan, Jiaxing
    Tian, Yingli
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 3141 - 3150
  • [6] Soft Contrastive Cross-Modal Retrieval
    Song, Jiayu
    Hu, Yuxuan
    Zhu, Lei
    Zhang, Chengyuan
    Zhang, Jian
    Zhang, Shichao
    [J]. APPLIED SCIENCES-BASEL, 2024, 14 (05)
  • [7] Probabilistic Embeddings for Cross-Modal Retrieval
    Chun, Sanghyuk
    Oh, Seong Joon
    de Rezende, Rafael Sampaio
    Kalantidis, Yannis
    Larlus, Diane
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 8411 - 8420
  • [8] Cross-modal Retrieval with Correspondence Autoencoder
    Feng, Fangxiang
    Wang, Xiaojie
    Li, Ruifan
    [J]. PROCEEDINGS OF THE 2014 ACM CONFERENCE ON MULTIMEDIA (MM'14), 2014: 7 - 16
  • [9] Cross-modal retrieval with dual optimization
    Xu, Qingzhen
    Liu, Shuang
    Qiao, Han
    Li, Miao
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (05) : 7141 - 7157
  • [10] Geometric Matching for Cross-Modal Retrieval
    Wang, Zheng
    Gao, Zhenwei
    Yang, Yang
    Wang, Guoqing
    Jiao, Chengbo
    Shen, Heng Tao
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024