The State of the Art for Cross-Modal Retrieval: A Survey

Cited by: 1
Authors
Zhou, Kun [1 ,2 ]
Hassan, Fadratul Hafinaz [1 ]
Hoon, Gan Keng [1 ]
Affiliations
[1] Univ Sains Malaysia, Sch Comp Sci, Gelugor 11600, Pulau Pinang, Malaysia
[2] Zhejiang Business Technol Inst, Ningbo 315000, Zhejiang, Peoples R China
Source
IEEE ACCESS | 2023, Vol. 11
Keywords
Deep learning; cross-modal retrieval; representation learning; multi-modal learning; CANONICAL CORRELATION-ANALYSIS; REPRESENTATION; NETWORKS;
DOI
10.1109/ACCESS.2023.3338548
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Cross-modal retrieval, which aims to search for semantically relevant data across different modalities, has received increasing attention in recent years. Deep learning, with its ability to extract high-level representations from multimodal data, has become a popular approach for cross-modal retrieval. In this paper, we present a comprehensive survey of deep learning techniques for cross-modal retrieval, covering 37 papers published in recent years. The review is organized into four main sections: traditional subspace learning methods; deep learning and machine learning-based approaches; techniques based on large multi-modal models; and an analysis of datasets used in the field of cross-modal retrieval. We compare and analyze the performance of different deep learning methods on benchmark datasets. The results show that, although a large number of innovative methods have been proposed, several problems remain open, including multi-modal feature alignment, multi-modal feature fusion, and subspace learning, as well as the need for specialized datasets.
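To make the traditional subspace-learning family mentioned in the abstract concrete, the sketch below shows a canonical correlation analysis (CCA) baseline for image-to-text retrieval: paired image and text features are projected into a common subspace and gallery items are ranked by cosine similarity. This is a minimal illustrative example on synthetic data, assuming NumPy and scikit-learn; the dimensions, variable names, and data generation are assumptions for illustration and do not reproduce any specific method surveyed in the paper.

# Minimal CCA-based cross-modal retrieval sketch (synthetic data; illustrative only).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Synthetic paired data: each image/text pair is generated from a shared latent concept.
n_pairs, d_latent, d_img, d_txt = 200, 10, 64, 32
latent = rng.normal(size=(n_pairs, d_latent))
img_feats = latent @ rng.normal(size=(d_latent, d_img)) + 0.1 * rng.normal(size=(n_pairs, d_img))
txt_feats = latent @ rng.normal(size=(d_latent, d_txt)) + 0.1 * rng.normal(size=(n_pairs, d_txt))

# Learn a common subspace that maximizes correlation between the two modalities.
cca = CCA(n_components=10)
cca.fit(img_feats, txt_feats)
img_proj, txt_proj = cca.transform(img_feats, txt_feats)

# Image-to-text retrieval: rank all texts by cosine similarity in the shared subspace.
def l2norm(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

sims = l2norm(img_proj) @ l2norm(txt_proj).T   # (n_pairs, n_pairs) similarity matrix
ranks = np.argsort(-sims, axis=1)              # best-matching text index first
recall_at_1 = np.mean(ranks[:, 0] == np.arange(n_pairs))
print(f"Image-to-text Recall@1 on synthetic pairs: {recall_at_1:.2f}")

The deep methods covered by the survey replace these linear projections with modality-specific neural encoders trained with ranking or contrastive objectives, but the retrieval step, nearest-neighbour search in a shared embedding space, is structured the same way.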
Pages: 138568-138589
Page count: 22
Related Papers
50 items in total
  • [41] Deep Memory Network for Cross-Modal Retrieval
    Song, Ge; Wang, Dong; Tan, Xiaoyang
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21(05): 1261-1275
  • [42] Multi-modal semantic autoencoder for cross-modal retrieval
    Wu, Yiling; Wang, Shuhui; Huang, Qingming
    [J]. NEUROCOMPUTING, 2019, 331: 165-175
  • [43] Cross-Modal Retrieval With Partially Mismatched Pairs
    Hu, Peng; Huang, Zhenyu; Peng, Dezhong; Wang, Xu; Peng, Xi
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45(08): 9595-9610
  • [44] Robust Cross-Modal Retrieval by Adversarial Training
    Zhang, Tao; Sun, Shiliang; Zhao, Jing
    [J]. 2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022
  • [45] Augmented Adversarial Training for Cross-Modal Retrieval
    Wu, Yiling; Wang, Shuhui; Song, Guoli; Huang, Qingming
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23: 559-571
  • [46] Cross-modal retrieval of scripted speech audio
    Owen, CB; Makedon, F
    [J]. MULTIMEDIA COMPUTING AND NETWORKING 1998, 1997, 3310: 226-235
  • [47] GrowBit: Incremental Hashing for Cross-Modal Retrieval
    Mandal, Devraj; Annadani, Yashas; Biswas, Soma
    [J]. COMPUTER VISION - ACCV 2018, PT IV, 2019, 11364: 305-321
  • [48] Region-based Cross-modal Retrieval
    Hou, Danyang; Pang, Liang; Lan, Yanyan; Shen, Huawei; Cheng, Xueqi
    [J]. 2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022
  • [49] Audiovisual cross-modal material surface retrieval
    Liu, Zhuokun; Liu, Huaping; Huang, Wenmei; Wang, Bowen; Sun, Fuchun
    [J]. NEURAL COMPUTING & APPLICATIONS, 2020, 32(18): 14301-14309
  • [50] Kernelized Cross-Modal Hashing for Multimedia Retrieval
    Tan, Shoubiao; Hu, Lingyu; Wang-Xu, Anqi; Tang, Jun; Jia, Zhaohong
    [J]. PROCEEDINGS OF THE 2016 12TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA), 2016: 1224-1228