Scalable Deep Multimodal Learning for Cross-Modal Retrieval

Cited by: 77
Authors
Hu, Peng [1 ]
Zhen, Liangli [2 ]
Peng, Dezhong [1 ,3 ]
Liu, Pei [1 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Machine Intelligence Lab, Chengdu 610065, Sichuan, Peoples R China
[2] A*STAR, Inst High Performance Comp, Singapore 138632, Singapore
[3] Chengdu Sobey Digital Technol Co Ltd, Chengdu 610041, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal retrieval; multimodal learning; representation learning; fusion;
DOI
10.1145/3331184.3331213
CLC number
TP [Automation Technology, Computer Technology];
Discipline code
0812;
Abstract
Cross-modal retrieval takes one type of data as the query to retrieve relevant data of another type. Most existing cross-modal retrieval approaches learn a common subspace in a joint manner, where data from all modalities must be involved throughout training. For these approaches, the optimal parameters of the different modality-specific transformations depend on one another, and the whole model has to be retrained when samples from a new modality arrive. In this paper, we present a novel cross-modal retrieval method, called Scalable Deep Multimodal Learning (SDML). It predefines a common subspace in which the between-class variation is maximized while the within-class variation is minimized. It then trains m modality-specific networks for the m modalities (one network per modality) to transform the multimodal data into the predefined common subspace. Unlike many existing methods, our method can train the different modality-specific networks independently and is thus scalable in the number of modalities. To the best of our knowledge, SDML may be one of the first works to independently project data of an unfixed number of modalities into a predefined common subspace. Comprehensive experimental results on four widely used benchmark datasets demonstrate that the proposed method is effective and efficient in multimodal learning and outperforms state-of-the-art methods in cross-modal retrieval.
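The abstract's key mechanism is worth illustrating: because the target common subspace is fixed before training, each modality-specific network can be optimized in isolation against the same fixed class anchors. The following is a minimal, hypothetical PyTorch sketch of that idea; the one-hot anchors, layer sizes, and plain MSE objective are illustrative assumptions, not the authors' exact construction of the predefined subspace.

```python
# Minimal, hypothetical sketch of the SDML idea from the abstract,
# assuming PyTorch. The one-hot class anchors, layer sizes, and MSE
# objective are illustrative assumptions, not the paper's exact method.
import torch
import torch.nn as nn

def make_modality_net(in_dim: int, subspace_dim: int) -> nn.Module:
    # One independent projection network per modality.
    return nn.Sequential(
        nn.Linear(in_dim, 512),
        nn.ReLU(),
        nn.Linear(512, subspace_dim),
    )

def train_one_modality(net, features, labels, subspace_dim, epochs=10):
    # Each sample's target is a fixed, label-dependent point in the
    # predefined subspace (here: its one-hot class vector, so distinct
    # classes map to mutually orthogonal anchors).
    targets = torch.eye(subspace_dim)[labels]
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(net(features), targets)
        loss.backward()
        opt.step()
    return net

if __name__ == "__main__":
    num_classes = 10
    # Because the anchors are fixed in advance, the two networks below
    # share no parameters and can be trained independently, even at
    # different times -- no joint retraining when a modality is added.
    img_net = make_modality_net(in_dim=4096, subspace_dim=num_classes)
    txt_net = make_modality_net(in_dim=300, subspace_dim=num_classes)
    img_feats = torch.randn(64, 4096)   # stand-in image features
    txt_feats = torch.randn(64, 300)    # stand-in text features
    labels = torch.randint(0, num_classes, (64,))
    train_one_modality(img_net, img_feats, labels, num_classes)
    train_one_modality(txt_net, txt_feats, labels, num_classes)
    # At retrieval time both modalities land in the same subspace, so
    # cross-modal similarity reduces to a distance in that space.
```

Under this scheme, adding a new modality only requires training one more network against the same anchors, which is exactly the scalability property the abstract claims.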
Pages: 635-644
Number of pages: 10
Related Papers
50 records in total
  • [1] Deep Multimodal Transfer Learning for Cross-Modal Retrieval
    Zhen, Liangli
    Hu, Peng
    Peng, Xi
    Goh, Rick Siow Mong
    Zhou, Joey Tianyi
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (02) : 798 - 810
  • [2] Cross-Modal Retrieval using Random Multimodal Deep Learning
    Somasekar, Hemanth
    Naveen, Kavya
[J]. JOURNAL OF MECHANICS OF CONTINUA AND MATHEMATICAL SCIENCES, 2019, 14 (02) : 185 - 200
  • [3] Multimodal Graph Learning for Cross-Modal Retrieval
    Xie, Jingyou
    Zhao, Zishuo
    Lin, Zhenzhou
    Shen, Ying
    [J]. PROCEEDINGS OF THE 2023 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2023, : 145 - 153
  • [4] Deep multimodal learning for cross-modal retrieval: One model for all tasks
Beltran, L. Viviana
    Caicedo, Juan C.
    Journet, Nicholas
    Coustaty, Mickael
    Lecellier, Francois
    Doucet, Antoine
    [J]. PATTERN RECOGNITION LETTERS, 2021, 146 : 38 - 45
  • [5] Cross-Modal Retrieval Using Deep Learning
    Malik, Shaily
    Bhardwaj, Nikhil
    Bhardwaj, Rahul
    Kumar, Saurabh
    [J]. PROCEEDINGS OF THIRD DOCTORAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE, DOSCI 2022, 2023, 479 : 725 - 734
  • [6] Deep supervised multimodal semantic autoencoder for cross-modal retrieval
    Tian, Yu
    Yang, Wenjing
    Liu, Qingsong
    Yang, Qiong
    [J]. COMPUTER ANIMATION AND VIRTUAL WORLDS, 2020, 31 (4-5)
  • [7] Variational Deep Representation Learning for Cross-Modal Retrieval
    Yang, Chen
    Deng, Zongyong
    Li, Tianyu
    Liu, Hao
    Liu, Libo
    [J]. PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2021, PT II, 2021, 13020 : 498 - 510
  • [8] Deep adversarial metric learning for cross-modal retrieval
    Xu, Xing
    He, Li
    Lu, Huimin
    Gao, Lianli
    Ji, Yanli
[J]. WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2019, 22 (02) : 657 - 672
  • [9] Deep Hashing Similarity Learning for Cross-Modal Retrieval
    Ma, Ying
    Wang, Meng
    Lu, Guangyun
    Sun, Yajun
    [J]. IEEE ACCESS, 2024, 12 : 8609 - 8618