Scalable Deep Multimodal Learning for Cross-Modal Retrieval

Cited by: 77
Authors
Hu, Peng [1]
Zhen, Liangli [2]
Peng, Dezhong [1,3]
Liu, Pei [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Machine Intelligence Lab, Chengdu 610065, Sichuan, Peoples R China
[2] A*STAR, Inst High Performance Comp, Singapore 138632, Singapore
[3] Chengdu Sobey Digital Technol Co Ltd, Chengdu 610041, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal retrieval; multimodal learning; representation learning; fusion;
DOI
10.1145/3331184.3331213
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Cross-modal retrieval takes one type of data as the query to retrieve relevant data of another type. Most existing cross-modal retrieval approaches learn a common subspace in a joint manner, where data from all modalities must be involved during the whole training process. For these approaches, the optimal parameters of the different modality-specific transformations depend on each other, and the whole model has to be retrained when handling samples from new modalities. In this paper, we present a novel cross-modal retrieval method, called Scalable Deep Multimodal Learning (SDML). It predefines a common subspace in which the between-class variation is maximized while the within-class variation is minimized. It then trains m modality-specific networks for the m modalities (one network per modality) to transform the multimodal data into the predefined common subspace. Unlike many existing methods, our method can train the different modality-specific networks independently and is thus scalable in the number of modalities. To the best of our knowledge, the proposed SDML could be one of the first works to independently project data of an unfixed number of modalities into a predefined common subspace. Comprehensive experimental results on four widely used benchmark datasets demonstrate that the proposed method is effective and efficient in multimodal learning and outperforms the state-of-the-art methods in cross-modal retrieval.
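The abstract describes a two-step recipe: fix a discriminative common subspace in advance, then fit one network per modality against it in isolation. The sketch below illustrates that decoupling in PyTorch; it is a minimal illustration under stated assumptions, not the authors' implementation. The one-hot class targets (one possible orthogonal target set), the encoder shapes, the MSE loss, and the loaders image_loader/text_loader are all hypothetical; the paper's actual subspace construction and objective may differ.

# Minimal sketch of the SDML training pattern (assumption: one-hot
# class vectors serve as the predefined orthogonal subspace targets).
import torch
import torch.nn as nn

NUM_CLASSES, DIM = 10, 10
# Predefined common subspace: one fixed, mutually orthogonal target
# per class, chosen before any network is trained.
TARGETS = torch.eye(NUM_CLASSES, DIM)

def make_encoder(in_dim: int) -> nn.Module:
    # Hypothetical modality-specific network mapping raw features
    # into the predefined common subspace.
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, DIM))

def train_modality(net: nn.Module, loader, epochs: int = 10) -> None:
    # Pulling every sample toward its fixed class target shrinks the
    # within-class variation; the orthogonal targets keep the
    # between-class variation large.
    opt = torch.optim.Adam(net.parameters())
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x, y in loader:  # x: features, y: integer class labels
            opt.zero_grad()
            loss_fn(net(x), TARGETS[y]).backward()
            opt.step()

# Because the targets are shared and fixed, each modality trains in
# isolation, so adding a new modality never retrains the others, e.g.:
#   train_modality(make_encoder(4096), image_loader)  # hypothetical loader
#   train_modality(make_encoder(300), text_loader)    # hypothetical loader

The point of the sketch is the scalability claim in the abstract: nothing in train_modality depends on any other modality's network, only on the shared, fixed targets.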
Pages: 635-644
Page count: 10
Related Papers
50 records in total
  • [41] Deep Learning and Shared Representation Space Learning Based Cross-Modal Multimedia Retrieval. Zou, Hui; Du, Ji-Xiang; Zhai, Chuan-Min; Wang, Jing. INTELLIGENT COMPUTING THEORIES AND APPLICATION, ICIC 2016, PT II, 2016, 9772: 322-331.
  • [42] Discrete semantic embedding hashing for scalable cross-modal retrieval. Liu, Junjie; Fei, Lunke; Jia, Wei; Zhao, Shuping; Wen, Jie; Teng, Shaohua; Zhang, Wei. 2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2021: 1461-1467.
  • [43] Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval. Shukor, Mustafa; Couairon, Guillaume; Grechka, Asya; Cord, Matthieu. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022: 4566-4577.
  • [44] A Framework for Enabling Unpaired Multi-Modal Learning for Deep Cross-Modal Hashing Retrieval. Williams-Lekuona, Mikel; Cosma, Georgina; Phillips, Iain. JOURNAL OF IMAGING, 2022, 8 (12).
  • [45] Multimodal Multiclass Boosting and its Application to Cross-modal Retrieval. Wang, Shixun; Dou, Zhi; Chen, Deng; Yu, Hairong; Li, Yuan; Pan, Peng. NEUROCOMPUTING, 2019, 357: 11-23.
  • [46] Multimodal Encoders for Food-Oriented Cross-Modal Retrieval. Chen, Ying; Zhou, Dong; Li, Lin; Han, Jun-mei. WEB AND BIG DATA, APWEB-WAIM 2021, PT II, 2021, 12859: 253-266.
  • [47] Cross-Modal Event Retrieval: A Dataset and a Baseline Using Deep Semantic Learning. Situ, Runwei; Yang, Zhenguo; Lv, Jianming; Li, Qing; Liu, Wenyin. ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2018, PT II, 2018, 11165: 147-157.
  • [48] Learning Disentangled Representation for Cross-Modal Retrieval with Deep Mutual Information Estimation. Guo, Weikuo; Huang, Huaibo; Kong, Xiangwei; He, Ran. PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019: 1712-1720.
  • [49] Deep Adversarial Learning Triplet Similarity Preserving Cross-Modal Retrieval Algorithm. Li, Guokun; Wang, Zhen; Xu, Shibo; Feng, Chuang; Yang, Xiaohan; Wu, Nannan; Sun, Fuzhen. MATHEMATICS, 2022, 10 (15).
  • [50] Hybrid DAER Based Cross-Modal Retrieval Exploiting Deep Representation Learning. Huang, Zhao; Hu, Haowu; Su, Miao. ENTROPY, 2023, 25 (08).