Scalable Deep Multimodal Learning for Cross-Modal Retrieval

Cited by: 77
Authors
Hu, Peng [1]
Zhen, Liangli [2]
Peng, Dezhong [1,3]
Liu, Pei [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Machine Intelligence Lab, Chengdu 610065, Sichuan, Peoples R China
[2] ASTAR, Inst High Performance Comp, Singapore 138632, Singapore
[3] Chengdu Sobey Digital Technol Co Ltd, Chengdu 610041, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cross-modal retrieval; multimodal learning; representation learning; fusion
DOI
10.1145/3331184.3331213
CLC Classification Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Cross-modal retrieval takes one type of data as the query to retrieve relevant data of another type. Most existing cross-modal retrieval approaches learn a common subspace in a joint manner, where the data from all modalities have to be involved during the whole training process. For these approaches, the optimal parameters of the different modality-specific transformations depend on each other, and the whole model has to be retrained when handling samples from new modalities. In this paper, we present a novel cross-modal retrieval method, called Scalable Deep Multimodal Learning (SDML). It proposes to predefine a common subspace, in which the between-class variation is maximized while the within-class variation is minimized. Then, it trains m modality-specific networks for the m modalities (one network for each modality) to transform the multimodal data into the predefined common subspace to achieve multimodal learning. Unlike many of the existing methods, our method can train the different modality-specific networks independently and is thus scalable to the number of modalities. To the best of our knowledge, the proposed SDML could be one of the first works to independently project data of an unfixed number of modalities into a predefined common subspace. Comprehensive experimental results on four widely used benchmark datasets demonstrate that the proposed method is effective and efficient in multimodal learning and outperforms the state-of-the-art methods in cross-modal retrieval.
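To make the training scheme in the abstract concrete, below is a minimal PyTorch-style sketch, not the authors' implementation. It assumes the predefined common subspace is spanned by orthonormal one-hot class anchors (one per semantic category), which trivially maximizes between-class variation, while pulling each sample onto its class anchor minimizes within-class variation. All names, dimensions, and the MSE loss here are hypothetical illustrations of the idea that each modality network can be trained in isolation.

import torch
import torch.nn as nn

NUM_CLASSES = 10          # hypothetical number of semantic categories
COMMON_DIM = NUM_CLASSES  # subspace dimension tied to the class count

# Predefined common subspace: one orthonormal anchor per class
# (rows of an identity matrix), fixed before any network is trained.
anchors = torch.eye(NUM_CLASSES, COMMON_DIM)

def make_modality_net(input_dim: int) -> nn.Module:
    """Modality-specific network projecting raw features into the
    predefined common subspace. Each modality gets its own copy."""
    return nn.Sequential(
        nn.Linear(input_dim, 512),
        nn.ReLU(),
        nn.Linear(512, COMMON_DIM),
    )

def train_modality(net: nn.Module, loader, epochs: int = 10) -> None:
    """Train ONE modality network in isolation: no parameters from
    other modalities are touched, so a new modality can be added
    later without retraining the existing networks."""
    opt = torch.optim.Adam(net.parameters(), lr=1e-4)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for feats, labels in loader:   # (batch, input_dim), (batch,)
            z = net(feats)             # project into the common subspace
            target = anchors[labels]   # class anchor for each sample
            loss = mse(z, target)      # pull the sample onto its anchor
            opt.zero_grad()
            loss.backward()
            opt.step()

# Usage sketch: image_net = make_modality_net(4096) and
# text_net = make_modality_net(300) are trained separately with
# train_modality(...); at test time, cross-modal retrieval ranks
# candidates by distance to the query in the common subspace.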
Pages: 635-644
Page count: 10