Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry, and Fusion

Cited by: 133
Authors
Wang, Yang [1 ,2 ]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Peoples R China
[2] Hefei Univ Technol, Intelligent Interconnected Syst Lab Anhui Prov, Hefei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-modal data; deep neural networks; MULTIVIEW; REPRESENTATIONS; RECOGNITION; NETWORK;
DOI
10.1145/3408317
CLC Classification Code
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
With the development of web technology, multi-modal (or multi-view) data has surged as a major stream of big data, where each modality/view encodes an individual property of the data objects. Different modalities are often complementary to each other, a fact that has motivated substantial research on fusing multi-modal feature spaces to comprehensively characterize data objects. Most existing state-of-the-art methods focus on how to fuse the energy or information from multi-modal spaces so as to deliver performance superior to their single-modal counterparts. Recently, deep neural networks have proven to be a powerful architecture for capturing the nonlinear distributions of high-dimensional multimedia data, and this naturally extends to multi-modal data. Substantial empirical studies have demonstrated the advantages of deep multi-modal methods, which essentially deepen the fusion across multi-modal deep feature spaces. In this article, we provide a comprehensive overview of the state of the art in multi-modal data analytics, from shallow to deep spaces. Throughout the survey, we further indicate that the critical components of this field are collaboration, adversarial competition, and fusion over multi-modal spaces. Finally, we share our viewpoints on some future directions for the field.
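The fusion idea the abstract describes — combining per-modality deep feature spaces into one joint representation — can be illustrated with a minimal toy sketch of concatenation ("late") fusion. Everything here (array shapes, the tanh encoders, the two hypothetical modalities) is an illustrative assumption for exposition, not the survey's specific method.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    # Per-modality encoder: a single nonlinear projection (tanh layer).
    return np.tanh(x @ W)

# Two hypothetical modalities with different feature dimensions,
# e.g. a 6-D visual descriptor and a 4-D text descriptor, for 8 samples.
x_visual = rng.standard_normal((8, 6))
x_text = rng.standard_normal((8, 4))

# Each modality gets its own projection into a shared 3-D embedding size.
W_visual = rng.standard_normal((6, 3))
W_text = rng.standard_normal((4, 3))

# Late fusion: concatenate the per-modality embeddings into one joint
# representation that a downstream classifier could operate on.
z = np.concatenate([encode(x_visual, W_visual),
                    encode(x_text, W_text)], axis=1)
print(z.shape)  # joint representation: (8, 6)
```

Deeper variants replace the single projections with multi-layer encoders and learn the fusion jointly; the adversarial ("rivalry") component in the survey's title refers to training schemes where modality-specific and shared representations compete, which this sketch does not model.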
Pages: 25
Related Papers
50 records in total
  • [41] Fusion of infrared and range data: Multi-modal face images
    Chen, X
    Flynn, PJ
    Bowyer, KW
    ADVANCES IN BIOMETRICS, PROCEEDINGS, 2006, 3832 : 55 - 63
  • [42] A comparative review on multi-modal sensors fusion based on deep learning
    Tang, Qin
    Liang, Jing
    Zhu, Fangqi
    SIGNAL PROCESSING, 2023, 213
  • [43] Deep Convolutional Neural Network for Multi-Modal Image Restoration and Fusion
    Deng, Xin
    Dragotti, Pier Luigi
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (10) : 3333 - 3348
  • [44] Deep Fusion for Multi-Modal 6D Pose Estimation
    Lin, Shifeng
    Wang, Zunran
    Zhang, Shenghao
    Ling, Yonggen
    Yang, Chenguang
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2024, 21 (04) : 6540 - 6549
  • [45] Multi-modal deep fusion based fake news detection method
    Jing Q.
    Fan X.
    Wang B.
    Bi J.
    Tan H.
    High Technology Letters, 2022, 32 (04) : 392 - 403
  • [46] Exploring Fusion Strategies in Deep Learning Models for Multi-Modal Classification
    Zhang, Duoyi
    Nayak, Richi
    Bashar, Md Abul
    DATA MINING, AUSDM 2021, 2021, 1504 : 102 - 117
  • [47] Deep unsupervised multi-modal fusion network for detecting driver distraction
    Zhang, Yuxin
    Chen, Yiqiang
    Gao, Chenlong
    NEUROCOMPUTING, 2021, 421 : 26 - 38
  • [49] Deep fusion of multi-modal features for brain tumor image segmentation
    Zhang, Guying
    Zhou, Jia
    He, Guanghua
    Zhu, Hancan
    HELIYON, 2023, 9 (08)
  • [50] Multi-Modal Object Tracking and Image Fusion With Unsupervised Deep Learning
    LaHaye, Nicholas
    Ott, Jordan
    Garay, Michael J.
    El-Askary, Hesham Mohamed
    Linstead, Erik
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2019, 12 (08) : 3056 - 3066