Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry, and Fusion

Cited by: 133
Authors
Wang, Yang [1 ,2 ]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Peoples R China
[2] Hefei Univ Technol, Intelligent Interconnected Syst Lab Anhui Prov, Hefei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-modal data; deep neural networks; MULTIVIEW; REPRESENTATIONS; RECOGNITION; NETWORK;
DOI
10.1145/3408317
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
With the development of web technology, multi-modal or multi-view data has surged as a major stream of big data, where each modality/view encodes an individual property of the data objects. Since different modalities are often complementary to each other, much research attention has focused on fusing multi-modal feature spaces to comprehensively characterize the data objects. Most existing state-of-the-art methods focus on how to fuse the energy or information from multi-modal spaces so as to deliver performance superior to their single-modal counterparts. Recently, deep neural networks have proven to be a powerful architecture for capturing the nonlinear distribution of high-dimensional multimedia data, and this naturally extends to multi-modal data. Substantial empirical studies have demonstrated the advantages of deep multi-modal methods, which essentially deepen the fusion across multi-modal deep feature spaces. In this article, we provide a substantial overview of the state of the art in multi-modal data analytics, from shallow to deep spaces. Throughout this survey, we argue that the critical components of this field are collaboration, adversarial competition, and fusion over multi-modal spaces. Finally, we share our viewpoints on some future directions in this field.
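The fusion of complementary modality-specific feature spaces described above can be illustrated with a minimal sketch. The example below is not taken from the surveyed paper: the modality names, dimensions, and the simple concatenation-plus-linear-head design are illustrative assumptions standing in for one common "early fusion" strategy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality embeddings for one data object
# (names and dimensions are illustrative, not from the survey).
visual_feat = rng.standard_normal(128)  # e.g. output of an image branch
text_feat = rng.standard_normal(64)     # e.g. output of a text branch

def fuse(features):
    """Concatenate modality-specific feature spaces into one joint space."""
    return np.concatenate(features)

joint = fuse([visual_feat, text_feat])  # joint space of dimension 128 + 64

# A single linear head over the fused space stands in for the downstream
# model that "comprehensively characterizes" the object across modalities.
num_classes = 10
W = rng.standard_normal((num_classes, joint.size)) * 0.01
logits = W @ joint
print(joint.shape, logits.shape)
```

Deeper fusion schemes discussed in the survey replace the plain concatenation with learned, possibly adversarial, interactions between the modality branches, but the input/output contract is the same: several modality-specific spaces in, one joint representation out.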
Pages: 25