Analysis of multimodal data fusion from an information theory perspective

Cited: 10
Authors
Dai, Yinglong [1 ,2 ]
Yan, Zheng [3 ]
Cheng, Jiangchang [2 ]
Duan, Xiaojun [1 ]
Wang, Guojun [4 ]
Affiliations
[1] Natl Univ Def Technol, Coll Sci, Changsha 410073, Hunan, Peoples R China
[2] Hunan Normal Univ, Coll Informat Sci & Engn, Hunan Prov Key Lab Intelligent Comp & Language Inf, Changsha 410081, Hunan, Peoples R China
[3] Xidian Univ, Sch Cyber Engn, State Key Lab Integrated Serv Networks, Xian 710071, Shaanxi, Peoples R China
[4] Guangzhou Univ, Sch Comp Sci, Guangzhou 510006, Guangdong, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Multimodal data fusion; Information entropy; Model uncertainty; Deep learning; Information loss measurement; DEEP; CLASSIFICATION; ARCHITECTURES; INTEGRATION;
DOI
10.1016/j.ins.2022.12.014
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Inspired by the McGurk effect, studies on multimodal data fusion started with audio-visual speech recognition tasks. Multimodal data fusion research was not popular for a period of time because the capacity of traditional machine learning was limited. Recently, advances in deep learning techniques have provided new opportunities for multimodal data fusion. Powerful deep learning models have the capacity to process high-dimensional and complex multimodal data, and multimodal deep learning has the potential to process multimodal data at the human level. However, there is still a lack of theoretical analytical methods relating data information with model performance. In this work, we propose basic concepts and principles to gain insight into the process of multimodal data fusion from an information theory perspective. We analyze different multimodal data fusion cases, such as redundant, noisy, consistent, and contradictory data fusion. We define the model accuracy upper bound for multimodal tasks and prove that a multimodal model with an extra modal channel can perform better in theory when the extra modal data provide more effective information for prediction. We explicitly inspect the latent representation space and analyze the information loss of the representation space transformation in deep learning for the first time. From a naive example to a multimodal deep learning example, we demonstrate the theoretical analysis method for evaluating a multimodal data fusion model, and the experimental results validate the definitions and principles. © 2022 Elsevier Inc. All rights reserved.
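The abstract's claim that an extra modal channel can only help in theory rests on a standard information-theoretic fact: conditioning on more variables never increases entropy, so the label uncertainty given two modalities is at most the uncertainty given one. A minimal sketch of this with a toy discrete distribution (the distribution and all names are illustrative, not taken from the paper):

```python
import math
from collections import defaultdict

# Toy joint distribution over (modality1, modality2, label).
# Modality 1 alone is ambiguous about the label; modality 2 resolves it.
joint = {
    ("a", "u", 0): 0.25,
    ("a", "v", 1): 0.25,
    ("b", "u", 0): 0.25,
    ("b", "v", 1): 0.25,
}

def cond_entropy(joint, cond_idx):
    """H(Y | X) in bits, where X collects the key coordinates in cond_idx
    and Y is the last coordinate of each key."""
    px = defaultdict(float)   # marginal P(x)
    pxy = defaultdict(float)  # joint P(x, y)
    for key, p in joint.items():
        x = tuple(key[i] for i in cond_idx)
        px[x] += p
        pxy[(x, key[-1])] += p
    return -sum(p * math.log2(p / px[x]) for (x, _), p in pxy.items() if p > 0)

h_y_given_m1 = cond_entropy(joint, (0,))     # condition on modality 1 only
h_y_given_m12 = cond_entropy(joint, (0, 1))  # condition on both modalities

print(h_y_given_m1)   # 1.0 bit: modality 1 alone leaves the label fully uncertain
print(h_y_given_m12)  # 0.0 bits: the extra modality determines the label
```

Lower residual entropy translates into a higher achievable accuracy bound (in the spirit of Fano's inequality), which is the intuition behind the paper's claim that an added modality helps exactly when it carries extra effective information about the prediction target.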
Pages: 164 - 183
Page count: 20
Related Papers
50 in total
  • [1] Information Visualization from the Perspective of Big Data Analysis and Fusion
    Lin, Xiang
    [J]. SCIENTIFIC PROGRAMMING, 2021, 2021
  • [2] Multimodal Data Fusion Based on Mutual Information
    Bramon, Roger
    Boada, Imma
    Bardera, Anton
    Rodriguez, Joaquim
    Feixas, Miquel
    Puig, Josep
    Sbert, Mateu
    [J]. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2012, 18 (09) : 1574 - 1587
  • [3] Data fusion for driver drowsiness recognition: A multimodal perspective
    Priyanka, S.
    Shanthi, S.
    Kumar, A. Saran
    Praveen, V.
    [J]. EGYPTIAN INFORMATICS JOURNAL, 2024, 27
  • [4] A Simple Analysis of Multimodal Data Fusion
    Cheng, Jiangchang
    Dai, Yinglong
    Yuan, Yao
    Zhu, Hongli
    [J]. 2020 IEEE 19TH INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS (TRUSTCOM 2020), 2020, : 1472 - 1475
  • [5] Multimodal Information Fusion for Semantic Video Analysis
    Gulen, Elvan
    Yilmaz, Turgay
    Yazici, Adnan
    [J]. INTERNATIONAL JOURNAL OF MULTIMEDIA DATA ENGINEERING & MANAGEMENT, 2012, 3 (04) : 52 - 74
  • [6] Multimodal biometric fusion using data quality information
    Wang, YC
    Casasent, D
    [J]. OPTICAL PATTERN RECOGNITION XVI, 2005, 5816 : 329 - 338
  • [7] A survey of multimodal information fusion for smart healthcare: Mapping the journey from data to wisdom
    Shaik, Thanveer
    Tao, Xiaohui
    Li, Lin
    Xie, Haoran
    Velasquez, Juan D.
    [J]. INFORMATION FUSION, 2024, 102
  • [8] Robust Data Fusion of Multimodal Sensory Information for Mobile Robots
    Kubelka, Vladimir
    Oswald, Lorenz
    Pomerleau, Francois
    Colas, Francis
    Svoboda, Tomas
    Reinstein, Michal
    [J]. JOURNAL OF FIELD ROBOTICS, 2015, 32 (04) : 447 - 473
  • [9] Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis
    Han, Wei
    Chen, Hui
    Poria, Soujanya
    [J]. 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 9180 - 9192
  • [10] Guest Editorial: Information Fusion for Medical Data: Early, Late, and Deep Fusion Methods for Multimodal Data
    Domingues, Ines
    Mueller, Henning
    Ortiz, Andres
    Dasarathy, Belur V.
    Abreu, Pedro H.
    Calhoun, Vince D.
    [J]. IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2020, 24 (01) : 14 - 16