Interpretable multimodal emotion recognition using hybrid fusion of speech and image data

Cited by: 3
Authors
Kumar, Puneet [1 ]
Malik, Sarthak [2 ]
Raman, Balasubramanian [1 ]
Affiliations
[1] Indian Inst Technol Roorkee, Dept Comp Sci & Engn, Roorkee, India
[2] Indian Inst Technol Roorkee, Dept Elect Engn, Roorkee, India
Keywords
Affective computing; Interpretable AI; Multimodal analysis; Information fusion; Speech and image processing;
DOI
10.1007/s11042-023-16443-1
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812 ;
Abstract
This paper proposes a multimodal emotion recognition system based on hybrid fusion that classifies the emotions depicted by speech utterances and corresponding images into discrete classes. A new interpretability technique has been developed to identify the important speech and image features leading to the prediction of particular emotion classes. The proposed system's architecture was determined through extensive ablation studies. It fuses the speech and image features and then combines the speech, image, and intermediate fusion outputs. The proposed interpretability technique incorporates a divide-and-conquer approach to compute Shapley values denoting each speech and image feature's importance. We have also constructed a large-scale dataset, the IIT-R SIER dataset, consisting of speech utterances, corresponding images, and class labels, i.e., 'anger,' 'happy,' 'hate,' and 'sad.' The proposed system achieves 83.29% accuracy for emotion recognition. The enhanced performance of the proposed system demonstrates the importance of utilizing complementary information from multiple modalities for emotion recognition.
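The abstract's interpretability technique attributes predictions to individual speech and image features via Shapley values. As a minimal illustrative sketch (not the authors' implementation), the following pure-Python code computes exact Shapley values by enumerating feature coalitions; the feature names ('pitch', 'smile', 'energy'), the additive toy scoring function, and its weights are invented for the example.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value_fn):
    """Exact Shapley values over a small feature set.

    features: list of feature names
    value_fn: maps a set of present features to a model score
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes excluding f
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Shapley weight: |S|! * (n - |S| - 1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # marginal contribution of f to coalition S
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi

# Hypothetical additive fusion score: each present feature adds its weight.
weights = {"pitch": 0.5, "smile": 0.3, "energy": 0.2}
score = lambda present: sum(weights[g] for g in present)

phi = shapley_values(list(weights), score)
print({name: round(val, 3) for name, val in phi.items()})
```

For an additive model the Shapley value of each feature equals its weight, which makes this toy setup easy to verify; the exact enumeration costs O(2^n) evaluations, which is why the paper's divide-and-conquer approximation matters for high-dimensional speech and image features.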
Pages: 28373 / 28394
Page count: 22
Related Papers
50 records in total
  • [1] Interpretable multimodal emotion recognition using hybrid fusion of speech and image data
    Kumar, Puneet
    Malik, Sarthak
    Raman, Balasubramanian
    [J]. Multimedia Tools and Applications, 2024, 83 : 28373 - 28394
  • [2] A Hybrid Latent Space Data Fusion Method for Multimodal Emotion Recognition
    Nemati, Shahla
    Rohani, Reza
    Basiri, Mohammad Ehsan
    Abdar, Moloud
    Yen, Neil Y.
    Makarenkov, Vladimir
    [J]. IEEE ACCESS, 2019, 7 : 172948 - 172964
  • [3] Multimodal transformer augmented fusion for speech emotion recognition
    Wang, Yuanyuan
    Gu, Yu
    Yin, Yifei
    Han, Yingping
    Zhang, He
    Wang, Shuang
    Li, Chenyu
    Quan, Dou
    [J]. FRONTIERS IN NEUROROBOTICS, 2023, 17
  • [4] Multimodal emotion recognition for the fusion of speech and EEG signals
    Ma, Jianghe
    Sun, Ying
    Zhang, Xueying
    [J]. Xi'an Dianzi Keji Daxue Xuebao/Journal of Xidian University, 2019, 46 (01): : 143 - 150
  • [5] Speech emotion recognition using multimodal feature fusion with machine learning approach
    Panda, Sandeep Kumar
    Jena, Ajay Kumar
    Panda, Mohit Ranjan
    Panda, Susmita
    [J]. Multimedia Tools and Applications, 2023, 82 : 42763 - 42781
  • [6] Speech emotion recognition using multimodal feature fusion with machine learning approach
    Panda, Sandeep Kumar
    Jena, Ajay Kumar
    Panda, Mohit Ranjan
    Panda, Susmita
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (27) : 42763 - 42781
  • [7] HYBRID FUSION BASED APPROACH FOR MULTIMODAL EMOTION RECOGNITION WITH INSUFFICIENT LABELED DATA
    Kumar, Puneet
    Khokher, Vedanti
    Gupta, Yukti
    Raman, Balasubramanian
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 314 - 318
  • [8] Speech Emotion Recognition among Elderly Individuals using Multimodal Fusion and Transfer Learning
    Boateng, George
    Kowatsch, Tobias
    [J]. COMPANION PUBLICATION OF THE 2020 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION (ICMI '20 COMPANION), 2020, : 12 - 16
  • [9] MSER: Multimodal speech emotion recognition using cross-attention with deep fusion
    Khan, Mustaqeem
    Gueaieb, Wail
    El Saddik, Abdulmotaleb
    Kwon, Soonil
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 245
  • [10] Emotion Recognition Based on Feedback Weighted Fusion of Multimodal Emotion Data
    Wei, Wei
    Jia, Qingxuan
    Feng, Yongli
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND BIOMIMETICS (IEEE ROBIO 2017), 2017, : 1682 - 1687