Interpretable multimodal emotion recognition using hybrid fusion of speech and image data

Cited by: 3
Authors
Kumar, Puneet [1 ]
Malik, Sarthak [2 ]
Raman, Balasubramanian [1 ]
Affiliations
[1] Indian Inst Technol Roorkee, Dept Comp Sci & Engn, Roorkee, India
[2] Indian Inst Technol Roorkee, Dept Elect Engn, Roorkee, India
Keywords
Affective computing; Interpretable AI; Multimodal analysis; Information fusion; Speech and image processing;
DOI
10.1007/s11042-023-16443-1
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812 ;
Abstract
This paper proposes a multimodal emotion recognition system based on hybrid fusion that classifies the emotions depicted by speech utterances and corresponding images into discrete classes. A new interpretability technique has been developed to identify the important speech and image features leading to the prediction of particular emotion classes. The proposed system's architecture was determined through extensive ablation studies. It fuses the speech and image features and then combines the speech, image, and intermediate fusion outputs. The proposed interpretability technique incorporates a divide-and-conquer approach to compute Shapley values denoting each speech and image feature's importance. We have also constructed a large-scale dataset, the IIT-R SIER dataset, consisting of speech utterances, corresponding images, and class labels, i.e., 'anger,' 'happy,' 'hate,' and 'sad.' The proposed system achieves 83.29% accuracy for emotion recognition. The enhanced performance of the proposed system demonstrates the importance of utilizing complementary information from multiple modalities for emotion recognition.
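The abstract's interpretability technique attributes predictions to individual speech and image features via Shapley values. As a minimal illustrative sketch (not the authors' implementation), the following pure-Python code computes exact Shapley values by enumerating feature coalitions; the feature names ('pitch', 'smile', 'energy'), the additive toy scoring function, and its weights are invented for the example.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value_fn):
    """Exact Shapley values over a small feature set.

    features: list of feature names
    value_fn: maps a set of present features to a model score
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes excluding f
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Shapley weight: |S|! * (n - |S| - 1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # marginal contribution of f to coalition S
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi

# Hypothetical additive fusion score: each present feature adds its weight.
weights = {"pitch": 0.5, "smile": 0.3, "energy": 0.2}
score = lambda present: sum(weights[g] for g in present)

phi = shapley_values(list(weights), score)
print({name: round(val, 3) for name, val in phi.items()})
```

For an additive model the Shapley value of each feature equals its weight, which makes this toy setup easy to verify; the exact enumeration costs O(2^n) evaluations, which is why the paper's divide-and-conquer approximation matters for high-dimensional speech and image features.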
Pages: 28373 / 28394
Page count: 22
Related Papers
50 records in total
  • [1] Interpretable multimodal emotion recognition using hybrid fusion of speech and image data
    Kumar, Puneet
    Malik, Sarthak
    Raman, Balasubramanian
    [J]. Multimedia Tools and Applications, 2024, 83 : 28373 - 28394
  • [2] A Hybrid Latent Space Data Fusion Method for Multimodal Emotion Recognition
    Nemati, Shahla
    Rohani, Reza
    Basiri, Mohammad Ehsan
    Abdar, Moloud
    Yen, Neil Y.
    Makarenkov, Vladimir
    [J]. IEEE ACCESS, 2019, 7 : 172948 - 172964
  • [3] Multimodal transformer augmented fusion for speech emotion recognition
    Wang, Yuanyuan
    Gu, Yu
    Yin, Yifei
    Han, Yingping
    Zhang, He
    Wang, Shuang
    Li, Chenyu
    Quan, Dou
    [J]. FRONTIERS IN NEUROROBOTICS, 2023, 17
  • [4] Multimodal emotion recognition for the fusion of speech and EEG signals
    Ma, Jianghe
    Sun, Ying
    Zhang, Xueying
    [J]. Xi'an Dianzi Keji Daxue Xuebao/Journal of Xidian University, 2019, 46 (01): : 143 - 150
  • [5] Speech emotion recognition using multimodal feature fusion with machine learning approach
    Panda, Sandeep Kumar
    Jena, Ajay Kumar
    Panda, Mohit Ranjan
    Panda, Susmita
    [J]. Multimedia Tools and Applications, 2023, 82 : 42763 - 42781
  • [6] Speech emotion recognition using multimodal feature fusion with machine learning approach
    Panda, Sandeep Kumar
    Jena, Ajay Kumar
    Panda, Mohit Ranjan
    Panda, Susmita
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (27) : 42763 - 42781
  • [7] HYBRID FUSION BASED APPROACH FOR MULTIMODAL EMOTION RECOGNITION WITH INSUFFICIENT LABELED DATA
    Kumar, Puneet
    Khokher, Vedanti
    Gupta, Yukti
    Raman, Balasubramanian
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 314 - 318
  • [8] Speech Emotion Recognition among Elderly Individuals using Multimodal Fusion and Transfer Learning
    Boateng, George
    Kowatsch, Tobias
    [J]. COMPANION PUBLICATION OF THE 2020 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION (ICMI '20 COMPANION), 2020, : 12 - 16
  • [9] MSER: Multimodal speech emotion recognition using cross-attention with deep fusion
    Khan, Mustaqeem
    Gueaieb, Wail
    El Saddik, Abdulmotaleb
    Kwon, Soonil
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 245
  • [10] Emotion Recognition Based on Feedback Weighted Fusion of Multimodal Emotion Data
    Wei, Wei
    Jia, Qingxuan
    Feng, Yongli
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND BIOMIMETICS (IEEE ROBIO 2017), 2017, : 1682 - 1687