Interpretable multimodal emotion recognition using hybrid fusion of speech and image data

Cited by: 3
Authors
Kumar, Puneet [1 ]
Malik, Sarthak [2 ]
Raman, Balasubramanian [1 ]
Affiliations
[1] Indian Inst Technol Roorkee, Dept Comp Sci & Engn, Roorkee, India
[2] Indian Inst Technol Roorkee, Dept Elect Engn, Roorkee, India
Keywords
Affective computing; Interpretable AI; Multimodal analysis; Information fusion; Speech and image processing;
DOI
10.1007/s11042-023-16443-1
CLC Classification
TP [Automation & Computer Technology];
Subject Classification
0812
Abstract
This paper proposes a multimodal emotion recognition system based on hybrid fusion that classifies the emotions depicted by speech utterances and corresponding images into discrete classes. A new interpretability technique has been developed to identify the important speech and image features leading to the prediction of particular emotion classes. The proposed system's architecture has been determined through intensive ablation studies. It fuses the speech and image features and then combines the speech, image, and intermediate fusion outputs. The proposed interpretability technique incorporates a divide-and-conquer approach to compute Shapley values denoting each speech and image feature's importance. We have also constructed a large-scale dataset, the IIT-R SIER dataset, consisting of speech utterances, corresponding images, and class labels, i.e., 'anger,' 'happy,' 'hate,' and 'sad.' The proposed system has achieved 83.29% accuracy for emotion recognition. The enhanced performance of the proposed system underscores the importance of utilizing complementary information from multiple modalities for emotion recognition.
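The interpretability step described in the abstract, assigning each speech and image feature a Shapley value, can be illustrated with a minimal sketch. The exact formula below averages a feature's marginal contribution over all coalitions; the paper's divide-and-conquer technique approximates this at scale, which this sketch does not attempt. The `scores` table and the additive `value_fn` are hypothetical stand-ins for the classifier's confidence on a feature subset.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value_fn):
    """Exact Shapley values: each feature's weighted average marginal
    contribution to value_fn over all coalitions of the other features.
    Exponential in len(features); illustrative only."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                # Standard Shapley weight for a coalition of size k
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (value_fn(set(S) | {f}) - value_fn(set(S)))
        phi[f] = total
    return phi

# Hypothetical per-feature contributions standing in for the emotion
# classifier's confidence given a subset of speech/image features.
scores = {"pitch": 0.4, "energy": 0.1, "face_region": 0.5}
value_fn = lambda S: sum(scores[f] for f in S)

print(shapley_values(list(scores), value_fn))
```

For an additive value function like this one, each feature's Shapley value reduces to its own score; the method becomes informative when the classifier's confidence depends on feature interactions.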
Pages: 28373-28394
Page count: 22
Related Papers
50 items in total
  • [31] Towards the explainability of Multimodal Speech Emotion Recognition
    Kumar, Puneet
    Kaushik, Vishesh
    Raman, Balasubramanian
    [J]. INTERSPEECH 2021, 2021, : 1748 - 1752
  • [32] A novel signal to image transformation and feature level fusion for multimodal emotion recognition
    Yilmaz, Bahar Hatipoglu
    Kose, Cemal
    [J]. BIOMEDICAL ENGINEERING-BIOMEDIZINISCHE TECHNIK, 2021, 66 (04): : 353 - 362
  • [33] Feature Fusion Algorithm for Multimodal Emotion Recognition from Speech and Facial Expression Signal
    Han Zhiyan
    Wang Jian
    [J]. INTERNATIONAL SEMINAR ON APPLIED PHYSICS, OPTOELECTRONICS AND PHOTONICS (APOP 2016), 2016, 61
  • [34] A Multimodal Facial Emotion Recognition Framework through the Fusion of Speech with Visible and Infrared Images
    Siddiqui, Mohammad Faridul Haque
    Javaid, Ahmad Y.
    [J]. MULTIMODAL TECHNOLOGIES AND INTERACTION, 2020, 4 (03) : 1 - 21
  • [35] Multimodal fusion: A study on speech-text emotion recognition with the integration of deep learning
    Shang, Yanan
    Fu, Tianqi
    [J]. INTELLIGENT SYSTEMS WITH APPLICATIONS, 2024, 24
  • [36] MULTIMODAL MEDICAL IMAGE FUSION USING HYBRID DOMAINS
    Naidu, A. Rajesh
    Bhavana, D.
    [J]. SCALABLE COMPUTING-PRACTICE AND EXPERIENCE, 2022, 23 (04): : 225 - 232
  • [37] Convolutional Attention Networks for Multimodal Emotion Recognition from Speech and Text Data
    Lee, Chan Woo
    Song, Kyu Ye
    Jeong, Jihoon
    Choi, Woo Yong
    [J]. FIRST GRAND CHALLENGE AND WORKSHOP ON HUMAN MULTIMODAL LANGUAGE (CHALLENGE-HML), 2018, : 28 - 34
  • [38] Multimodal emotion recognition algorithm based on edge network emotion element compensation and data fusion
    Wang, Yu
    [J]. PERSONAL AND UBIQUITOUS COMPUTING, 2019, 23 (3-4) : 383 - 392
  • [40] Multimodal Emotion Recognition Based on Feature Fusion
    Xu, Yurui
    Wu, Xiao
    Su, Hang
    Liu, Xiaorui
    [J]. 2022 INTERNATIONAL CONFERENCE ON ADVANCED ROBOTICS AND MECHATRONICS (ICARM 2022), 2022, : 7 - 11