Interpretable multimodal emotion recognition using hybrid fusion of speech and image data

Cited by: 3
Authors
Kumar, Puneet [1]
Malik, Sarthak [2]
Raman, Balasubramanian [1]
Affiliations
[1] Indian Institute of Technology Roorkee, Department of Computer Science and Engineering, Roorkee, India
[2] Indian Institute of Technology Roorkee, Department of Electrical Engineering, Roorkee, India
Keywords
Affective computing; Interpretable AI; Multimodal analysis; Information fusion; Speech and image processing
DOI
10.1007/s11042-023-16443-1
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper proposes a multimodal emotion recognition system based on hybrid fusion that classifies the emotions depicted by speech utterances and their corresponding images into discrete classes. A new interpretability technique has been developed to identify the speech and image features that lead to the prediction of particular emotion classes. The system's architecture was determined through extensive ablation studies: it fuses the speech and image features, and then combines the speech, image, and intermediate-fusion outputs. The proposed interpretability technique uses a divide-and-conquer approach to compute Shapley values that denote the importance of each speech and image feature. We have also constructed a large-scale dataset, the IIT-R SIER dataset, consisting of speech utterances, corresponding images, and class labels, i.e., 'anger,' 'happy,' 'hate,' and 'sad.' The proposed system achieved 83.29% accuracy for emotion recognition. This performance underscores the importance of utilizing complementary information from multiple modalities for emotion recognition.
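The abstract does not spell out the divide-and-conquer procedure used to compute the Shapley values. As a minimal sketch under stated assumptions, the Python below computes exact Shapley values by enumerating coalitions. It also computes the Owen value, one plausible two-level reading of "divide and conquer" (an assumption, not the authors' algorithm). The Owen value first splits credit between a speech group and an image group, and then splits each group's share among the features inside it. The feature names (`pitch`, `energy`, `face`, `color`), the weights, and `toy_value` are hypothetical stand-ins for a trained model's output score.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List, Sequence

# A value function maps a coalition of present features to a model score.
ValueFn = Callable[[FrozenSet[str]], float]

# Hypothetical feature weights; "pitch" and "face" also interact (see toy_value).
WEIGHTS = {"pitch": 0.3, "energy": 0.1, "face": 0.2, "color": 0.05}


def toy_value(s: FrozenSet[str]) -> float:
    """Hypothetical stand-in for a fused model's score on a feature subset."""
    score = sum(WEIGHTS[f] for f in s)
    if "pitch" in s and "face" in s:
        score += 0.2  # speech-image interaction term
    return score


def shapley(players: Sequence[str], v: ValueFn) -> Dict[str, float]:
    """Exact Shapley values by enumerating every coalition (exponential cost)."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (v(s | {i}) - v(s))
        phi[i] = total
    return phi


def group_game(groups: Dict[str, List[str]], v: ValueFn) -> ValueFn:
    """Quotient game: a coalition of group names is worth all their features together."""
    return lambda g: v(frozenset(p for name in g for p in groups[name]))


def owen(groups: Dict[str, List[str]], v: ValueFn) -> Dict[str, float]:
    """Owen values: Shapley weighting across groups, then within each group."""
    names = list(groups)
    m = len(names)
    phi = {}
    for k in names:
        other_groups = [g for g in names if g != k]
        b = len(groups[k])
        for i in groups[k]:
            mates = [p for p in groups[k] if p != i]
            total = 0.0
            for r in range(m):  # number of other whole groups already present
                w_outer = factorial(r) * factorial(m - r - 1) / factorial(m)
                for outer in combinations(other_groups, r):
                    base = frozenset(p for g in outer for p in groups[g])
                    for t in range(b):  # number of same-group features present
                        w_inner = factorial(t) * factorial(b - t - 1) / factorial(b)
                        for inner in combinations(mates, t):
                            s = base | frozenset(inner)
                            total += w_outer * w_inner * (v(s | {i}) - v(s))
            phi[i] = total
    return phi


if __name__ == "__main__":
    groups = {"speech": ["pitch", "energy"], "image": ["face", "color"]}
    print("feature Shapley:", shapley(list(WEIGHTS), toy_value))
    print("feature Owen:   ", owen(groups, toy_value))
    print("group Shapley:  ", shapley(list(groups), group_game(groups, toy_value)))
```

The per-group sums of the Owen values equal the Shapley values of the two-player speech/image game, and that property is what makes the two-level split consistent. Exact enumeration grows exponentially with the number of features, so realistic feature sets would need sampling or coarser grouping.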
Pages: 28373-28394
Number of pages: 22