A customizable framework for multimodal emotion recognition using ensemble of deep neural network models

Cited by: 3
Authors
Dixit, Chhavi [1 ]
Satapathy, Shashank Mouli [2 ]
Affiliations
[1] Shell India Markets Pvt Ltd, Bengaluru 560103, Karnataka, India
[2] Vellore Inst Technol, Sch Comp Sci & Engn, Vellore 632014, Tamil Nadu, India
Keywords
CNN; Cross-dataset; Deep neural network; ELMo; Multimodal emotion recognition; RNN; Stacking; FACIAL EXPRESSION; SENTIMENT ANALYSIS;
DOI
10.1007/s00530-023-01188-6
Chinese Library Classification: TP [Automation and Computer Technology]
Discipline code: 0812
Abstract
Multimodal emotion recognition for videos of human oration, commonly called opinion videos, in which speakers express their views on various topics, has a wide range of applications across domains. Much research in this field aims to introduce accurate and efficient architectures, and this study carries the same objective while exploring novel concepts in emotion recognition. The proposed framework uses cross-dataset training and testing, so that the resulting architecture and models are not restricted to the domain of the input. It uses benchmark datasets and ensemble learning to ensure that even if the individual models are slightly biased, the bias can be countered by what the other models have learned. To achieve this objective, three benchmark datasets, ISEAR, RAVDESS, and FER-2013, are used to train independent models for the three modalities of text, audio, and images; an additional dataset is used alongside ISEAR to train the text model. The models are then combined and tested on the benchmark multimodal dataset CMU-MOSEI. The text model uses ELMo embeddings and an RNN, the audio model uses a simple DNN, and the image model uses a 2D CNN after pre-processing. Their outputs are aggregated using the stacking technique to produce the final result. The complete architecture can be used as partially pre-trained for predicting the individual modalities and partially trainable for stacking the results, yielding efficient emotion prediction based on input quality. The accuracy obtained on the CMU-MOSEI dataset is 86.60% and the corresponding F1-score is 0.84.
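The stacking step described in the abstract can be sketched as follows: each base model (text RNN, audio DNN, image CNN) emits a probability vector over emotion classes, the vectors are concatenated, and a meta-learner is trained on them. This is a minimal illustrative sketch with simulated base-model outputs, an assumed six-class label set, and a multinomial logistic regression as the meta-learner; it is not the authors' actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CLASSES = 6   # illustrative six-emotion label set (assumption)
N_SAMPLES = 200

def fake_probs(n, k, true_labels, noise=0.5):
    """Simulate a weakly informative base model's class probabilities."""
    logits = noise * rng.standard_normal((n, k))
    logits[np.arange(n), true_labels] += 1.5   # boost the true class
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

y = rng.integers(0, N_CLASSES, N_SAMPLES)
text_p, audio_p, image_p = (fake_probs(N_SAMPLES, N_CLASSES, y) for _ in range(3))

# Stacking: concatenate the per-modality probability vectors as meta-features.
X = np.hstack([text_p, audio_p, image_p])      # shape (N_SAMPLES, 3 * N_CLASSES)

# Meta-learner: multinomial logistic regression fit by gradient descent.
W = np.zeros((X.shape[1], N_CLASSES))
for _ in range(300):
    z = X @ W
    e = np.exp(z - z.max(axis=1, keepdims=True))
    p = e / e.sum(axis=1, keepdims=True)
    grad = X.T @ (p - np.eye(N_CLASSES)[y]) / N_SAMPLES
    W -= 1.0 * grad

pred = (X @ W).argmax(axis=1)
acc = (pred == y).mean()
print(f"stacked training accuracy: {acc:.2f}")
```

Because the meta-learner sees all three modalities at once, it can weight a modality down when its base model is unreliable, which is the bias-countering effect the abstract attributes to the ensemble.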
Pages: 3151-3168 (18 pages)