Uncertainty-Based Learning of a Lightweight Model for Multimodal Emotion Recognition

Cited by: 0
Authors
Radoi, Anamaria [1 ]
Cioroiu, George [1 ]
Affiliations
[1] National University of Science and Technology Politehnica Bucharest, Department of Applied Electronics and Information Engineering, Bucharest 060042, Romania
Source
IEEE ACCESS, 2024, Vol. 12
Keywords
Emotion recognition; Visualization; Feature extraction; Training; Computer architecture; Data mining; Transformers; Convolutional neural networks; Entropy; Uncertainty; Multimodal emotion recognition; Uncertainty-based learning; MTCNN; CREMA-D; RAVDESS; Facial expression; Neural networks; Representations
DOI
10.1109/ACCESS.2024.3450674
CLC Number
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
Emotion recognition is a key research topic in Affective Computing, with applications in marketing, human-robot interaction, and healthcare. Continuous advances in sensor technology and the rapid development of artificial intelligence have led to breakthroughs in the interpretation of human emotions. In this paper, we propose a lightweight neural network architecture that extracts and analyzes multimodal information by applying the same audio and visual networks across multiple temporal segments. Data collection and annotation for emotion recognition tasks remain challenging in terms of the expertise and effort required. Accordingly, the learning process of the proposed multimodal architecture is an iterative procedure that starts from a small volume of annotated samples and improves the system step by step by assessing the model's uncertainty in recognizing discrete emotions. Specifically, at each epoch, learning is guided by the samples whose annotations the model is most uncertain about, and different modes of expressing emotions are integrated through a simple augmentation technique. The framework is tested on two publicly available multimodal emotion recognition datasets, CREMA-D and RAVDESS, using 5-fold cross-validation. The achieved performance, an overall accuracy of 74.2% on CREMA-D and 76.3% on RAVDESS, demonstrates the effectiveness of the proposed approach compared to state-of-the-art methods. Moreover, with a small number of parameters and a low inference time, the proposed architecture is a valid candidate for integration into platforms with limited memory and computational resources.
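The paper's code is not part of this record, but the abstract and keywords (entropy, uncertainty-based learning) point to entropy-driven uncertainty sampling as the core of the iterative training loop. The following PyTorch sketch shows one plausible reading: rank unlabeled audio-visual samples by the Shannon entropy of the model's softmax output and return the most uncertain ones to guide the next epoch. The function names, the (audio, video, index) loader format, and the model(audio, video) interface are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def predictive_entropy(logits: torch.Tensor) -> torch.Tensor:
    # Shannon entropy of the softmax distribution, one value per sample;
    # high entropy means the model is unsure which discrete emotion it sees.
    probs = F.softmax(logits, dim=-1)
    log_probs = F.log_softmax(logits, dim=-1)
    return -(probs * log_probs).sum(dim=-1)

def select_most_uncertain(model, unlabeled_loader, k, device="cpu"):
    # Hypothetical selection step: score every unlabeled sample and return
    # the dataset indices of the k most uncertain ones. Assumes the loader
    # yields (audio, video, index) batches and model(audio, video) -> logits.
    model.eval()
    entropies, indices = [], []
    with torch.no_grad():
        for audio, video, idx in unlabeled_loader:
            logits = model(audio.to(device), video.to(device))
            entropies.append(predictive_entropy(logits).cpu())
            indices.append(idx)
    entropies = torch.cat(entropies)
    indices = torch.cat(indices)
    top = torch.topk(entropies, k=min(k, entropies.numel())).indices
    return indices[top]
```

Concentrating each epoch on high-entropy samples is the standard uncertainty-sampling idea from active learning: annotation effort goes where the classifier is least confident, which is consistent with the step-by-step improvement from a small annotated pool described in the abstract.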
Pages: 120362-120374
Page count: 13