Uncertainty-Based Learning of a Lightweight Model for Multimodal Emotion Recognition

Cited by: 0
Authors
Radoi, Anamaria [1 ]
Cioroiu, George [1 ]
Affiliations
[1] National University of Science and Technology Politehnica Bucharest, Dept. of Applied Electronics and Information Engineering, Bucharest 060042, Romania
Source
IEEE ACCESS, 2024, Vol. 12
Keywords
Emotion recognition; Visualization; Feature extraction; Training; Computer architecture; Data mining; Transformers; Convolutional neural networks; Entropy; Uncertainty; entropy; multimodal emotion recognition; uncertainty-based learning; MTCNN; CREMA-D; RAVDESS; FACIAL EXPRESSION; NEURAL-NETWORKS; REPRESENTATIONS
DOI
10.1109/ACCESS.2024.3450674
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Emotion recognition is a key research topic in the Affective Computing domain, with implications in marketing, human-robot interaction, and healthcare. Continuous technological advances in sensors and the rapid development of artificial intelligence have led to breakthroughs and improved the interpretation of human emotions. In this paper, we propose a lightweight neural network architecture that extracts and analyzes multimodal information by applying the same audio and visual networks across multiple temporal segments. Data collection and annotation for emotion recognition tasks remain challenging in terms of the expertise and effort required. Accordingly, the learning process of the proposed multimodal architecture is based on an iterative procedure that starts with a small volume of annotated samples and improves the system step by step by assessing the model's uncertainty in recognizing discrete emotions. Specifically, at each epoch, the learning process is guided by the samples on which the model is most uncertain and integrates different modes of expressing emotions through a simple augmentation technique. The framework is tested on two publicly available multimodal datasets for emotion recognition, i.e., CREMA-D and RAVDESS, using 5-fold cross-validation. Compared to state-of-the-art methods, the achieved performance demonstrates the effectiveness of the proposed approach, with an overall accuracy of 74.2% on CREMA-D and 76.3% on RAVDESS. Moreover, with a small number of model parameters and a low inference time, the proposed neural network architecture is a strong candidate for deployment on platforms with limited memory and computational resources.
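The uncertainty-guided selection described in the abstract can be illustrated with a short sketch. The snippet below is a minimal, illustrative implementation of entropy-based uncertainty sampling, not the paper's actual code: the `predictive_entropy` and `select_most_uncertain` helpers, the `budget` parameter, and the toy data are hypothetical stand-ins, assuming a model that outputs per-class softmax probabilities.

```python
# Minimal sketch of entropy-based uncertainty sampling; names and the
# selection rule are illustrative assumptions, not the paper's code.
import numpy as np

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy of each row of class probabilities, shape (N, C)."""
    eps = 1e-12  # guard against log(0)
    return -np.sum(probs * np.log(probs + eps), axis=1)

def select_most_uncertain(probs: np.ndarray, budget: int) -> np.ndarray:
    """Indices of the `budget` samples with the highest predictive entropy;
    these would be prioritized at the next training epoch."""
    return np.argsort(predictive_entropy(probs))[::-1][:budget]

# Toy usage: 5 unlabeled samples, 6 discrete emotion classes (as in CREMA-D).
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 6))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(select_most_uncertain(probs, budget=2))
```

Maximum predictive entropy is one standard acquisition criterion for this kind of iterative learning; the paper's exact uncertainty measure and selection schedule may differ.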
Pages: 120362 - 120374
Page count: 13
Related Papers
50 records in total (10 listed below)
  • [1] Uncertainty-based modulation for lifelong learning
    Brna, Andrew P.
    Brown, Ryan C.
    Connolly, Patrick M.
    Simons, Stephen B.
    Shimizu, Renee E.
    Aguilar-Simon, Mario
    NEURAL NETWORKS, 2019, 120: 129 - 142
  • [2] A review on EEG-based multimodal learning for emotion recognition
    Pillalamarri, Rajasekhar
    Shanmugam, Udhayakumar
    ARTIFICIAL INTELLIGENCE REVIEW, 2025, 58 (05)
  • [3] An emotion recognition embedded system using a lightweight deep learning model
    Bazargani, Mehdi
    Tahmasebi, Amir
    Yazdchi, Mohammadreza
    Baharlouei, Zahra
    JOURNAL OF MEDICAL SIGNALS & SENSORS, 2023, 13 (04): 272 - 279
  • [4] Modeling Hierarchical Uncertainty for Multimodal Emotion Recognition in Conversation
    Chen, Feiyu
    Shao, Jie
    Zhu, Anjie
    Ouyang, Deqiang
    Liu, Xueliang
    Shen, Heng Tao
    IEEE TRANSACTIONS ON CYBERNETICS, 2024, 54 (01): 187 - 198
  • [5] Consistency, Uncertainty or Inconsistency Detection in Multimodal Emotion Recognition
    Fantini, Alessia
    Pilato, Giovanni
    Vitale, Gianpaolo
    2023 SEVENTH IEEE INTERNATIONAL CONFERENCE ON ROBOTIC COMPUTING, IRC 2023, 2023: 377 - 380
  • [6] A Lightweight Model Based on Separable Convolution for Speech Emotion Recognition
    Zhong, Ying
    Hu, Ying
    Huang, Hao
    Silamu, Wushour
    INTERSPEECH 2020, 2020: 3331 - 3335
  • [7] Uncertainty-Based Rejection in Machine Learning: Implications for Model Development and Interpretability
    Barandas, Marilia
    Folgado, Duarte
    Santos, Ricardo
    Simao, Raquel
    Gamboa, Hugo
    ELECTRONICS, 2022, 11 (03)
  • [8] A Multitask learning model for multimodal sarcasm, sentiment and emotion recognition in conversations
    Zhang, Yazhou
    Wang, Jinglin
    Liu, Yaochen
    Rong, Lu
    Zheng, Qian
    Song, Dawei
    Tiwari, Prayag
    Qin, Jing
    INFORMATION FUSION, 2023, 93: 282 - 301
  • [9] MEmoBERT: Pre-Training Model With Prompt-Based Learning for Multimodal Emotion Recognition
    Zhao, Jinming
    Li, Ruichen
    Jin, Qin
    Wang, Xinchao
    Li, Haizhou
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 4703 - 4707
  • [10] An Emotion-Space Model of Multimodal Emotion Recognition
    Choe, Kyung-Il
    ADVANCED SCIENCE LETTERS, 2018, 24 (01): 699 - 702