Uncertainty-Based Learning of a Lightweight Model for Multimodal Emotion Recognition

Cited by: 0
Authors
Radoi, Anamaria [1 ]
Cioroiu, George [1 ]
Affiliations
[1] National University of Science and Technology Politehnica Bucharest, Department of Applied Electronics and Information Engineering, Bucharest 060042, Romania
Source
IEEE ACCESS, 2024, Vol. 12
Keywords
Emotion recognition; Visualization; Feature extraction; Training; Computer architecture; Data mining; Transformers; Convolutional neural networks; Entropy; Uncertainty; Multimodal emotion recognition; Uncertainty-based learning; MTCNN; CREMA-D; RAVDESS; Facial expression; Neural networks; Representations
DOI
10.1109/ACCESS.2024.3450674
CLC Number
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
Emotion recognition is a key research topic in Affective Computing, with applications in marketing, human-robot interaction, and healthcare. Continuous advances in sensor technology and the rapid development of artificial intelligence have led to breakthroughs in the interpretation of human emotions. In this paper, we propose a lightweight neural network architecture that extracts and analyzes multimodal information by applying the same audio and visual networks across multiple temporal segments. Data collection and annotation for emotion recognition tasks remain challenging in terms of the expertise and effort required. Accordingly, the learning process of the proposed multimodal architecture is an iterative procedure that starts from a small volume of annotated samples and improves the system step by step by assessing the model's uncertainty in recognizing discrete emotions. Specifically, at each epoch, learning is guided by the samples whose annotations the model is most uncertain about, and different modes of expressing emotions are integrated through a simple augmentation technique. The framework is tested on two publicly available multimodal emotion recognition datasets, CREMA-D and RAVDESS, using 5-fold cross-validation. The achieved performance, an overall accuracy of 74.2% on CREMA-D and 76.3% on RAVDESS, demonstrates the effectiveness of the proposed approach compared to state-of-the-art methods. Moreover, with a small number of parameters and a low inference time, the proposed architecture is a valid candidate for integration into platforms with limited memory and computational resources.
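The paper's code is not part of this record, but the abstract and keywords (entropy, uncertainty-based learning) point to entropy-driven uncertainty sampling as the core of the iterative training loop. The following PyTorch sketch shows one plausible reading: rank unlabeled audio-visual samples by the Shannon entropy of the model's softmax output and return the most uncertain ones to guide the next epoch. The function names, the (audio, video, index) loader format, and the model(audio, video) interface are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def predictive_entropy(logits: torch.Tensor) -> torch.Tensor:
    # Shannon entropy of the softmax distribution, one value per sample;
    # high entropy means the model is unsure which discrete emotion it sees.
    probs = F.softmax(logits, dim=-1)
    log_probs = F.log_softmax(logits, dim=-1)
    return -(probs * log_probs).sum(dim=-1)

def select_most_uncertain(model, unlabeled_loader, k, device="cpu"):
    # Hypothetical selection step: score every unlabeled sample and return
    # the dataset indices of the k most uncertain ones. Assumes the loader
    # yields (audio, video, index) batches and model(audio, video) -> logits.
    model.eval()
    entropies, indices = [], []
    with torch.no_grad():
        for audio, video, idx in unlabeled_loader:
            logits = model(audio.to(device), video.to(device))
            entropies.append(predictive_entropy(logits).cpu())
            indices.append(idx)
    entropies = torch.cat(entropies)
    indices = torch.cat(indices)
    top = torch.topk(entropies, k=min(k, entropies.numel())).indices
    return indices[top]
```

Concentrating each epoch on high-entropy samples is the standard uncertainty-sampling idea from active learning: annotation effort goes where the classifier is least confident, which is consistent with the step-by-step improvement from a small annotated pool described in the abstract.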
Pages: 120362-120374
Page count: 13