Uncertainty-Based Learning of a Lightweight Model for Multimodal Emotion Recognition

Cited by: 0
Authors
Radoi, Anamaria [1 ]
Cioroiu, George [1 ]
Affiliations
[1] NUST Politehn Bucharest, Dept Appl Elect & Informat Engn, Bucharest 060042, Romania
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Emotion recognition; Visualization; Feature extraction; Training; Computer architecture; Data mining; Transformers; Convolutional neural networks; Entropy; Uncertainty; entropy; multimodal emotion recognition; uncertainty-based learning; MTCNN; CREMA-D; RAVDESS; FACIAL EXPRESSION; NEURAL-NETWORKS; REPRESENTATIONS
DOI
10.1109/ACCESS.2024.3450674
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Emotion recognition is a key research topic in the Affective Computing domain, with applications in marketing, human-robot interaction, and healthcare. Continuous advances in sensor technology and the rapid development of artificial intelligence have led to breakthroughs in the interpretation of human emotions. In this paper, we propose a lightweight neural network architecture that extracts and analyzes multimodal information, applying the same audio and visual networks across multiple temporal segments. Data collection and annotation for emotion recognition tasks remain challenging in terms of the required expertise and effort. To address this, the learning process of the proposed multimodal architecture follows an iterative procedure that starts with a small volume of annotated samples and improves the system step by step by assessing the model's uncertainty in recognizing discrete emotions. Specifically, at each epoch, the learning process is guided by the samples whose annotations the model is most uncertain about, and it integrates different modes of expressing emotions through a simple augmentation technique. The framework is tested on two publicly available multimodal datasets for emotion recognition, CREMA-D and RAVDESS, using 5-fold cross-validation. Compared to state-of-the-art methods, the achieved performance demonstrates the effectiveness of the proposed approach, with an overall accuracy of 74.2% on CREMA-D and 76.3% on RAVDESS. Moreover, with a small number of parameters and a low inference time, the proposed neural network architecture is a strong candidate for deployment on platforms with limited memory and computational resources.
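The abstract outlines an entropy-driven selection step: at each epoch, training is guided by the samples on which the model is most uncertain. Below is a minimal PyTorch-style sketch of that idea, not the authors' implementation; the helper names (predictive_entropy, select_most_uncertain) and a data loader yielding (inputs, sample_ids) pairs are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def predictive_entropy(logits: torch.Tensor) -> torch.Tensor:
    # Shannon entropy H(p) = -sum_c p_c * log(p_c) of the softmax
    # distribution; higher values indicate more uncertain predictions.
    probs = F.softmax(logits, dim=-1)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)

@torch.no_grad()
def select_most_uncertain(model, pool_loader, k, device="cpu"):
    # Hypothetical helper: score every sample in the candidate pool and
    # return the dataset indices of the k highest-entropy samples.
    model.eval()
    scores, ids = [], []
    for inputs, sample_ids in pool_loader:
        logits = model(inputs.to(device))
        scores.append(predictive_entropy(logits).cpu())
        ids.append(sample_ids)
    scores, ids = torch.cat(scores), torch.cat(ids)
    top = scores.topk(min(k, scores.numel())).indices
    return ids[top]
```

In an iterative loop of this kind, the returned indices would be moved into the labeled training pool, together with augmented variants such as the different modes of expressing the same emotion mentioned in the abstract, before the next training epoch.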
Pages: 120362-120374
Page count: 13