Speech Emotion Recognition among Elderly Individuals using Multimodal Fusion and Transfer Learning

Cited by: 8
Authors
Boateng, George [1]
Kowatsch, Tobias [1, 2]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] Univ St Gallen, St Gallen, Switzerland
Keywords
Speech emotion recognition; Affective computing; Transfer learning; Computational paralinguistics; Elderly individuals; Multimodal fusion; Deep learning; CNN; LSTM; BERT; SBERT; Support vector machine;
DOI
10.1145/3395035.3425255
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Recognizing the emotions of the elderly is important as it could give an insight into their mental health. Emotion recognition systems that work well on the elderly could be used to assess their emotions in places such as nursing homes and could inform the development of various activities and interventions to improve their mental health. However, several emotion recognition systems are developed using data from younger adults. In this work, we train machine learning models to recognize the emotions of elderly individuals by performing a 3-class classification of valence and arousal as part of the INTERSPEECH 2020 Computational Paralinguistics Challenge (COMPARE). We used speech data from 87 participants who gave spontaneous personal narratives. We leveraged a transfer learning approach in which we used pretrained CNN and BERT models to extract acoustic and linguistic features, respectively, and fed them into separate machine learning models. We also fused these two modalities in a multimodal approach. Our best model used a linguistic approach and outperformed the official competition baseline, measured in unweighted average recall (UAR), by 8.8% for valence and by 3.2% for the mean of valence and arousal. We also showed that feature engineering is not necessary, as transfer learning without fine-tuning performs as well as or better and could be leveraged for the task of recognizing the emotions of elderly individuals. This work is a step towards better recognition of the emotions of the elderly, which could eventually inform the development of interventions to manage their mental health.
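The abstract reports results in unweighted average recall (UAR), which is the mean of the per-class recalls, so each class counts equally however many samples it has. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code. The function name and the toy low/medium/high labels are assumptions made for illustration and are not data from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (UAR, also called macro recall)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of true samples per class
    correct = defaultdict(int)  # number of correctly predicted samples per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall is computed per class, then averaged without class-size weights.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced 3-class labels (illustration only).
y_true = ["low"] * 6 + ["medium"] * 2 + ["high"] * 2
y_pred = ["low"] * 6 + ["medium", "low"] + ["low", "medium"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.2f}, accuracy = {accuracy:.2f}")
```

On this toy data the accuracy is higher than the UAR because the majority class dominates the accuracy, which is one reason a class-balanced metric such as UAR suits a 3-class task with uneven class sizes.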
Pages: 12-16
Number of pages: 5
Related papers
50 in total
  • [1] Speech emotion recognition using multimodal feature fusion with machine learning approach
    Sandeep Kumar Panda
    Ajay Kumar Jena
    Mohit Ranjan Panda
    Susmita Panda
    [J]. Multimedia Tools and Applications, 2023, 82 : 42763 - 42781
  • [2] Speech emotion recognition using multimodal feature fusion with machine learning approach
    Panda, Sandeep Kumar
    Jena, Ajay Kumar
    Panda, Mohit Ranjan
    Panda, Susmita
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (27) : 42763 - 42781
  • [3] Speech Emotion Recognition Using Transfer Learning
    Song, Peng
    Jin, Yun
    Zhao, Li
    Xin, Minghai
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2014, E97D (09) : 2530 - 2532
  • [4] Multimodal transformer augmented fusion for speech emotion recognition
    Wang, Yuanyuan
    Gu, Yu
    Yin, Yifei
    Han, Yingping
    Zhang, He
    Wang, Shuang
    Li, Chenyu
    Quan, Dou
    [J]. FRONTIERS IN NEUROROBOTICS, 2023, 17
  • [5] Multimodal emotion recognition for the fusion of speech and EEG signals
    Ma, Jianghe
    Sun, Ying
    Zhang, Xueying
    [J]. Xi'an Dianzi Keji Daxue Xuebao/Journal of Xidian University, 2019, 46 (01) : 143 - 150
  • [6] Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning
    Luna-Jimenez, Cristina
    Griol, David
    Callejas, Zoraida
    Kleinlein, Ricardo
    Montero, Juan M.
    Fernandez-Martinez, Fernando
    [J]. SENSORS, 2021, 21 (22)
  • [7] Transfer Learning for Speech Emotion Recognition
    Han Zhijie
    Zhao, Huijuan
    Wang, Ruchuan
    [J]. 2019 IEEE 5TH INTL CONFERENCE ON BIG DATA SECURITY ON CLOUD (BIGDATASECURITY) / IEEE INTL CONFERENCE ON HIGH PERFORMANCE AND SMART COMPUTING (HPSC) / IEEE INTL CONFERENCE ON INTELLIGENT DATA AND SECURITY (IDS), 2019 : 96 - 99
  • [8] Interpretable multimodal emotion recognition using hybrid fusion of speech and image data
    Kumar, Puneet
    Malik, Sarthak
    Raman, Balasubramanian
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (10) : 28373 - 28394
  • [9] Interpretable multimodal emotion recognition using hybrid fusion of speech and image data
    Puneet Kumar
    Sarthak Malik
    Balasubramanian Raman
    [J]. Multimedia Tools and Applications, 2024, 83 : 28373 - 28394
  • [10] Multimodal fusion: A study on speech-text emotion recognition with the integration of deep learning
    Shang, Yanan
    Fu, Tianqi
    [J]. INTELLIGENT SYSTEMS WITH APPLICATIONS, 2024, 24