Speech Emotion Recognition among Elderly Individuals using Multimodal Fusion and Transfer Learning

Cited by: 8
Authors
Boateng, George [1 ]
Kowatsch, Tobias [1 ,2 ]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] Univ St Gallen, St Gallen, Switzerland
Keywords
Speech emotion recognition; Affective computing; Transfer learning; Computational paralinguistics; Elderly individuals; Multimodal fusion; Deep learning; CNN; LSTM; BERT; SBERT; Support vector machine;
DOI
10.1145/3395035.3425255
Chinese Library Classification (CLC)
TP3 [Computing technology; computer technology]
Discipline Code
0812
Abstract
Recognizing the emotions of the elderly is important, as it could give insight into their mental health. Emotion recognition systems that work well for the elderly could be used to assess their emotions in settings such as nursing homes and could inform the development of activities and interventions to improve their mental health. However, most emotion recognition systems are developed using data from younger adults. In this work, we train machine learning models to recognize the emotions of elderly individuals by performing 3-class classification of valence and arousal as part of the INTERSPEECH 2020 Computational Paralinguistics Challenge (ComParE). We used speech data from 87 participants who gave spontaneous personal narratives. We leveraged a transfer learning approach in which pretrained CNN and BERT models extract acoustic and linguistic features, respectively, which are fed into separate machine learning models. We also fused the two modalities in a multimodal approach. Our best model used the linguistic approach and outperformed the official competition baseline in unweighted average recall (UAR) by 8.8% for valence and by 3.2% for the mean of valence and arousal. We also showed that feature engineering is not necessary, as transfer learning without fine-tuning performs as well or better and could be leveraged for the task of recognizing the emotions of elderly individuals. This work is a step towards better recognition of the emotions of the elderly, which could eventually inform the development of interventions to manage their mental health.
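The modeling pipeline described above lends itself to a short illustration. The following is a minimal sketch, not the authors' implementation: the SBERT checkpoint name, the placeholder acoustic extractor, the example transcripts, and the linear SVM settings are assumptions made for illustration, while the overall structure (pretrained models as frozen feature extractors, an SVM classifier, feature-level fusion of the acoustic and linguistic modalities, and UAR scoring) follows the abstract and keywords.

# Minimal illustrative sketch (assumptions, not the authors' code): pretrained
# models act as frozen feature extractors (transfer learning without
# fine-tuning), the two modalities are fused at the feature level, and an SVM
# performs the 3-class valence classification, scored with unweighted average
# recall (UAR), i.e. macro-averaged recall.

import numpy as np
from sentence_transformers import SentenceTransformer  # SBERT, per the keywords
from sklearn.metrics import recall_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical transcripts of spontaneous personal narratives and
# 3-class valence labels (0 = low, 1 = neutral, 2 = high).
transcripts = [
    "My grandchildren visited me this weekend and we baked together.",
    "The days have felt long and quiet since my friend moved away.",
    "I went to the market this morning, as I do every week.",
]
valence = np.array([2, 0, 1])

# Linguistic features: frozen SBERT sentence embeddings (no fine-tuning).
# The specific checkpoint name is an assumption for this sketch.
sbert = SentenceTransformer("all-MiniLM-L6-v2")
linguistic_feats = sbert.encode(transcripts)

def cnn_embeddings(wav_paths, dim=128):
    """Placeholder for embeddings from a pretrained acoustic CNN: the paper
    extracts acoustic features with a pretrained CNN, but random vectors
    stand in here so the sketch runs without audio files."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(wav_paths), dim))

acoustic_feats = cnn_embeddings(["narrative_1.wav", "narrative_2.wav", "narrative_3.wav"])

# Early (feature-level) multimodal fusion: concatenate both modalities and
# train a single SVM on the fused representation.
fused = np.concatenate([linguistic_feats, acoustic_feats], axis=1)
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
clf.fit(fused, valence)

# UAR (unweighted average recall) is macro-averaged recall.
predictions = clf.predict(fused)
print("UAR:", recall_score(valence, predictions, average="macro"))

A decision-level (late) fusion variant would instead train one classifier per modality and combine their predictions; as reported above, the best single result came from the linguistic features alone.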
Pages: 12-16
Number of pages: 5
Related Papers (50 in total)
  • [21] Tang, Guichen; Xie, Yue; Li, Ke; Liang, Ruiyu; Zhao, Li. Multimodal emotion recognition from facial expression and speech based on feature fusion. Multimedia Tools and Applications, 2023, 82(11): 16359-16373
  • [22] Mansoorizadeh, Muharram; Charkari, Nasrollah Moghaddam. Multimodal information fusion application to human emotion recognition from face and speech. Multimedia Tools and Applications, 2010, 49(2): 277-297
  • [25] Liu, Gang; He, Wei; Jin, Bicheng. Feature Fusion of Speech Emotion Recognition Based on Deep Learning. Proceedings of the 2018 International Conference on Network Infrastructure and Digital Content (IEEE IC-NIDC), 2018: 193-197
  • [26] Alagusundari, N.; Anuradha, R. Speech Emotion Recognition Using Deep Learning. Artificial Intelligence: Theory and Applications, Vol. 1 (AITA 2023), 2024, 843: 313-325
  • [27] Ahmed, Waqar; Riaz, Sana; Iftikhar, Khunsa; Konur, Savas. Speech Emotion Recognition Using Deep Learning. Artificial Intelligence XL (AI 2023), 2023, 14381: 191-197
  • [28] Shukla, Jainendra; Barreda-Angeles, Miguel; Oliver, Joan; Puig, Domenec. MuDERI: Multimodal Database for Emotion Recognition Among Intellectually Disabled Individuals. Social Robotics (ICSR 2016), 2016, 9979: 264-273
  • [29] Zhang, Shiqing; Tao, Xin; Chuang, Yuelong; Zhao, Xiaoming. Learning deep multimodal affective features for spontaneous speech emotion recognition. Speech Communication, 2021, 127: 73-81
  • [30] Song, Kyu-Seob; Nho, Young-Hoon; Seo, Ju-Hwan; Kwon, Dong-Soo. Decision-Level Fusion Method for Emotion Recognition using Multimodal Emotion Recognition Information. 2018 15th International Conference on Ubiquitous Robots (UR), 2018: 472-476