Speech Emotion Recognition among Elderly Individuals using Multimodal Fusion and Transfer Learning

Cited by: 8
Authors
Boateng, George [1 ]
Kowatsch, Tobias [1 ,2 ]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] Univ St Gallen, St Gallen, Switzerland
Keywords
Speech emotion recognition; Affective computing; Transfer learning; Computational paralinguistics; Elderly individuals; Multimodal fusion; Deep learning; CNN; LSTM; BERT; SBERT; Support vector machine;
DOI
10.1145/3395035.3425255
Chinese Library Classification (CLC)
TP3 [Computing technology; computer technology]
Discipline Code
0812
Abstract
Recognizing the emotions of the elderly is important, as it could give insight into their mental health. Emotion recognition systems that work well for the elderly could be used to assess their emotions in settings such as nursing homes and could inform the development of activities and interventions to improve their mental health. However, most emotion recognition systems are developed using data from younger adults. In this work, we train machine learning models to recognize the emotions of elderly individuals by performing 3-class classification of valence and arousal as part of the INTERSPEECH 2020 Computational Paralinguistics Challenge (ComParE). We used speech data from 87 participants who gave spontaneous personal narratives. We leveraged a transfer learning approach in which pretrained CNN and BERT models extract acoustic and linguistic features, respectively, which are fed into separate machine learning models. We also fused the two modalities in a multimodal approach. Our best model used the linguistic approach and outperformed the official competition baseline in unweighted average recall (UAR) by 8.8% for valence and by 3.2% for the mean of valence and arousal. We also showed that feature engineering is not necessary, as transfer learning without fine-tuning performs as well or better and could be leveraged for the task of recognizing the emotions of elderly individuals. This work is a step towards better recognition of the emotions of the elderly, which could eventually inform the development of interventions to manage their mental health.
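The modeling pipeline described above lends itself to a short illustration. The following is a minimal sketch, not the authors' implementation: the SBERT checkpoint name, the placeholder acoustic extractor, the example transcripts, and the linear SVM settings are assumptions made for illustration, while the overall structure (pretrained models as frozen feature extractors, an SVM classifier, feature-level fusion of the acoustic and linguistic modalities, and UAR scoring) follows the abstract and keywords.

# Minimal illustrative sketch (assumptions, not the authors' code): pretrained
# models act as frozen feature extractors (transfer learning without
# fine-tuning), the two modalities are fused at the feature level, and an SVM
# performs the 3-class valence classification, scored with unweighted average
# recall (UAR), i.e. macro-averaged recall.

import numpy as np
from sentence_transformers import SentenceTransformer  # SBERT, per the keywords
from sklearn.metrics import recall_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical transcripts of spontaneous personal narratives and
# 3-class valence labels (0 = low, 1 = neutral, 2 = high).
transcripts = [
    "My grandchildren visited me this weekend and we baked together.",
    "The days have felt long and quiet since my friend moved away.",
    "I went to the market this morning, as I do every week.",
]
valence = np.array([2, 0, 1])

# Linguistic features: frozen SBERT sentence embeddings (no fine-tuning).
# The specific checkpoint name is an assumption for this sketch.
sbert = SentenceTransformer("all-MiniLM-L6-v2")
linguistic_feats = sbert.encode(transcripts)

def cnn_embeddings(wav_paths, dim=128):
    """Placeholder for embeddings from a pretrained acoustic CNN: the paper
    extracts acoustic features with a pretrained CNN, but random vectors
    stand in here so the sketch runs without audio files."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(wav_paths), dim))

acoustic_feats = cnn_embeddings(["narrative_1.wav", "narrative_2.wav", "narrative_3.wav"])

# Early (feature-level) multimodal fusion: concatenate both modalities and
# train a single SVM on the fused representation.
fused = np.concatenate([linguistic_feats, acoustic_feats], axis=1)
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
clf.fit(fused, valence)

# UAR (unweighted average recall) is macro-averaged recall.
predictions = clf.predict(fused)
print("UAR:", recall_score(valence, predictions, average="macro"))

A decision-level (late) fusion variant would instead train one classifier per modality and combine their predictions; as reported above, the best single result came from the linguistic features alone.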
Pages: 12-16
Number of pages: 5
Related Papers (50 in total)
  • [21] Tang, Guichen; Xie, Yue; Li, Ke; Liang, Ruiyu; Zhao, Li. Multimodal emotion recognition from facial expression and speech based on feature fusion. Multimedia Tools and Applications, 2023, 82(11): 16359-16373
  • [22] Mansoorizadeh, Muharram; Charkari, Nasrollah Moghaddam. Multimodal information fusion application to human emotion recognition from face and speech. Multimedia Tools and Applications, 2010, 49(2): 277-297
  • [25] Liu, Gang; He, Wei; Jin, Bicheng. Feature Fusion of Speech Emotion Recognition Based on Deep Learning. Proceedings of the 2018 International Conference on Network Infrastructure and Digital Content (IEEE IC-NIDC), 2018: 193-197
  • [26] Alagusundari, N.; Anuradha, R. Speech Emotion Recognition Using Deep Learning. Artificial Intelligence: Theory and Applications, Vol. 1 (AITA 2023), 2024, 843: 313-325
  • [27] Ahmed, Waqar; Riaz, Sana; Iftikhar, Khunsa; Konur, Savas. Speech Emotion Recognition Using Deep Learning. Artificial Intelligence XL (AI 2023), 2023, 14381: 191-197
  • [28] Shukla, Jainendra; Barreda-Angeles, Miguel; Oliver, Joan; Puig, Domenec. MuDERI: Multimodal Database for Emotion Recognition Among Intellectually Disabled Individuals. Social Robotics (ICSR 2016), 2016, 9979: 264-273
  • [29] Zhang, Shiqing; Tao, Xin; Chuang, Yuelong; Zhao, Xiaoming. Learning deep multimodal affective features for spontaneous speech emotion recognition. Speech Communication, 2021, 127: 73-81
  • [30] Song, Kyu-Seob; Nho, Young-Hoon; Seo, Ju-Hwan; Kwon, Dong-Soo. Decision-Level Fusion Method for Emotion Recognition using Multimodal Emotion Recognition Information. 2018 15th International Conference on Ubiquitous Robots (UR), 2018: 472-476