Speech Emotion Recognition on Mobile Devices Based on Modulation Spectral Feature Pooling and Deep Neural Networks

Cited by: 0

Authors:

Avila, Anderson R. [1]

Monteiro, Joao [1]

O'Shaughnessy, Douglas [1]

Falk, Tiago H. [1]

Affiliations:

[1] Univ Quebec, INRS EMT, Montreal, PQ, Canada

Source:

2017 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (ISSPIT) | 2017

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

Affective computing; Speech emotion recognition; Modulation spectrum; In-the-wild; Mobile sensing;

DOI:

Not available

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

In this study, the problem of speech emotion recognition (SER) in the wild is addressed. A new modulation spectral feature pooling scheme is proposed to mitigate the detrimental effects of background noise. On top of these features, two DNN-based architectures are tested for the prediction of the arousal and valence emotional primitives: a multi-layer perceptron (MLP) and a recurrent neural network based on long short-term memory (LSTM). Experiments are conducted using the RECOLA dataset of spontaneous interactions. To simulate data collected in the wild, the clean speech files were corrupted with different levels of background noise and with room impulse responses collected using a mobile device. Both stationary and non-stationary noise types (fan and babble) were considered. Three distinct scenarios were explored: noise only, reverberation only, and noise plus reverberation. Experimental results show that, in most of the scenarios, the proposed SER system achieved better performance in terms of the concordance correlation coefficient (CCC) than the benchmark algorithm described in the 2016 Audio/Visual Emotion Challenge. The proposed feature system also proved more robust when noise plus reverberation is considered.
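The abstract names two ingredients without giving formulas: scoring with the concordance correlation coefficient, and corrupting clean speech with noise at different levels. The sketch below illustrates both under stated assumptions. It uses the standard definition of CCC, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), and a generic signal-to-noise-ratio mixing rule. The function names `ccc` and `mix_at_snr` are hypothetical, and nothing here is taken from the authors' actual pipeline.

```python
import math


def ccc(pred, gold):
    """Concordance correlation coefficient of two equal-length sequences.

    Uses population (divide-by-n) variance and covariance.
    """
    if len(pred) != len(gold) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n
    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0.0:
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2.0 * cov / denom


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the speech-to-noise power ratio is snr_db.

    Assumption: the SNR is computed over the whole signal, which is one
    common convention and not necessarily the one used in the paper.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(v * v for v in noise) / len(noise)
    if p_speech == 0.0 or p_noise == 0.0:
        raise ValueError("speech and noise must have non-zero power")
    # Choose gain so that p_speech / (gain**2 * p_noise) == 10**(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * v for s, v in zip(speech, noise)]
```

Unlike Pearson correlation, CCC also penalizes offset and scale differences: a prediction that tracks the reference perfectly but is shifted by a constant scores below 1. Simulating reverberation would additionally require convolving the speech with a room impulse response, which is omitted here.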


Pages: 360-365

Number of pages: 6

50 items in total

[21] Convolution neural network with multiple pooling strategies for speech emotion recognition
Jiang, Pengxu
Zou, Cairong
2022 6TH INTERNATIONAL SYMPOSIUM ON COMPUTER SCIENCE AND INTELLIGENT CONTROL, ISCSIC, 2022, : 89 - 92
[22] Emotion recognition in speech using neural networks
Nicholson, J
Takahashi, K
Nakatsu, R
AFFECTIVE MINDS, 2000, : 215 - 220
[23] Emotion recognition in speech using neural networks
Nicholson, J
Takahashi, K
Nakatsu, R
NEURAL COMPUTING & APPLICATIONS, 2000, 9 (04): : 290 - 296
[24] Emotion Recognition in Speech Using Neural Networks
J. Nicholson
K. Takahashi
R. Nakatsu
Neural Computing & Applications, 2000, 9 : 290 - 296
[25] END-TO-END SPEECH EMOTION RECOGNITION USING DEEP NEURAL NETWORKS
Tzirakis, Panagiotis
Zhang, Jiehao
Schuller, Bjoern W.
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5089 - 5093
[26] Robust feature extraction for mobile-based speech emotion recognition system
Lee, Kang-Kue
Cho, Youn-Ho
Park, Kyu-Sik
INTELLIGENT COMPUTING IN SIGNAL PROCESSING AND PATTERN RECOGNITION, 2006, 345 : 470 - 477
[27] Modulation-based Speech Emotion Recognition with Reconstruction Error Feature Expansion
Mihalache, Serban
Burileanu, Dragos
Pop, Gheorghe
Burileanu, Corneliu
2019 10TH INTERNATIONAL CONFERENCE ON SPEECH TECHNOLOGY AND HUMAN-COMPUTER DIALOGUE (SPED), 2019,
[28] SPATIOTEMPORAL ATTENTION BASED DEEP NEURAL NETWORKS FOR EMOTION RECOGNITION
Lee, Jiyoung
Kim, Sunok
Kim, Seungryong
Sohn, Kwanghoon
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 1513 - 1517
[29] Human Voice Emotion Identification Using Prosodic and Spectral Feature Extraction Based on Deep Neural Networks
Gumelar, Agustinus Bimo
Kurniawan, Afid
Sooai, Adri Gabriel
Purnomo, Mauridhi Hery
Yuniarno, Eko Mulyanto
Sugiarto, Indar
Widodo, Agung
Kristanto, Andreas Agung
Fahrudin, Tresna Maulana
2019 IEEE 7TH INTERNATIONAL CONFERENCE ON SERIOUS GAMES AND APPLICATIONS FOR HEALTH (SEGAH), 2019,
[30] An Investigation of Spectral Restoration Algorithms for Deep Neural Networks based Noise Robust Speech Recognition
Li, Bo
Tsao, Yu
Sim, Khe Chai
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 3001 - +
