Speech Emotion Recognition on Mobile Devices Based on Modulation Spectral Feature Pooling and Deep Neural Networks

Citations: 0

Authors:

Avila, Anderson R. [1]

Monteiro, Joao [1]

O'Shaughnessy, Douglas [1]

Falk, Tiago H. [1]

Affiliations:

[1] Univ Quebec, INRS EMT, Montreal, PQ, Canada

Source:

2017 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (ISSPIT) | 2017

Funding:

Natural Sciences and Engineering Research Council of Canada

Keywords:

Affective computing; Speech emotion recognition; Modulation spectrum; In-the-wild; Mobile sensing;

DOI:

Not available

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

In this study, the problem of speech emotion recognition (SER) in-the-wild is addressed. A new modulation spectral feature pooling scheme is proposed to mitigate the detrimental effects of background noise. On top of these features, two DNN-based architectures are tested for the prediction of arousal and valence emotional primitives: a multi-layer perceptron (MLP) and a recurrent neural network based on Long Short-Term Memory (LSTM). Experiments are conducted using the RECOLA dataset of spontaneous interactions. To simulate data collected in-the-wild, the clean speech files were corrupted with different levels of background noise and with room impulse responses collected using a mobile device. Both stationary and non-stationary noise types (fan and babble) were considered. Three distinct scenarios were explored: noise only, reverberation only, and noise-plus-reverberation. Experimental results show that, in most of the scenarios, the proposed SER system achieved better performance in terms of concordance correlation coefficient (CCC) than the benchmark algorithm described in the 2016 Audio/Visual Emotion Challenge. The proposed feature system also proved more robust when noise-plus-reverberation was considered.
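The abstract reports results as concordance correlation coefficients (CCC). As a reading aid, the sketch below computes the standard CCC between a predicted and a reference sequence (for example, arousal or valence traces). It is a minimal illustration, not the authors' evaluation code: the function name `concordance_cc`, the use of population (biased) variance, and the choice to raise an error when the coefficient is undefined are all assumptions made here.

```python
from statistics import fmean


def concordance_cc(predictions, targets):
    """Concordance correlation coefficient between two equal-length sequences.

    CCC = 2 * cov(p, t) / (var(p) + var(t) + (mean(p) - mean(t)) ** 2)

    Population (divide-by-N) statistics are used; this is an assumption of
    this sketch, not something stated in the abstract.
    """
    predictions = list(predictions)
    targets = list(targets)
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_p = fmean(predictions)
    mean_t = fmean(targets)

    # Population variance of each sequence and their covariance.
    var_p = fmean([(p - mean_p) ** 2 for p in predictions])
    var_t = fmean([(t - mean_t) ** 2 for t in targets])
    cov_pt = fmean([(p - mean_p) * (t - mean_t)
                    for p, t in zip(predictions, targets)])

    denominator = var_p + var_t + (mean_p - mean_t) ** 2
    if denominator == 0:
        # Both sequences are constant and equal: the coefficient is undefined.
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2 * cov_pt / denominator
```

Unlike the Pearson correlation, the CCC penalizes a constant offset between prediction and reference: `concordance_cc([1, 2, 3], [2, 3, 4])` is below 1 even though the two sequences are perfectly correlated.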


Pages: 360-365

Page count: 6

Related documents (50 entries in total):

[1] Modulation spectral features for speech emotion recognition using deep neural networks
Singh, Premjeet
Sahidullah, Md
Saha, Goutam
SPEECH COMMUNICATION, 2023, 146 : 53 - 69
[2] Deep Convolutional Neural Networks for Feature Extraction in Speech Emotion Recognition
Heracleous, Panikos
Mohammad, Yasser
Yoneyama, Akio
HUMAN-COMPUTER INTERACTION. RECOGNITION AND INTERACTION TECHNOLOGIES, HCI 2019, PT II, 2019, 11567 : 117 - 132
[3] Feature Pooling of Modulation Spectrum Features for Improved Speech Emotion Recognition in the Wild
Avila, Anderson R.
Akhtar, Zahid
Santos, Joao F.
O'Shaughnessy, Douglas
Falk, Tiago H.
IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2021, 12 (01) : 177 - 188
[4] Improvement on Speech Emotion Recognition Based on Deep Convolutional Neural Networks
Niu, Yafeng
Zou, Dongsheng
Niu, Yadong
He, Zhongshi
Tan, Hua
PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON COMPUTING AND ARTIFICIAL INTELLIGENCE (ICCAI 2018), 2018, : 13 - 18
[5] Speech emotion recognition with deep convolutional neural networks
Issa, Dias
Demirci, M. Fatih
Yazici, Adnan
BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2020, 59
[6] Dimensional Emotion Recognition from Speech Using Modulation Spectral Features and Recurrent Neural Networks
Peng, Zhichao
Zhu, Zhi
Unoki, Masashi
Dang, Jianwu
Akagi, Masato
2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 524 - 528
[7] Deep Neural Network Based Spectral Feature Mapping for Robust Speech Recognition
Han, Kun
He, Yanzhang
Bagchi, Deblin
Fosler-Lussier, Eric
Wang, DeLiang
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2484 - 2488
[8] An Experimental Study of Speech Emotion Recognition Based on Deep Convolutional Neural Networks
Zheng, W. Q.
Yu, J. S.
Zou, Y. X.
2015 INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2015, : 827 - 831
[9] Speech Emotion Recognition based on Gaussian Mixture Models and Deep Neural Networks
Tashev, Ivan J.
Wang, Zhong-Qiu
Godin, Keith
2017 INFORMATION THEORY AND APPLICATIONS WORKSHOP (ITA), 2017,
[10] FSER: Deep Convolutional Neural Networks for Speech Emotion Recognition
Dossou, Bonaventure F. P.
Gbenou, Yeno K. S.
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 3526 - 3531
