Speech Emotion Recognition on Mobile Devices Based on Modulation Spectral Feature Pooling and Deep Neural Networks

Cited by: 0

Authors:

Avila, Anderson R. [1]

Monteiro, Joao [1]

O'Shaughnessy, Douglas [1]

Falk, Tiago H. [1]

Affiliations:

[1] Univ Quebec, INRS EMT, Montreal, PQ, Canada

Source:

2017 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (ISSPIT) | 2017

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

Affective computing; Speech emotion recognition; Modulation spectrum; In-the-wild; Mobile sensing;

DOI:

Not available

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Subject classification codes:

0808 ; 0809 ;

Abstract:

In this study, the problem of speech emotion recognition (SER) in-the-wild is addressed. A new modulation spectral feature pooling scheme is proposed to mitigate the detrimental effects of background noise. On top of these features, two DNN-based architectures are tested for the prediction of arousal and valence emotional primitives: a multi-layer perceptron (MLP) and a recurrent neural network based on long short-term memory (LSTM). Experiments are conducted using the RECOLA dataset of spontaneous interactions. To simulate data collected in-the-wild, the clean speech files were corrupted with different levels of background noise and with room impulse responses collected using a mobile device. Both stationary and non-stationary noise types (fan and babble) were considered in our experiments. Three distinct scenarios were explored: noise only, reverberation only, and noise-plus-reverberation. Experimental results have shown that, in most of the scenarios, the proposed SER system achieved better performance in terms of concordance correlation coefficient (CCC) than the benchmark algorithm described in the 2016 Audio/Visual Emotion Challenge. The proposed feature system also proved to be more robust when noise-plus-reverberation is considered.
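
The abstract reports results as concordance correlation coefficients (CCC). As a minimal sketch of that metric, assuming the standard Lin definition with population (biased) variances, the function below compares a predicted arousal or valence trace against a reference trace. The function name `concordance_cc` and the toy values are illustrative assumptions, not taken from the paper, and this is not necessarily the exact evaluation code the authors used.

```python
import numpy as np


def concordance_cc(y_true, y_pred):
    """Concordance correlation coefficient between two 1-D sequences.

    Assumes the standard definition:
        CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)
    with population (ddof=0) variance and covariance.
    Inputs that are both constant and equal give 0/0 and are not handled here.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


# Toy example (hypothetical values): a prediction with the right shape
# but a constant offset is penalised, unlike with Pearson correlation.
reference = [1.0, 2.0, 3.0]
shifted = [2.0, 3.0, 4.0]
ccc_identical = concordance_cc(reference, reference)
ccc_shifted = concordance_cc(reference, shifted)
```

Unlike Pearson correlation, CCC drops below 1 when the prediction is biased or scaled relative to the reference, which is one reason it is commonly used for continuous emotion prediction.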


Pages: 360-365

Number of pages: 6

50 entries in total

[31] Learning Deep Binaural Representations With Deep Convolutional Neural Networks for Spontaneous Speech Emotion Recognition
Zhang, Shiqing
Chen, Aihua
Guo, Wenping
Cui, Yueli
Zhao, Xiaoming
Liu, Limei
IEEE ACCESS, 2020, 8 : 23496 - 23505
[32] Speech Emotion Recognition Based on Feature Fusion
Shen, Qi
Chen, Guanggen
Chang, Lin
PROCEEDINGS OF THE 2017 2ND INTERNATIONAL CONFERENCE ON MATERIALS SCIENCE, MACHINERY AND ENERGY ENGINEERING (MSMEE 2017), 2017, 123 : 1071 - 1074
[33] Fully Neural Network Based Speech Recognition on Mobile and Embedded Devices
Park, Jinhwan
Boo, Yoonho
Choi, Iksoo
Shin, Sungho
Sung, Wonyong
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
[34] Feature fusion methods research based on deep belief networks for speech emotion recognition under noise condition
Yongming Huang
Kexin Tian
Ao Wu
Guobao Zhang
Journal of Ambient Intelligence and Humanized Computing, 2019, 10 : 1787 - 1798
[35] Feature fusion methods research based on deep belief networks for speech emotion recognition under noise condition
Huang, Yongming
Tian, Kexin
Wu, Ao
Zhang, Guobao
JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2019, 10 (05) : 1787 - 1798
[36] Automatic speech emotion recognition using modulation spectral features
Wu, Siqing
Falk, Tiago H.
Chan, Wai-Yip
SPEECH COMMUNICATION, 2011, 53 (05) : 768 - 785
[37] ELASTIC SPECTRAL DISTORTION FOR LOW RESOURCE SPEECH RECOGNITION WITH DEEP NEURAL NETWORKS
Kanda, Naoyuki
Takeda, Ryu
Obuchi, Yasunari
2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 309 - 314
[38] Face and Emotion Recognition with Neural Networks on Mobile Devices: Practical Implementation on Different Platforms
Efremova, Natalia
Patkin, Mikhail
Sokolov, Denis
2019 14TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2019), 2019, : 616 - 620
[39] Elastic spectral distortion for low resource speech recognition with deep neural networks
Kanda, Naoyuki
Takeda, Ryu
Obuchi, Yasunari
2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings, 2013, : 309 - 314
[40] Speech Recognition Based on Deep Tensor Neural Network and Multifactor Feature
Shan, Yahui
Liu, Min
Zhan, Qingran
Du, Shixuan
Wang, Jing
Xie, Xiang
2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 650 - 654
