Speech Emotion Recognition on Mobile Devices Based on Modulation Spectral Feature Pooling and Deep Neural Networks

Cited by: 0
Authors
Avila, Anderson R. [1 ]
Monteiro, Joao [1 ]
O'Shaughnessy, Douglas [1 ]
Falk, Tiago H. [1 ]
Affiliations
[1] Univ Quebec, INRS EMT, Montreal, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Affective computing; Speech emotion recognition; Modulation spectrum; In-the-wild; Mobile sensing;
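DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract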
This study addresses the problem of speech emotion recognition (SER) in-the-wild. A new modulation spectral feature pooling scheme is proposed to mitigate the detrimental effects of background noise. On top of these features, two DNN-based architectures are tested for the prediction of the arousal and valence emotional primitives: a multi-layer perceptron (MLP) and a recurrent neural network based on long short-term memory (LSTM). Experiments are conducted using the RECOLA dataset of spontaneous interactions. To simulate data collected in-the-wild, the clean speech files were corrupted with different levels of background noise and with room impulse responses collected using a mobile device. Both stationary and non-stationary noise types (fan and babble, respectively) were considered. Three distinct scenarios were explored: noise only, reverberation only, and noise-plus-reverberation. Experimental results show that, in most of the scenarios, the proposed SER system achieved higher concordance correlation coefficients (CCC) than the benchmark algorithm described in the 2016 Audio/Visual Emotion Challenge. The proposed features also proved more robust when noise-plus-reverberation was considered.
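For readers unfamiliar with the evaluation setup the abstract describes, the following is a minimal sketch of two of its ingredients: corrupting a clean signal with a room impulse response plus additive noise at a target signal-to-noise ratio, and scoring predictions with the concordance correlation coefficient. It is not the authors' code. The function names `corrupt` and `concordance_cc`, the truncation of the reverberant tail, and the choice to measure the SNR against the reverberant signal are all assumptions made for illustration; the record does not specify how the paper handled these details.

```python
import numpy as np


def concordance_cc(y_true, y_pred):
    """Concordance correlation coefficient between two 1-D sequences.

    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    using population (biased) variance and covariance.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return 2.0 * cov / (y_true.var() + y_pred.var() + (mean_t - mean_p) ** 2)


def corrupt(clean, noise, rir, snr_db):
    """Apply reverberation, then add noise at `snr_db` (hypothetical helper).

    Assumptions: the reverberant tail is truncated to the clean length,
    the noise is tiled or cropped to fit, and the SNR is defined
    relative to the reverberant signal.
    """
    clean = np.asarray(clean, dtype=float)
    reverberant = np.convolve(clean, np.asarray(rir, dtype=float))[: len(clean)]
    noise = np.resize(np.asarray(noise, dtype=float), len(reverberant))
    p_signal = np.mean(reverberant ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale the noise so that 10*log10(p_signal / p_scaled_noise) == snr_db.
    gain = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return reverberant + gain * noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)   # stand-in for one second of speech
    noise = rng.standard_normal(4000)    # stand-in for a noise recording
    rir = np.array([1.0, 0.0, 0.5])      # toy impulse response
    noisy = corrupt(clean, noise, rir, snr_db=10.0)
    print(noisy.shape, concordance_cc(clean, noisy))
```

Passing a unit impulse (`rir=[1.0]`) reproduces the noise-only scenario; scaling the noise gain to zero is not supported here, so the reverberation-only scenario would simply skip the additive step. Unlike Pearson correlation, CCC penalizes differences in mean and scale between predictions and labels, which is why a constant offset lowers the score.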
Pages: 360-365
Page count: 6
Related papers
50 in total
  • [31] Learning Deep Binaural Representations With Deep Convolutional Neural Networks for Spontaneous Speech Emotion Recognition
    Zhang, Shiqing
    Chen, Aihua
    Guo, Wenping
    Cui, Yueli
    Zhao, Xiaoming
    Liu, Limei
    IEEE ACCESS, 2020, 8 : 23496 - 23505
  • [32] Speech Emotion Recognition Based on Feature Fusion
    Shen, Qi
    Chen, Guanggen
    Chang, Lin
    PROCEEDINGS OF THE 2017 2ND INTERNATIONAL CONFERENCE ON MATERIALS SCIENCE, MACHINERY AND ENERGY ENGINEERING (MSMEE 2017), 2017, 123 : 1071 - 1074
  • [33] Fully Neural Network Based Speech Recognition on Mobile and Embedded Devices
    Park, Jinhwan
    Boo, Yoonho
    Choi, Iksoo
    Shin, Sungho
    Sung, Wonyong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [34] Feature fusion methods research based on deep belief networks for speech emotion recognition under noise condition
    Yongming Huang
    Kexin Tian
    Ao Wu
    Guobao Zhang
    Journal of Ambient Intelligence and Humanized Computing, 2019, 10 : 1787 - 1798
  • [35] Feature fusion methods research based on deep belief networks for speech emotion recognition under noise condition
    Huang, Yongming
    Tian, Kexin
    Wu, Ao
    Zhang, Guobao
    JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2019, 10 (05) : 1787 - 1798
  • [36] Automatic speech emotion recognition using modulation spectral features
    Wu, Siqing
    Falk, Tiago H.
    Chan, Wai-Yip
    SPEECH COMMUNICATION, 2011, 53 (05) : 768 - 785
  • [37] ELASTIC SPECTRAL DISTORTION FOR LOW RESOURCE SPEECH RECOGNITION WITH DEEP NEURAL NETWORKS
    Kanda, Naoyuki
    Takeda, Ryu
    Obuchi, Yasunari
    2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 309 - 314
  • [38] Face and Emotion Recognition with Neural Networks on Mobile Devices: Practical Implementation on Different Platforms
    Efremova, Natalia
    Patkin, Mikhail
    Sokolov, Denis
    2019 14TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2019), 2019, : 616 - 620
  • [39] Elastic spectral distortion for low resource speech recognition with deep neural networks
    Kanda, Naoyuki
    Takeda, Ryu
    Obuchi, Yasunari
    2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings, 2013, : 309 - 314
  • [40] Speech Recognition Based on Deep Tensor Neural Network and Multifactor Feature
    Shan, Yahui
    Liu, Min
    Zhan, Qingran
    Du, Shixuan
    Wang, Jing
    Xie, Xiang
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 650 - 654