Speech Emotion Recognition on Mobile Devices Based on Modulation Spectral Feature Pooling and Deep Neural Networks

Cited by: 0
Authors
Avila, Anderson R. [1 ]
Monteiro, Joao [1 ]
O'Shaughnessy, Douglas [1 ]
Falk, Tiago H. [1 ]
Affiliations
[1] Univ Quebec, INRS EMT, Montreal, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Affective computing; Speech emotion recognition; Modulation spectrum; In-the-wild; Mobile sensing;
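DOI
Not available
Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract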
In this study, the problem of speech emotion recognition (SER) in-the-wild is addressed. A new modulation spectral feature pooling scheme is proposed to mitigate the detrimental effects of background noise. On top of these features, two DNN-based architectures are tested for the prediction of the arousal and valence emotional primitives: a multi-layer perceptron (MLP) and a recurrent neural network based on long short-term memory (LSTM). Experiments are conducted using the RECOLA dataset of spontaneous interactions. To simulate data collected in-the-wild, the clean speech files were corrupted with different levels of background noise and with room impulse responses collected using a mobile device. Both stationary and non-stationary noise types (fan and babble) were considered. Three distinct scenarios were explored: noise only, reverberation only, and noise-plus-reverberation. Experimental results show that, in most of the scenarios, the proposed SER system achieved better performance in terms of the concordance correlation coefficient (CCC) than the benchmark algorithm described in the 2016 Audio/Visual Emotion Challenge. The proposed features also proved more robust when noise-plus-reverberation is considered.
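The abstract reports performance as a concordance correlation coefficient (CCC) between predicted and annotated arousal or valence traces. As a rough illustration only, the sketch below computes the standard CCC definition, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), in plain Python. The function name `concordance_cc`, the use of population (not sample) variance, and the choice to raise an error when both sequences are constant and equal are assumptions made here; the record does not describe the authors' evaluation code.

```python
def concordance_cc(pred, gold):
    """Concordance correlation coefficient between two equal-length sequences.

    Hypothetical helper for illustration; uses population statistics.
    """
    if len(pred) != len(gold) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n

    # Population variances and covariance (divide by n, not n - 1).
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n

    # The squared mean difference penalizes a constant offset,
    # which plain Pearson correlation would ignore.
    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        # Assumption: treat two identical constant sequences as undefined.
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2 * cov / denom


if __name__ == "__main__":
    gold = [1.0, 2.0, 3.0]
    print(concordance_cc([1.0, 2.0, 3.0], gold))  # perfect agreement
    print(concordance_cc([2.0, 3.0, 4.0], gold))  # same shape, shifted by +1
```

Unlike Pearson correlation, a prediction that tracks the annotation perfectly but with a constant bias scores below 1, which is why the second call returns a smaller value than the first.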
Pages: 360-365
Number of pages: 6
Related papers
50 in total
  • [41] Stratified pooling based deep convolutional neural networks for human action recognition
    Sheng Yu
    Yun Cheng
    Songzhi Su
    Guorong Cai
    Shaozi Li
    Multimedia Tools and Applications, 2017, 76 : 13367 - 13382
  • [42] Stratified pooling based deep convolutional neural networks for human action recognition
    Yu, Sheng
    Cheng, Yun
    Su, Songzhi
    Cai, Guorong
    Li, Shaozi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (11) : 13367 - 13382
  • [43] Acceleration Strategies for Speech Recognition based on Deep Neural Networks
    Tian, Chao
    Liu, Jia
    Peng, Zhaomeng
    MECHATRONICS ENGINEERING, COMPUTING AND INFORMATION TECHNOLOGY, 2014, 556-562 : 5181 - 5185
  • [44] Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network
    Farooq, Misbah
    Hussain, Fawad
    Baloch, Naveed Khan
    Raja, Fawad Riasat
    Yu, Heejung
    Zikria, Yousaf Bin
    SENSORS, 2020, 20 (21) : 1 - 18
  • [45] Emotion recognition from speech using deep recurrent neural networks with acoustic features
    Byun, Sung-Woo
    Shin, Bo-Ra
    Lee, Seok-Pil
    Han, Hyuk-Soo
    BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2018, 123 : 43 - 44
  • [46] Towards Real-time Speech Emotion Recognition using Deep Neural Networks
    Fayek, H. M.
    Lech, M.
    Cavedon, L.
    2015 9TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATION SYSTEMS (ICSPCS), 2015,
  • [47] Speech Based Emotion Recognition Using Spectral Feature Extraction and an Ensemble of kNN Classifiers
    Rieger, Steven A., Jr.
    Muraleedharan, Rajani
    Ramachandran, Ravi P.
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 589 - +
  • [48] On the Robustness of Speech Emotion Recognition for Human-Robot Interaction with Deep Neural Networks
    Lakomkin, Egor
    Zamani, Mohammad Ali
    Weber, Cornelius
    Magg, Sven
    Wermter, Stefan
    2018 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2018, : 854 - 860
  • [49] Speech Emotion Recognition and Deep Learning: An Extensive Validation Using Convolutional Neural Networks
    Ri, Francesco Ardan Dal
    Ciardi, Fabio Cifariello
    Conci, Nicola
    IEEE ACCESS, 2023, 11 : 116638 - 116649
  • [50] Speech emotion recognition using spiking neural networks
    Buscicchio, Cosimo A.
    Gorecki, Przemyslaw
    Caponetti, Laura
    FOUNDATIONS OF INTELLIGENT SYSTEMS, PROCEEDINGS, 2006, 4203 : 38 - 46