Speech Emotion Recognition on Mobile Devices Based on Modulation Spectral Feature Pooling and Deep Neural Networks

Cited by: 0
Authors
Avila, Anderson R. [1 ]
Monteiro, Joao [1 ]
O'Shaughnessy, Douglas [1 ]
Falk, Tiago H. [1 ]
Affiliations
[1] Univ Quebec, INRS EMT, Montreal, PQ, Canada
Source
2017 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (ISSPIT) | 2017
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Affective computing; Speech emotion recognition; Modulation spectrum; In-the-wild; Mobile sensing;
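DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract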
This study addresses the problem of speech emotion recognition (SER) in-the-wild. A new modulation spectral feature pooling scheme is proposed to mitigate the detrimental effects of background noise. On top of these features, two DNN-based architectures are tested for the prediction of the arousal and valence emotional primitives: a multi-layer perceptron (MLP) and a recurrent neural network based on long short-term memory (LSTM). Experiments are conducted using the RECOLA dataset of spontaneous interactions. To simulate data collected in-the-wild, the clean speech files were corrupted with different levels of background noise and with room impulse responses collected using a mobile device. Both stationary and non-stationary noise types (fan and babble) were considered in the experiments. Three distinct scenarios were explored: noise only, reverberation only, and noise-plus-reverberation. Experimental results show that, in most of the scenarios, the proposed SER system achieved higher concordance correlation coefficients (CCC) than the benchmark algorithm described in the 2016 Audio/Visual Emotion Challenge. The proposed feature system also proved more robust when noise-plus-reverberation was considered.
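For readers unfamiliar with the evaluation metric and the corruption setup named in the abstract, the following is a minimal Python sketch, not the authors' code. It assumes the standard definition of the concordance correlation coefficient (with population variances) and a simple additive-noise mix at a target signal-to-noise ratio. The function names `concordance_cc` and `mix_at_snr` are hypothetical, and the sketch does not model the room impulse responses or the feature pooling used in the paper.

```python
import numpy as np


def concordance_cc(y_true, y_pred):
    """Concordance correlation coefficient (CCC) of two 1-D sequences.

    CCC = 2*cov / (var_true + var_pred + (mean_true - mean_pred)**2),
    using population (ddof=0) variance and covariance.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return 2.0 * cov / (y_true.var() + y_pred.var() + (mean_t - mean_p) ** 2)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the speech-to-noise power ratio is snr_db.

    Assumption: `noise` is at least as long as `speech`; it is truncated
    to the speech length before scaling.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain that brings the noise power to p_speech / 10^(snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise
```

Unlike Pearson correlation, CCC also penalizes bias: a prediction that tracks the target perfectly but is shifted by a constant offset scores below 1.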
Pages: 360 - 365
Page count: 6
Related papers
50 in total
  • [21] Convolution neural network with multiple pooling strategies for speech emotion recognition
    Jiang, Pengxu
    Zou, Cairong
    2022 6TH INTERNATIONAL SYMPOSIUM ON COMPUTER SCIENCE AND INTELLIGENT CONTROL, ISCSIC, 2022, : 89 - 92
  • [22] Emotion recognition in speech using neural networks
    Nicholson, J
    Takahashi, K
    Nakatsu, R
    AFFECTIVE MINDS, 2000, : 215 - 220
  • [23] Emotion recognition in speech using neural networks
    Nicholson, J
    Takahashi, K
    Nakatsu, R
    NEURAL COMPUTING & APPLICATIONS, 2000, 9 (04): : 290 - 296
  • [24] Emotion Recognition in Speech Using Neural Networks
    J. Nicholson
    K. Takahashi
    R. Nakatsu
    Neural Computing & Applications, 2000, 9 : 290 - 296
  • [25] END-TO-END SPEECH EMOTION RECOGNITION USING DEEP NEURAL NETWORKS
    Tzirakis, Panagiotis
    Zhang, Jiehao
    Schuller, Bjoern W.
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5089 - 5093
  • [26] Robust feature extraction for mobile-based speech emotion recognition system
    Lee, Kang-Kue
    Cho, Youn-Ho
    Park, Kyu-Sik
    INTELLIGENT COMPUTING IN SIGNAL PROCESSING AND PATTERN RECOGNITION, 2006, 345 : 470 - 477
  • [27] Modulation-based Speech Emotion Recognition with Reconstruction Error Feature Expansion
    Mihalache, Serban
    Burileanu, Dragos
    Pop, Gheorghe
    Burileanu, Corneliu
    2019 10TH INTERNATIONAL CONFERENCE ON SPEECH TECHNOLOGY AND HUMAN-COMPUTER DIALOGUE (SPED), 2019,
  • [28] SPATIOTEMPORAL ATTENTION BASED DEEP NEURAL NETWORKS FOR EMOTION RECOGNITION
    Lee, Jiyoung
    Kim, Sunok
    Kim, Seungryong
    Sohn, Kwanghoon
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 1513 - 1517
  • [29] Human Voice Emotion Identification Using Prosodic and Spectral Feature Extraction Based on Deep Neural Networks
    Gumelar, Agustinus Bimo
    Kurniawan, Afid
    Sooai, Adri Gabriel
    Purnomo, Mauridhi Hery
    Yuniarno, Eko Mulyanto
    Sugiarto, Indar
    Widodo, Agung
    Kristanto, Andreas Agung
    Fahrudin, Tresna Maulana
    2019 IEEE 7TH INTERNATIONAL CONFERENCE ON SERIOUS GAMES AND APPLICATIONS FOR HEALTH (SEGAH), 2019,
  • [30] An Investigation of Spectral Restoration Algorithms for Deep Neural Networks based Noise Robust Speech Recognition
    Li, Bo
    Tsao, Yu
    Sim, Khe Chai
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 3001 - +