Speech Emotion Recognition on Mobile Devices Based on Modulation Spectral Feature Pooling and Deep Neural Networks

被引:0
|
作者
Avila, Anderson R. [1 ]
Monteiro, Joao [1 ]
O'Shaughneussy, Douglas [1 ]
Falk, Tiago H. [1 ]
机构
[1] Univ Quebec, INRS EMT, Montreal, PQ, Canada
基金
加拿大自然科学与工程研究理事会;
关键词
Affective computing; Speech emotion recognition; Modulation spectrum; In-the-wild; Mobile sensing;
D O I
暂无
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
In this study, the problem of speech emotion recognition (SER) in-the-wild is addressed. A new modulation spectral feature pooling scheme is proposed to mitigate the detrimental effects of background noise. On top of these features, two DNN-based architectures are tested for the prediction of arousal and valence emotional primitives: a multi-layer perceptron (MLP) and a recurrent neural network based on Long-Short Term Memory (LSTM). Experiments are conducted using the RECOLA dataset of spontaneous interactions. In order to simulate data collected in-the-wild, the clean speech files were corrupted with different levels of background noise and room impulse responses collected using a mobile device. Both stationary and non-stationary noise types (fan and babble) were considered in our experiments. Three distinct scenarios were explored: noise only, reverberation only and noise-plus-reverberation. Experimental results have shown that, in most of the scenarios, the proposed SER system achieved better performance in terms of concordance correlation coefficients (CCC) compared to the benchmark algorithm described in the 2016 Audio/Visual Emotion Challenge. The proposed feature system also showed to be more robust when noise-plus-reverberation is considered.
引用
收藏
页码:360 / 365
页数:6
相关论文
共 50 条
  • [1] Modulation spectral features for speech emotion recognition using deep neural networks
    Singh, Premjeet
    Sahidullah, Md
    Saha, Goutam
    SPEECH COMMUNICATION, 2023, 146 : 53 - 69
  • [2] Deep Convolutional Neural Networks for Feature Extraction in Speech Emotion Recognition
    Heracleous, Panikos
    Mohammad, Yasser
    Yoneyama, Akio
    HUMAN-COMPUTER INTERACTION. RECOGNITION AND INTERACTION TECHNOLOGIES, HCI 2019, PT II, 2019, 11567 : 117 - 132
  • [3] Feature Pooling of Modulation Spectrum Features for Improved Speech Emotion Recognition in the Wild
    Avila, Anderson R.
    Akhtar, Zahid
    Santos, Joao F.
    O'Shaughnessy, Douglas
    Falk, Tiago H.
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2021, 12 (01) : 177 - 188
  • [4] Improvement on Speech Emotion Recognition Based on Deep Convolutional Neural Networks
    Niu, Yafeng
    Zou, Dongsheng
    Niu, Yadong
    He, Zhongshi
    Tan, Hua
    PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON COMPUTING AND ARTIFICIAL INTELLIGENCE (ICCAI 2018), 2018, : 13 - 18
  • [5] Speech emotion recognition with deep convolutional neural networks
    Issa, Dias
    Demirci, M. Fatih
    Yazici, Adnan
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2020, 59
  • [6] Dimensional Emotion Recognition from Speech Using Modulation Spectral Features and Recurrent Neural Networks
    Peng, Zhichao
    Zhu, Zhi
    Unoki, Masashi
    Dang, Jianwu
    Akagi, Masato
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 524 - 528
  • [7] Deep Neural Network Based Spectral Feature Mapping for Robust Speech Recognition
    Han, Kun
    He, Yanzhang
    Bagchi, Deblin
    Fosler-Lussier, Eric
    Wang, DeLiang
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2484 - 2488
  • [8] An Experimental Study of Speech Emotion Recognition Based on Deep Convolutional Neural Networks
    Zheng, W. Q.
    Yu, J. S.
    Zou, Y. X.
    2015 INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2015, : 827 - 831
  • [9] Speech Emotion Recognition based on Gaussian Mixture Models and Deep Neural Networks
    Tashev, Ivan J.
    Wang, Zhong-Qiu
    Godin, Keith
    2017 INFORMATION THEORY AND APPLICATIONS WORKSHOP (ITA), 2017,
  • [10] FSER: Deep Convolutional Neural Networks for Speech Emotion Recognition
    Dossou, Bonaventure F. P.
    Gbenou, Yeno K. S.
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 3526 - 3531