Feature Pooling of Modulation Spectrum Features for Improved Speech Emotion Recognition in the Wild

Cited by: 28
Authors
Avila, Anderson R. [1 ]
Akhtar, Zahid [1 ]
Santos, Joao F. [1 ]
O'Shaughnessy, Douglas [1 ]
Falk, Tiago H. [1 ]
Affiliations
[1] INRS EMT, Telecommun, Montreal, PQ, Canada
Funding
European Union Horizon 2020; Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Affective computing; speech emotion recognition; modulation spectrum; in-the-wild; NEURAL-NETWORKS; FREQUENCY;
DOI
10.1109/TAFFC.2018.2858255
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Interest in affective computing is burgeoning, in great part due to its role in emerging affective human-computer interfaces (HCI). To date, the majority of existing research on automated emotion analysis has relied on data collected in controlled environments. With the rise of HCI applications on mobile devices, however, so-called "in-the-wild" settings have posed a serious challenge to emotion recognition systems, particularly those based on voice. In such settings, environmental factors such as ambient noise and reverberation severely hamper system performance. In this paper, we quantify the detrimental effects that the environment has on emotion recognition and explore the benefits achievable with speech enhancement. Moreover, we propose a modulation spectral feature pooling scheme that is shown to outperform a state-of-the-art benchmark system for environment-robust prediction of spontaneous arousal and valence emotional primitives. Experiments on an environment-corrupted version of the RECOLA dataset of spontaneous interactions show that the proposed feature pooling scheme, combined with speech enhancement, outperforms the benchmark across different noise-only, reverberation-only, and noise-plus-reverberation conditions. Additional tests with the SEWA database show the benefits of the proposed method for in-the-wild applications.
Pages: 177-188
Page count: 12
Related papers
50 records
  • [31] An optimal two stage feature selection for speech emotion recognition using acoustic features
    Kuchibhotla S.
    Vankayalapati H.D.
    Anne K.R.
    International Journal of Speech Technology, 2016, 19 (4) : 657 - 667
  • [32] On the Speech Properties and Feature Extraction Methods in Speech Emotion Recognition
    Kacur, Juraj
    Puterka, Boris
    Pavlovicova, Jarmila
    Oravec, Milos
    SENSORS, 2021, 21 (05) : 1 - 27
  • [33] Speech Emotion Recognition Using Speech Feature and Word Embedding
    Atmaja, Bagus Tris
    Shirai, Kiyoaki
    Akagi, Masato
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 519 - 523
  • [34] SPEECH EMOTION RECOGNITION METHOD BASED ON IMPROVED DECISION TREE AND LAYERED FEATURE SELECTION
    Mao, Qirong
    Wang, Xiaojia
    Zhan, Yongzhao
    INTERNATIONAL JOURNAL OF HUMANOID ROBOTICS, 2010, 7 (02) : 245 - 261
  • [35] EMOTION CLASSIFICATION OF SPEECH USING MODULATION FEATURES
    Chaspari, Theodora
    Dimitriadis, Dimitrios
    Maragos, Petros
    2014 PROCEEDINGS OF THE 22ND EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2014, : 1552 - 1556
  • [36] The modulation spectrum in the automatic recognition of speech
    Hermansky, H
    1997 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, PROCEEDINGS, 1997, : 140 - 147
  • [37] Convolution neural network with multiple pooling strategies for speech emotion recognition
    Jiang, Pengxu
    Zou, Cairong
    2022 6TH INTERNATIONAL SYMPOSIUM ON COMPUTER SCIENCE AND INTELLIGENT CONTROL, ISCSIC, 2022, : 89 - 92
  • [38] An Attention Pooling based Representation Learning Method for Speech Emotion Recognition
    Li, Pengcheng
    Song, Yan
    McLoughlin, Ian
    Guo, Wu
    Dai, Lirong
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3087 - 3091
  • [39] Stable Speech Emotion Recognition with Head-k-Pooling Loss
    Ding, Chaoyue
    Li, Jiakui
    Zong, Daoming
    Li, Baoxiang
    Zhang, Tianhao
    Zhou, Qunyan
    INTERSPEECH 2023, 2023, : 661 - 665
  • [40] Integrating Language and Emotion Features for Multilingual Speech Emotion Recognition
    Heracleous, Panikos
    Mohammad, Yasser
    Yoneyama, Akio
    HUMAN-COMPUTER INTERACTION. MULTIMODAL AND NATURAL INTERACTION, HCI 2020, PT II, 2020, 12182 : 187 - 196