Feature Pooling of Modulation Spectrum Features for Improved Speech Emotion Recognition in the Wild

Cited by: 28
Authors
Avila, Anderson R. [1]
Akhtar, Zahid [1]
Santos, Joao F. [1]
O'Shaughnessy, Douglas [1]
Falk, Tiago H. [1]
Affiliations
[1] INRS EMT, Telecommun, Montreal, PQ, Canada
Funding
EU Horizon 2020; Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Affective computing; speech emotion recognition; modulation spectrum; in-the-wild; NEURAL-NETWORKS; FREQUENCY;
DOI
10.1109/TAFFC.2018.2858255
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Interest in affective computing is burgeoning, in great part due to its role in emerging affective human-computer interfaces (HCI). To date, the majority of existing research on automated emotion analysis has relied on data collected in controlled environments. With the rise of HCI applications on mobile devices, however, so-called "in-the-wild" settings have posed a serious threat to emotion recognition systems, particularly those based on voice. In this case, environmental factors such as ambient noise and reverberation severely hamper system performance. In this paper, we quantify the detrimental effects that the environment has on emotion recognition and explore the benefits achievable with speech enhancement. Moreover, we propose a modulation spectral feature pooling scheme that is shown to outperform a state-of-the-art benchmark system for environment-robust prediction of spontaneous arousal and valence emotional primitives. Experiments on an environment-corrupted version of the RECOLA dataset of spontaneous interactions show the proposed feature pooling scheme, combined with speech enhancement, outperforming the benchmark across different noise-only, reverberation-only and noise-plus-reverberation conditions. Additional tests with the SEWA database show the benefits of the proposed method for in-the-wild applications.
Pages: 177-188
Page count: 12
Related papers
50 in total
  • [1] Speech Emotion Recognition on Mobile Devices Based on Modulation Spectral Feature Pooling and Deep Neural Networks
    Avila, Anderson R.
    Monteiro, Joao
    O'Shaughnessy, Douglas
    Falk, Tiago H.
    2017 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (ISSPIT), 2017, : 360 - 365
  • [2] Quality-Aware Bag of Modulation Spectrum Features for Robust Speech Emotion Recognition
    Kshirsagar, Shruti Rajendra
    Falk, Tiago Henrik
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2022, 13 (04) : 1892 - 1905
  • [3] Amplitude Modulation Features for Emotion Recognition from Speech
    Alam, Md Jahangir
    Attabi, Yazid
    Dumouchel, Pierre
    Kenny, Patrick
    O'Shaughnessy, D.
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2419 - 2423
  • [4] Automatic speech emotion recognition using modulation spectral features
    Wu, Siqing
    Falk, Tiago H.
    Chan, Wai-Yip
    SPEECH COMMUNICATION, 2011, 53 (05) : 768 - 785
  • [5] Modulation Spectrum Equalization for Improved Robust Speech Recognition
    Sun, Liang-Che
    Lee, Lin-Shan
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (03) : 828 - 843
  • [6] Combining acoustic features for improved emotion recognition in Mandarin speech
    Pao, TL
    Chen, YT
    Yeh, JH
    Liao, WY
    AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION, PROCEEDINGS, 2005, 3784 : 279 - 285
  • [7] Feature representation for speech emotion Recognition
    Abdollahpour, Mehdi
    Zamani, Lafar
    Rad, Hamidreza Saligheh
    2017 25TH IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2017, : 1465 - 1468
  • [8] Improved modulation spectrum enhancement methods for robust speech recognition
    Hung, Jeih-weih
    Tu, Wen-hsiang
    Lai, Chien-chou
    SIGNAL PROCESSING, 2012, 92 (11) : 2791 - 2814
  • [9] Improved modulation spectrum normalization techniques for robust speech recognition
    Pan, Chi-an
    Wang, Chieh-cheng
    Hung, Jeih-weih
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4089 - 4092
  • [10] Improved Frequency Modulation Features for Multichannel Distant Speech Recognition
    Rodomagoulakis, Isidoros
    Maragos, Petros
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2019, 13 (04) : 841 - 849