Feature Pooling of Modulation Spectrum Features for Improved Speech Emotion Recognition in the Wild

Cited by: 28
Authors:
Avila, Anderson R. [1 ]
Akhtar, Zahid [1 ]
Santos, Joao F. [1 ]
O'Shaughnessy, Douglas [1 ]
Falk, Tiago H. [1 ]
Affiliation:
[1] INRS-EMT, Telecommunications, Montreal, PQ, Canada
Funding:
European Union's Horizon 2020; Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords:
Affective computing; speech emotion recognition; modulation spectrum; in-the-wild; neural networks; frequency
DOI:
10.1109/TAFFC.2018.2858255
Chinese Library Classification (CLC):
TP18 [Artificial Intelligence Theory];
Discipline codes:
081104; 0812; 0835; 1405;
Abstract:
Interest in affective computing is burgeoning, in great part due to its role in emerging affective human-computer interfaces (HCI). To date, the majority of existing research on automated emotion analysis has relied on data collected in controlled environments. With the rise of HCI applications on mobile devices, however, so-called "in-the-wild" settings pose a serious challenge to emotion recognition systems, particularly those based on voice. In such settings, environmental factors such as ambient noise and reverberation severely hamper system performance. In this paper, we quantify the detrimental effects that the environment has on emotion recognition and explore the benefits achievable with speech enhancement. Moreover, we propose a modulation spectral feature pooling scheme that is shown to outperform a state-of-the-art benchmark system for environment-robust prediction of spontaneous arousal and valence emotional primitives. Experiments on an environment-corrupted version of the RECOLA dataset of spontaneous interactions show the proposed feature pooling scheme, combined with speech enhancement, outperforming the benchmark across noise-only, reverberation-only, and noise-plus-reverberation conditions. Additional tests with the SEWA database show the benefits of the proposed method for in-the-wild applications.
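For intuition on how modulation spectral features and statistical pooling fit together, the sketch below computes a toy modulation spectrogram (an FFT across the temporal trajectory of each acoustic frequency channel of a spectrogram) and then pools it with mean and standard-deviation statistics. All parameter choices (window length, hop size, number of modulation points) and the specific pooling statistics are illustrative assumptions for this sketch, not the authors' exact pipeline.

```python
# Minimal sketch of modulation-spectrum feature extraction with statistical
# pooling. Parameters and pooling choices are illustrative assumptions, not
# the method described in the paper.
import numpy as np
from scipy.signal import stft

def modulation_spectrum_features(x, fs, n_fft=512, hop=160, n_mod=64):
    """Return a fixed-length pooled modulation-spectrum feature vector."""
    # 1) Acoustic spectrogram: magnitude STFT, shape (n_freq, n_frames).
    _, _, Z = stft(x, fs=fs, nperseg=n_fft, noverlap=n_fft - hop)
    S = np.abs(Z)

    # 2) Modulation spectrum: FFT along time for each acoustic frequency
    #    channel, giving an acoustic-frequency x modulation-frequency plane.
    M = np.abs(np.fft.rfft(S, n=n_mod, axis=1))   # (n_freq, n_mod//2 + 1)

    # 3) Feature pooling: summarize each modulation band across acoustic
    #    frequency with mean and standard deviation, yielding a vector whose
    #    length is independent of the utterance duration.
    return np.concatenate([M.mean(axis=0), M.std(axis=0)])

if __name__ == "__main__":
    fs = 16000
    x = np.random.randn(fs * 3)   # stand-in for a 3-second utterance
    feats = modulation_spectrum_features(x, fs)
    print(feats.shape)            # (66,) here: 2 * (64 // 2 + 1)
```

The pooling step is what makes such features practical for recognition: it collapses a variable-length time-frequency representation into a fixed-length vector that a standard regressor or classifier can consume, while the averaging itself provides some robustness to frame-level corruption by noise and reverberation.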
Pages: 177-188
Page count: 12