Acoustic Features and Neural Representations for Categorical Emotion Recognition from Speech

Cited by: 17
Authors
Keesing, Aaron [1 ]
Koh, Yun Sing [1 ]
Witbrock, Michael [1 ]
Affiliations
[1] Univ Auckland, Sch Comp Sci, Auckland, New Zealand
Keywords
speech emotion recognition; computational paralinguistics; affective computing; corpus
DOI
10.21437/Interspeech.2021-2217
CLC classification numbers
R36 (Pathology); R76 (Otorhinolaryngology)
Subject classification codes
100104; 100213
Abstract
Many features have been proposed for use in speech emotion recognition, from signal processing features to bag-of-audio-words (BoAW) models to abstract neural representations. Some of these feature types have not been directly compared across a large number of speech corpora to determine performance differences. We use a full factorial design to compare speech processing features, BoAW, and neural representations on 17 emotional speech datasets. We measure the performance of features in a categorical emotion classification problem for each dataset, using speaker-independent cross-validation with diverse classifiers. Results show statistically significant differences between features and between classifiers, with large effect sizes between features. In particular, standard acoustic feature sets still perform competitively with neural representations, while neural representations have a larger range of performance, and BoAW features lie in the middle. The best and worst neural representations were wav2vec and VGGish, respectively, with wav2vec performing best out of all tested features. These results indicate that standard acoustic feature sets are still very useful baselines for emotional classification, but high-quality neural speech representations can be better.
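The speaker-independent evaluation protocol described in the abstract can be sketched as follows. This is an illustrative assumption, not the authors' released code: it uses scikit-learn's GroupKFold so that all utterances from a given speaker fall into the same fold, and compares two synthetic stand-in feature sets with an SVM classifier (the paper's feature sets and classifier pool are richer).

```python
# Hedged sketch of speaker-independent cross-validation over feature sets.
# Feature matrices here are random stand-ins; in the paper they would be
# e.g. a standard acoustic feature set, BoAW counts, or wav2vec embeddings.
import numpy as np
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_utt, n_speakers = 200, 10

feature_sets = {
    "acoustic": rng.normal(size=(n_utt, 24)),   # stand-in acoustic features
    "wav2vec": rng.normal(size=(n_utt, 64)),    # stand-in neural embeddings
}
labels = rng.integers(0, 4, size=n_utt)         # 4 emotion categories
speakers = rng.integers(0, n_speakers, size=n_utt)

results = {}
for name, X in feature_sets.items():
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    # GroupKFold keeps each speaker's utterances in one fold, so test
    # speakers are never seen during training (speaker-independent CV).
    scores = cross_val_score(clf, X, labels, groups=speakers,
                             cv=GroupKFold(n_splits=5))
    results[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

With random features and four balanced classes, accuracy hovers near chance; the point of the sketch is the fold construction, which generalizes to any per-utterance feature matrix.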
Pages: 3415-3419 (5 pages)