Acoustic Features and Neural Representations for Categorical Emotion Recognition from Speech

Cited by: 17
Authors
Keesing, Aaron [1 ]
Koh, Yun Sing [1 ]
Witbrock, Michael [1 ]
Affiliations
[1] Univ Auckland, Sch Comp Sci, Auckland, New Zealand
Keywords
speech emotion recognition; computational paralinguistics; affective computing; corpus
DOI
10.21437/Interspeech.2021-2217
CLC classification numbers
R36 (Pathology); R76 (Otorhinolaryngology)
Subject classification codes
100104; 100213
Abstract
Many features have been proposed for use in speech emotion recognition, from signal processing features to bag-of-audio-words (BoAW) models to abstract neural representations. Some of these feature types have not been directly compared across a large number of speech corpora to determine performance differences. We use a full factorial design to compare speech processing features, BoAW, and neural representations on 17 emotional speech datasets. We measure the performance of features in a categorical emotion classification problem for each dataset, using speaker-independent cross-validation with diverse classifiers. Results show statistically significant differences between features and between classifiers, with large effect sizes between features. In particular, standard acoustic feature sets still perform competitively with neural representations, while neural representations have a larger range of performance, and BoAW features lie in the middle. The best and worst neural representations were wav2vec and VGGish, respectively, with wav2vec performing best out of all tested features. These results indicate that standard acoustic feature sets are still very useful baselines for emotional classification, but high-quality neural speech representations can be better.
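The speaker-independent evaluation protocol described in the abstract can be sketched as follows. This is an illustrative assumption, not the authors' released code: it uses scikit-learn's GroupKFold so that all utterances from a given speaker fall into the same fold, and compares two synthetic stand-in feature sets with an SVM classifier (the paper's feature sets and classifier pool are richer).

```python
# Hedged sketch of speaker-independent cross-validation over feature sets.
# Feature matrices here are random stand-ins; in the paper they would be
# e.g. a standard acoustic feature set, BoAW counts, or wav2vec embeddings.
import numpy as np
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_utt, n_speakers = 200, 10

feature_sets = {
    "acoustic": rng.normal(size=(n_utt, 24)),   # stand-in acoustic features
    "wav2vec": rng.normal(size=(n_utt, 64)),    # stand-in neural embeddings
}
labels = rng.integers(0, 4, size=n_utt)         # 4 emotion categories
speakers = rng.integers(0, n_speakers, size=n_utt)

results = {}
for name, X in feature_sets.items():
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    # GroupKFold keeps each speaker's utterances in one fold, so test
    # speakers are never seen during training (speaker-independent CV).
    scores = cross_val_score(clf, X, labels, groups=speakers,
                             cv=GroupKFold(n_splits=5))
    results[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

With random features and four balanced classes, accuracy hovers near chance; the point of the sketch is the fold construction, which generalizes to any per-utterance feature matrix.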
Pages: 3415-3419 (5 pages)