Prosodic, spectral and voice quality feature selection using a long-term stopping criterion for audio-based emotion recognition

Cited by: 14
Authors
Kaechele, Markus [1]
Zharkov, Dimitrij [1]
Meudt, Sascha [1]
Schwenker, Friedhelm [1]
Affiliation
[1] Univ Ulm, Inst Neural Informat Proc, D-89069 Ulm, Germany
Keywords
CLASSIFIER SYSTEMS;
DOI
10.1109/ICPR.2014.148
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Emotion recognition from speech is an important field of research in human-machine interfaces and has begun to influence everyday life through deployment in areas such as call centers and wearable companions in the form of smartphones. In the proposed classification architecture, spectral, prosodic and the relatively novel voice quality features are extracted from the speech signals. These features are then used to represent long-term information of the speech, yielding utterance-wise suprasegmental features. The most promising of these features are selected using a forward-selection/backward-elimination algorithm with a novel long-term termination criterion. The overall system has been evaluated on recordings from the public Berlin emotion database. Using the resulting features, a recognition rate of 88.97% has been achieved, which surpasses the performance of humans on this database and is comparable to state-of-the-art performance on this dataset.
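The abstract does not spell out the selection procedure or the exact termination rule. The following is a minimal Python sketch of one plausible reading, assuming that a "long-term" criterion means the search keeps running through a fixed number of non-improving iterations (the hypothetical `patience` parameter) and then returns the best subset seen, instead of stopping at the first step that fails to improve. `select_features`, `evaluate` and `toy_score` are illustrative names, not the authors' code; in a setup like the one described, `evaluate` would presumably return a cross-validated recognition rate on the utterance-level features.

```python
from typing import Callable, FrozenSet, Tuple

# Hypothetical scoring callback: maps a feature subset to a quality score,
# e.g. a cross-validated recognition rate (an assumption, not from the paper).
Evaluate = Callable[[FrozenSet[int]], float]


def select_features(
    n_features: int, evaluate: Evaluate, patience: int = 3
) -> Tuple[FrozenSet[int], float]:
    """Forward selection with conditional backward elimination.

    Assumed long-term criterion: stop only after `patience` consecutive
    iterations without a new best score, then return the best subset seen.
    """
    current: FrozenSet[int] = frozenset()
    best: FrozenSet[int] = current
    best_score = float("-inf")
    stall = 0

    while stall < patience and len(current) < n_features:
        # Forward step: always add the best remaining feature, even when the
        # score drops, so the search can cross plateaus and shallow dips.
        candidates = [f for f in range(n_features) if f not in current]
        added = max(candidates, key=lambda f: evaluate(current | {f}))
        current = current | {added}
        current_score = evaluate(current)

        # Backward step: remove earlier features while removal strictly helps
        # (the feature just added is kept, to avoid undoing the forward step).
        while len(current) > 1:
            removable = sorted(f for f in current if f != added)
            worst = max(removable, key=lambda f: evaluate(current - {f}))
            reduced_score = evaluate(current - {worst})
            if reduced_score <= current_score:
                break
            current, current_score = current - {worst}, reduced_score

        # Long-term stopping bookkeeping: reset the counter on a new best.
        if current_score > best_score:
            best, best_score = current, current_score
            stall = 0
        else:
            stall += 1

    return best, best_score


def toy_score(subset: FrozenSet[int]) -> float:
    """Toy objective: feature 0 helps alone, features 1 and 2 only together,
    and every feature costs a small penalty (features 3 and 4 are noise)."""
    score = 0.5 if 0 in subset else 0.0
    if 1 in subset and 2 in subset:
        score += 0.4
    return score - 0.05 * len(subset)


if __name__ == "__main__":
    print(select_features(5, toy_score, patience=1))
    print(select_features(5, toy_score, patience=3))
```

On the toy objective, `patience=1` behaves like a conventional greedy stop: the second feature lowers the score, so the search ends with feature 0 alone. With `patience=3` the search tolerates that dip, discovers that features 1 and 2 only help together, and returns the subset {0, 1, 2}. The search always terminates, because the best score can only rise a finite number of times and every other iteration advances the stall counter.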
Pages: 803-808
Number of pages: 6