Automatic speech emotion recognition using modulation spectral features

Cited by: 240
Authors
Wu, Siqing [1]
Falk, Tiago H. [2]
Chan, Wai-Yip [1]
Affiliations
[1] Queens Univ, Dept Elect & Comp Engn, Kingston, ON K7L 3N6, Canada
[2] Univ Toronto, Inst Biomat & Biomed Engn, Toronto, ON M5S 3G9, Canada
Keywords
Emotion recognition; Speech modulation; Spectro-temporal representation; Affective computing; Speech analysis; CLASSIFICATION; FREQUENCY; ENVELOPE
DOI
10.1016/j.specom.2010.08.013
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this study, modulation spectral features (MSFs) are proposed for the automatic recognition of human affective information from speech. The features are extracted from an auditory-inspired long-term spectro-temporal representation. Obtained using an auditory filterbank and a modulation filterbank for speech analysis, the representation captures both acoustic frequency and temporal modulation frequency components, thereby conveying information that is important for human speech perception but missing from conventional short-term spectral features. In an experiment assessing classification of discrete emotion categories, the MSFs show promising performance in comparison with features that are based on mel-frequency cepstral coefficients and perceptual linear prediction coefficients, two commonly used short-term spectral representations. The MSFs further render a substantial improvement in recognition performance when used to augment prosodic features, which have been extensively used for emotion recognition. Using both types of features, an overall recognition rate of 91.6% is obtained for classifying seven emotion categories. Moreover, in an experiment assessing recognition of continuous emotions, the proposed features in combination with prosodic features attain estimation performance comparable to human evaluation. (C) 2010 Elsevier B.V. All rights reserved.
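The two-stage analysis described in the abstract (acoustic frequency bands first, then the modulation frequencies of each band's temporal envelope) can be illustrated with a minimal sketch. This is not the authors' implementation: rectangular FFT-domain bands stand in for the auditory filterbank and the modulation filterbank, and the function names, band edges, and test signal are all hypothetical choices made for illustration.

```python
import numpy as np

def band_envelopes(x, fs, band_edges_hz):
    """Temporal envelope of x in each acoustic band.

    Assumption: ideal rectangular bands (0 < lo < hi < fs/2) replace
    a real auditory filterbank.
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    envelopes = []
    for lo, hi in band_edges_hz:
        # Keeping only positive in-band frequencies (doubled) gives the
        # analytic signal of the band-passed signal; its magnitude is
        # the envelope.
        mask = (freqs >= lo) & (freqs < hi)
        analytic = np.fft.ifft(2.0 * spectrum * mask)
        envelopes.append(np.abs(analytic))
    return np.array(envelopes)

def modulation_energy(envelopes, fs, mod_edges_hz):
    """Energy of each envelope in each modulation-frequency band."""
    n = envelopes.shape[1]
    # Remove the mean so the constant (0 Hz) part does not dominate.
    centered = envelopes - envelopes.mean(axis=1, keepdims=True)
    power = np.abs(np.fft.rfft(centered, axis=1)) ** 2
    mod_freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    energy = np.zeros((envelopes.shape[0], len(mod_edges_hz)))
    for j, (lo, hi) in enumerate(mod_edges_hz):
        selected = (mod_freqs >= lo) & (mod_freqs < hi)
        energy[:, j] = power[:, selected].sum(axis=1)
    return energy

# Hypothetical test signal: a 1 kHz tone amplitude-modulated at 4 Hz.
fs = 8000
t = np.arange(fs) / fs  # one second of samples
x = (1.0 + 0.5 * np.cos(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)

acoustic_bands = [(100, 500), (500, 2000), (2000, 3900)]  # Hz, illustrative
modulation_bands = [(0.5, 2), (2, 8), (8, 32)]            # Hz, illustrative

features = modulation_energy(band_envelopes(x, fs, acoustic_bands),
                             fs, modulation_bands)
peak = np.unravel_index(np.argmax(features), features.shape)
```

In this toy case the energy concentrates in the acoustic band containing 1 kHz and the modulation band containing 4 Hz, which is the kind of joint acoustic/modulation frequency information that a short-term spectrum alone does not expose.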
Pages: 768-785
Page count: 18
Related papers
50 in total
  • [31] Speech emotion recognition using nonlinear dynamics features
    Shahzadi, Ali
    Ahmadyfard, Alireza
    Harimi, Ali
    Yaghmaie, Khashayar
    [J]. TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2015, 23 : 2056 - 2073
  • [32] Speech Emotion Recognition Using Minimum Extracted Features
    Abdulsalam, Wisal Hashim
    Alhamdani, Rafah Shihab
    Abdullah, Mohammed Najm
    [J]. 2018 1ST ANNUAL INTERNATIONAL CONFERENCE ON INFORMATION AND SCIENCES (AICIS 2018), 2018, : 58 - 61
  • [33] Speech Emotion Recognition Using ANN on MFCC Features
    Dolka, Harshit
    Xavier, Arul V. M.
    Juliet, Sujitha
    [J]. ICSPC'21: 2021 3RD INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATION (ICPSC), 2021, : 431 - 435
  • [34] RECOGNITION OF EMOTION IN SPEECH USING VARIOGRAM BASED FEATURES
    Esmaileyan, Zeynab
    Marvi, Hosein
    [J]. MALAYSIAN JOURNAL OF COMPUTER SCIENCE, 2014, 27 (03) : 156 - 170
  • [35] Speech Emotion Recognition Using Magnitude and Phase Features
    Shankar, D. Ravi
    Manjula, R. B.
    Biradar, Rajashekhar C.
    [J]. SN Computer Science, 5 (5)
  • [36] Speech Emotion Recognition Using Local and Global Features
    Gao, Yuanbo
    Li, Baobin
    Wang, Ning
    Zhu, Tingshao
    [J]. BRAIN INFORMATICS, BI 2017, 2017, 10654 : 3 - 13
  • [37] Emotion recognition using novel speech signal features
    Tabatabaei, Talieh Seyed
    Krishnan, Sridhar
    Guergachi, Aziz
    [J]. 2007 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11, 2007, : 345 - +
  • [38] AUTOMATIC RECOGNITION OF SPEECH EMOTION USING LONG-TERM SPECTRO-TEMPORAL FEATURES
    Wu, Siqing
    Falk, Tiago H.
    Chan, Wai-Yip
    [J]. 2009 16TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING, VOLS 1 AND 2, 2009, : 205 - 210
  • [39] Automatic speech emotion recognition using an optimal combination of features based on EMD-TKEO
    Kerkeni, Leila
    Serrestou, Youssef
    Raoof, Kosai
    Mbarki, Mohamed
    Mahjoub, Mohamed Ali
    Cleder, Catherine
    [J]. SPEECH COMMUNICATION, 2019, 114 : 22 - 35
  • [40] Speech emotion recognition using multi resolution Hilbert transform based spectral and entropy features
    Mishra, Siba Prasad
    Warule, Pankaj
    Deb, Suman
    [J]. Applied Acoustics, 2025, 229