Automatic speech emotion recognition using modulation spectral features

被引:240
|
作者
Wu, Siqing [1 ]
Falk, Tiago H. [2 ]
Chan, Wai-Yip [1 ]
机构
[1] Queens Univ, Dept Elect & Comp Engn, Kingston, ON K7L 3N6, Canada
[2] Univ Toronto, Inst Biomat & Biomed Engn, Toronto, ON M5S 3G9, Canada
关键词
Emotion recognition; Speech modulation; Spectro-temporal representation; Affective computing; Speech analysis; CLASSIFICATION; FREQUENCY; ENVELOPE;
D O I
10.1016/j.specom.2010.08.013
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
In this study, modulation spectral features (MSFs) are proposed for the automatic recognition of human affective information from speech. The features are extracted from an auditory-inspired long-term spectro-temporal representation. Obtained using an auditory filterbank and a modulation filterbank for speech analysis, the representation captures both acoustic frequency and temporal modulation frequency components, thereby conveying information that is important for human speech perception but missing from conventional short-term spectral features. On an experiment assessing classification of discrete emotion categories, the MSFs show promising performance in comparison with features that are based on mel-frequency cepstral coefficients and perceptual linear prediction coefficients, two commonly used short-term spectral representations. The MSFs further render a substantial improvement in recognition performance when used to augment prosodic features, which have been extensively used for emotion recognition. Using both types of features, an overall recognition rate of 91.6% is obtained for classifying seven emotion categories. Moreover, in an experiment assessing recognition of continuous emotions, the proposed features in combination with prosodic features attain estimation performance comparable to human evaluation. (C) 2010 Elsevier B.V. All rights reserved.
引用
收藏
页码:768 / 785
页数:18
相关论文
共 50 条
  • [1] Modulation spectral features for speech emotion recognition using deep neural networks
    Singh, Premjeet
    Sahidullah, Md
    Saha, Goutam
    [J]. SPEECH COMMUNICATION, 2023, 146 : 53 - 69
  • [2] Automatic speech based emotion recognition using paralinguistics features
    Hook, J.
    Noroozi, F.
    Toygar, O.
    Anbarjafari, G.
    [J]. BULLETIN OF THE POLISH ACADEMY OF SCIENCES-TECHNICAL SCIENCES, 2019, 67 (03) : 479 - 488
  • [3] Dimensional Emotion Recognition from Speech Using Modulation Spectral Features and Recurrent Neural Networks
    Peng, Zhichao
    Zhu, Zhi
    Unoki, Masashi
    Dang, Jianwu
    Akagi, Masato
    [J]. 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 524 - 528
  • [4] Hybrid Spectral Features for Speech Emotion Recognition
    Shah, Firoz A.
    Anto, Babu P.
    [J]. 2017 INTERNATIONAL CONFERENCE ON INNOVATIONS IN INFORMATION, EMBEDDED AND COMMUNICATION SYSTEMS (ICIIECS), 2017,
  • [5] Improving Speech Emotion Recognition System Using Spectral and Prosodic Features
    Chakhtouna, Adil
    Sekkate, Sara
    Adib, Abdellah
    [J]. INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, ISDA 2021, 2022, 418 : 399 - 409
  • [6] On the Correlation and Transferability of Features between Automatic Speech Recognition and Speech Emotion Recognition
    Fayek, Haytham M.
    Lech, Margaret
    Cavedon, Lawrence
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3618 - 3622
  • [7] Urdu Speech Emotion Recognition using Speech Spectral Features and Deep Learning Techniques
    Taj, Soonh
    Shaikh, Ghulam Mujtaba
    Hassan, Saif
    Nimra
    [J]. 2023 4th International Conference on Computing, Mathematics and Engineering Technologies: Sustainable Technologies for Socio-Economic Development, iCoMET 2023, 2023,
  • [8] Amplitude Modulation Features for Emotion Recognition from Speech
    Alam, Md Jahangir
    Attabi, Yazid
    Dumouchel, Pierre
    Kenny, Patrick
    O'Shaughnessy, D.
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2419 - 2423
  • [10] Emotion Recognition Using Prosodic and Spectral Features of Speech and Naive Bayes Classifier
    Khan, Atreyee
    Roy, Uttam Kumar
    [J]. 2017 2ND IEEE INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, SIGNAL PROCESSING AND NETWORKING (WISPNET), 2017, : 1017 - 1021