Analyzing the influence of different speech data corpora and speech features on speech emotion recognition: A review

Citations: 0
Authors
Rathi, Tarun [1]
Tripathy, Manoj [1]
Affiliation
[1] Indian Inst Technol, Dept Elect Engn, Roorkee 247667, India
Keywords
Speech emotion recognition; Speech emotional data corpus; Speech features; Mel-frequency cepstral coefficients; Deep neural network; Convolutional neural network; DEEP; MODEL; NETWORK; DATABASES; RECURRENT; CNN; REPRESENTATIONS; CLASSIFIERS; 1D;
DOI
10.1016/j.specom.2024.103102
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Emotion recognition from speech has become crucial in human-computer interaction and affective computing applications. This review examines how two critical factors, the selection of speech data corpora and the extraction of speech features, affect speech emotion classification accuracy. Through an extensive analysis of literature from 2014 to 2023, publicly available speech datasets are explored and categorized based on their diversity, scale, linguistic attributes, and emotional classifications. The importance of various speech features, from basic spectral features to sophisticated prosodic cues, and their influence on emotion recognition accuracy is analyzed. Regarding speech data corpora, the review reports trends and insights from comparative studies of how dataset choice affects recognition performance. Datasets such as IEMOCAP, EMODB, and MSP-IMPROV are examined for their influence on the classification accuracy of speech emotion recognition (SER) systems, and the challenges associated with dataset limitations are also examined. Notable features such as Mel-frequency cepstral coefficients, pitch, intensity, and prosodic patterns are evaluated for their contributions to emotion recognition. Advanced feature extraction methods are also explored for their potential to capture intricate emotional dynamics. The review further covers the methodological side of emotion recognition, surveying the machine learning and deep learning approaches employed. By synthesizing these research findings, the review identifies connections between the choice of speech data corpus, the selection of speech features, and the resulting emotion recognition accuracy. Avenues for future research are proposed, ranging from enhanced feature extraction techniques to the development of standardized benchmark datasets. Overall, the review is intended to guide researchers and practitioners through the factors that shape speech emotion recognition accuracy.
Pages: 24