A review on emotion recognition from dialect speech using feature optimization and classification techniques

Cited by: 1
Authors
Thimmaiah, Sunil [1]
Vinay, N. A. [3]
Ravikumar, M. G. [1]
Prasad, S. R. [2]
Affiliations
[1] Nagarjuna Coll Engn & Technol, Bengaluru, India
[2] KNS Inst Technol, Bengaluru, India
[3] Dayananda Sagar Coll Engn, Bengaluru, India
Keywords
Emotion recognition; Dialect speech; Feature optimization; Classification techniques; Acoustic cues; Spectral features; Prosodic features; Temporal features; Support vector machines; Gaussian mixture models; Hidden Markov models; Machine learning; Convolutional neural networks; Long short-term memory networks; Feature selection; Dimensionality reduction; Principal component analysis; Recursive feature elimination; Datasets; Model generalization;
DOI
10.1007/s11042-024-18297-7
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Emotion recognition from speech has gained prominence across various domains due to its wide-ranging applications. This paper presents a comprehensive review of advancements in emotion recognition, focusing on dialect speech, through the utilization of feature optimization and classification techniques. Dialectal variations in speech introduce complexities that impact the accuracy of emotion recognition models. To address this challenge, diverse feature extraction methods have been explored, capturing both general and dialect-specific acoustic cues. Spectral, prosodic, and temporal features are adapted and optimized to enhance emotional content representation within dialect speech. Classification techniques play a pivotal role in distinguishing emotions in dialect speech. Traditional classifiers like Support Vector Machines (SVMs), Gaussian Mixture Models (GMMs), and Hidden Markov Models (HMMs) have been employed. Recent studies highlight the efficacy of machine learning approaches such as Random Forests, Gradient Boosting, Convolutional Neural Networks (CNNs), and Long Short-Term Memory networks (LSTMs). Feature selection and dimensionality reduction techniques optimize model performance. Principal Component Analysis (PCA), Recursive Feature Elimination (RFE), and genetic algorithms enhance feature sets, improving classification accuracy and computational efficiency. Datasets tailored for dialect-specific speech corpora address linguistic nuances and contribute to the model's relevance to distinct regions or communities. Challenges include limited labelled dialect emotion datasets, model generalization across multiple dialects, and ethical considerations. As the field evolves, striking a balance between performance and ethics remains imperative. This review underscores the promise of optimized feature extraction, innovative classification techniques, and tailored datasets in dialect-based emotion recognition.
Pages: 73793 / 73793
Number of pages: 34