Speech emotion recognition using machine learning - A systematic review

Cited by: 8
Authors
Madanian, Samaneh [1]
Chen, Talen [1]
Adeleye, Olayinka [1]
Templeton, John Michael [2]
Poellabauer, Christian [3]
Parry, Dave [4]
Schneider, Sandra L. [5]
Affiliations
[1] Auckland Univ Technol AUT, Dept Comp Sci & Software Engn, Auckland, New Zealand
[2] Univ S Florida, Dept Comp Sci & Engn, Tampa, FL USA
[3] Florida Int Univ, Sch Comp & Informat Sci, Miami, FL USA
[4] Murdoch Univ, Sch IT Media & Commun, Perth 6150, Australia
[5] St Marys Coll, Dept Commun Sci & Disorders, Notre Dame, IN USA
Source
Keywords
Speech emotion recognition; Machine learning; Speaker-independent experiment; Classification; Audio emotion recognition; CONVOLUTIONAL NEURAL-NETWORKS; FEATURES; REPRESENTATIONS; CLASSIFICATION; CLASSIFIERS; COMBINATION; EXTRACTION; DATABASES; RECURRENT;
DOI
10.1016/j.iswa.2023.200266
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech emotion recognition (SER) as a Machine Learning (ML) problem continues to garner a significant amount of research interest, especially in the affective computing domain. This is due to its increasing potential, algorithmic advancements, and applications in real-world scenarios. Human speech contains para-linguistic information that can be represented using quantitative features such as pitch, intensity, and Mel-Frequency Cepstral Coefficients (MFCC). SER is commonly achieved following three key steps: data processing, feature selection/extraction, and classification based on the underlying emotional features. The nature of these steps, coupled with the distinct features of human speech, underpins the use of ML methods for SER implementation. Recent work in affective computing has employed various ML methods for SER tasks; however, only a few studies capture the underlying techniques and methods that can be used to facilitate the three core steps of SER implementation. In addition, the challenges associated with these steps, and the state-of-the-art approaches used to tackle them, are either ignored or sparsely discussed in these works. In this paper, we present a systematic review of research that addressed SER tasks from ML perspectives over the last decade, with emphasis on the three SER implementation steps. Different challenges, including the low classification accuracy of Speaker-Independent experiments, and the solutions associated with them, are discussed in detail. The review also provides guidelines for SER evaluation, with a focus on common baselines and the metrics available for experimentation. This paper is expected to serve as a comprehensive guideline for SER researchers to design SER solutions using ML techniques, motivate possible improvements of existing SER models, or trigger novel techniques to enhance SER performance.
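The abstract does not spell out what a Speaker-Independent experiment involves. One common reading is that no speaker appears in both the training and the test set, which can be arranged with a leave-one-speaker-out split. The sketch below illustrates that reading only; it is not taken from the paper, and the function name `leave_one_speaker_out` and the toy speaker labels are hypothetical.

```python
from collections import defaultdict


def leave_one_speaker_out(speaker_ids):
    """Yield (held_out_speaker, train_indices, test_indices), one fold per speaker.

    speaker_ids[i] is the speaker label of utterance i (hypothetical input format).
    """
    # Group utterance indices by the speaker who produced them.
    by_speaker = defaultdict(list)
    for idx, speaker in enumerate(speaker_ids):
        by_speaker[speaker].append(idx)

    # Hold out each speaker in turn; train on every other speaker's utterances.
    for held_out in sorted(by_speaker):
        test_idx = list(by_speaker[held_out])
        train_idx = sorted(
            i
            for speaker, indices in by_speaker.items()
            if speaker != held_out
            for i in indices
        )
        yield held_out, train_idx, test_idx


# Toy corpus (assumed for illustration): six utterances from three speakers.
speakers = ["s1", "s1", "s2", "s2", "s3", "s3"]
folds = list(leave_one_speaker_out(speakers))

for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Under this split, a classifier is always scored on a voice it has not heard during training, which is one plausible reason such experiments report lower accuracy than speaker-dependent ones.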
Pages: 25