Speech emotion recognition approaches: A systematic review

Cited by: 4
Authors
Hashem, Ahlam [1]
Arif, Muhammad [1]
Alghamdi, Manal [1]
Affiliations
[1] Umm Al Qura Univ, Dept Comp Sci, Al Abdiyah, Makkah, Saudi Arabia
Keywords
Speech emotion recognition; Emotional speech database; Classification of emotion; Speech features; Systematic review; TIME-COURSE; NEURAL-NETWORK; FEATURES; SELECTION; DOMAIN; REPRESENTATIONS; CLASSIFICATION; CLASSIFIERS; INFORMATION; PERFORMANCE;
DOI
10.1016/j.specom.2023.102974
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Speech emotion recognition (SER) has been an active field since it became a crucial feature of advanced Human-Computer Interaction (HCI), and it is used in a wide range of real-life applications. In recent years, researchers have addressed numerous aspects of SER systems, including the availability of appropriate emotional databases, the selection of robust features, and the application of suitable classifiers based on Machine Learning (ML) and Deep Learning (DL). Deep models have proved more accurate for SER than conventional ML techniques. Nevertheless, SER remains a challenging classification problem: separating similar emotional patterns requires a highly discriminative feature representation. This survey therefore critically analyzes the work done in this field, drawing on previous studies that recognize emotions from speech audio, and reviews the current state of SER using DL. A systematic literature review based on searching selected keywords over 2012-2022 yielded 96 papers covering the most current findings and directions. Specifically, we cover databases (acted, evoked, and natural), features (prosodic, spectral, voice quality, and Teager energy operator), and the necessary preprocessing steps. Furthermore, different DL models and their performance are examined in depth. Based on our review, we also suggest aspects of SER that could be considered in future work.
Pages: 29
Related papers
  • [1] A systematic literature review of speech emotion recognition approaches
    Singh, Youddha Beer
    Goel, Shivani
    [J]. NEUROCOMPUTING, 2022, 492 : 245 - 263
  • [2] Unimodal approaches for emotion recognition: A systematic review
    Tomar, Pragya Singh
    Mathur, Kirti
    Suman, Ugrasen
    [J]. COGNITIVE SYSTEMS RESEARCH, 2023, 77 : 94 - 109
  • [3] Automatic Speech Emotion Recognition: a Systematic Literature Review
    Mustafa H.H.
    Darwish N.R.
    Hefny H.A.
    [J]. International Journal of Speech Technology, 2024, 27 (1) : 267 - 285
  • [4] Urdu Speech Emotion Recognition: A Systematic Literature Review
    Taj, Soonh
    Mujtaba, Ghulam
    Daudpota, Sher Muhammad
    Mughal, Muhammad Hussain
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (07)
  • [5] Speech emotion recognition using machine learning - A systematic review
    Madanian, Samaneh
    Chen, Talen
    Adeleye, Olayinka
    Templeton, John Michael
    Poellabauer, Christian
    Parry, Dave
    Schneider, Sandra L.
    [J]. INTELLIGENT SYSTEMS WITH APPLICATIONS, 2023, 20
  • [6] A Review on Emotion Recognition using Speech
    Basu, Saikat
    Chakraborty, Jaybrata
    Bag, Arnab
    Aftabuddin, Md.
    [J]. PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON INVENTIVE COMMUNICATION AND COMPUTATIONAL TECHNOLOGIES (ICICCT), 2017 : 109 - 114
  • [7] Dimensional Speech Emotion Recognition Review
    Li, Hai-Feng
    Chen, Jing
    Ma, Lin
    Bo, Hong-Jian
    Xu, Cong
    Li, Hong-Wei
    [J]. Ruan Jian Xue Bao/Journal of Software, 2020, 31 (08) : 2465 - 2491
  • [8] An ongoing review of speech emotion recognition
    de Lope, Javier
    Grana, Manuel
    [J]. NEUROCOMPUTING, 2023, 528 : 1 - 11
  • [9] Emotion recognition from speech: a review
    Koolagudi, Shashidhar G.
    Rao, K. Sreenivasa
    [J]. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2012, 15 (02) : 99 - 117
  • [10] Speech emotion recognition approaches in human computer interaction
    S. Ramakrishnan
    Ibrahiem M. M. El Emary
    [J]. Telecommunication Systems, 2013, 52 : 1467 - 1478