Egyptian Arabic speech emotion recognition using prosodic, spectral and wavelet features

Cited by: 36
Authors
Abdel-Hamid, Lamiaa [1 ]
Affiliation
[1] Misr Int Univ, Fac Engn, Dept Elect & Commun, Cairo, Egypt
Keywords
Speech emotion recognition; Arabic speech emotion database; Prosodic features; Mel-frequency cepstral coefficients (MFCC); Long-term average spectrum (LTAS); Wavelet transform; DATABASE; MFCC;
DOI
10.1016/j.specom.2020.04.005
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline codes
070206 ; 082403 ;
Abstract
Speech emotion recognition (SER) has recently been receiving increased interest due to rapid advancements in affective computing and human-computer interaction. English, German, Mandarin, and Indian languages are among the most commonly considered for SER, along with other European and Asian languages. However, few studies have implemented Arabic SER systems due to the scarcity of available Arabic speech emotion databases. Although Egyptian Arabic is considered one of the most widely spoken and understood Arabic dialects in the Middle East, no Egyptian Arabic speech emotion database had yet been devised. In this work, a semi-natural Egyptian Arabic speech emotion (EYASE) database is introduced, created from an award-winning Egyptian TV series. The EYASE database includes utterances from 3 male and 3 female professional actors covering four emotions: angry, happy, neutral and sad. Prosodic, spectral and wavelet features are computed from the EYASE database for emotion recognition. In addition to the classical pitch, intensity, formant and Mel-frequency cepstral coefficient (MFCC) features widely implemented for SER, long-term average spectrum (LTAS) and wavelet parameters are also considered in this work. Speaker-independent and speaker-dependent experiments were performed for three different cases: (1) emotion vs. neutral classification, (2) arousal and valence classification and (3) multi-emotion classification. Several analyses were performed to explore different aspects of Arabic SER, including the effect of gender and culture. Furthermore, feature ranking was performed to evaluate the relevance of the LTAS and wavelet features for SER, in comparison to the more widely used prosodic and spectral features. Moreover, anger detection performance is compared for different combinations of the implemented prosodic, spectral and wavelet features. Feature ranking and anger detection performance analysis showed that both LTAS and wavelet features were relevant for Arabic SER and that they significantly improved emotion recognition rates.
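The MFCC features named in the abstract follow a standard pipeline (pre-emphasis, framing, windowing, power spectrum, mel filterbank, log, DCT). As a minimal illustrative sketch — not the paper's exact feature extraction, and with filter counts and frame sizes chosen as assumptions — this could look like:

```python
import numpy as np

def hz_to_mel(f):
    # O'Shaughnessy mel-scale formula.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """MFCC sketch: frame -> window -> |FFT|^2 -> mel filterbank -> log -> DCT."""
    # Pre-emphasis boosts high frequencies, a common SER preprocessing step.
    emph = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Frame the signal and apply a Hamming window.
    n_frames = 1 + (len(emph) - n_fft) // hop
    frames = np.stack([emph[i * hop: i * hop + n_fft] for i in range(n_frames)])
    frames *= np.hamming(n_fft)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank between 0 Hz and the Nyquist frequency.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for j in range(1, n_mels + 1):
        lo, c, hi = bins[j - 1], bins[j], bins[j + 1]
        fbank[j - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fbank[j - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    logmel = np.log(power @ fbank.T + 1e-10)
    # DCT-II (unnormalized) decorrelates log-mel energies; keep first n_ceps.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return logmel @ dct.T

# Toy usage: a 1-second 440 Hz tone at 16 kHz (real SER uses speech utterances).
sr = 16000
t = np.arange(sr) / sr
feats = mfcc(np.sin(2 * np.pi * 440.0 * t), sr)  # shape: (frames, n_ceps)
```

In a pipeline like the one described, per-frame coefficients such as these are typically summarized with utterance-level statistics (mean, variance, etc.) before being fed to a classifier.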
Pages: 19-30
Page count: 12
Related papers
50 records
  • [1] Improving Speech Emotion Recognition System Using Spectral and Prosodic Features
    Chakhtouna, Adil
    Sekkate, Sara
    Adib, Abdellah
    [J]. INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, ISDA 2021, 2022, 418 : 399 - 409
  • [2] Emotion recognition from speech using wavelet packet transform and prosodic features
    Gupta, Manish
    Bharti, Shambhu Shankar
    Agarwal, Suneeta
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2018, 35 (02) : 1541 - 1553
  • [3] Emotion Recognition Using Prosodic and Spectral Features of Speech and Naive Bayes Classifier
    Khan, Atreyee
    Roy, Uttam Kumar
    [J]. 2017 2ND IEEE INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, SIGNAL PROCESSING AND NETWORKING (WISPNET), 2017, : 1017 - 1021
  • [4] Hierarchical emotion recognition from speech using source, power spectral and prosodic features
    Haque, Arijul
    Rao, K. Sreenivasa
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (07) : 19629 - 19661
  • [6] PERFORMANCE ANALYSIS OF SPECTRAL AND PROSODIC FEATURES AND THEIR FUSION FOR EMOTION RECOGNITION IN SPEECH
    Gaurav, Manish
    [J]. 2008 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY: SLT 2008, PROCEEDINGS, 2008, : 313 - 316
  • [7] A Hybrid Speech Emotion Recognition System Based on Spectral and Prosodic Features
    Zhou, Yu
    Li, Junfeng
    Sun, Yanqing
    Zhang, Jianping
    Yan, Yonghong
    Akagi, Masato
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (10) : 2813 - 2821
  • [8] Emotion Recognition from Speech using Prosodic and Linguistic Features
    Pervaiz, Mahwish
    Khan, Tamim Ahmed
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2016, 7 (08) : 84 - 90
  • [9] Emotion Recognition in Speech Using MFCC and Wavelet Features
    Kishore, K. V. Krishna
    Satish, P. Krishna
    [J]. PROCEEDINGS OF THE 2013 3RD IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE (IACC), 2013, : 842 - 847
  • [10] Emotion recognition from speech using global and local prosodic features
    Rao K.S.
    Koolagudi S.G.
    Vempada R.R.
    [J]. International Journal of Speech Technology, 2013, 16 (2) : 143 - 160