Speech Emotion Recognition through Hybrid Features and Convolutional Neural Network

Times cited: 14
Authors
Alluhaidan, Ala Saleh [1]
Saidani, Oumaima [1]
Jahangir, Rashid [2]
Nauman, Muhammad Asif [3]
Neffati, Omnia Saidani [4]
Affiliations
[1] Princess Nourah bint Abdulrahman Univ, Coll Comp & Informat Sci, Dept Informat Syst, POB 84428, Riyadh 11671, Saudi Arabia
[2] COMSATS Univ Islamabad, Dept Comp Sci, Vehari Campus, Islamabad 61100, Pakistan
[3] Univ Engn & Technol, Dept Comp Sci, Lahore 54890, Pakistan
[4] King Khalid Univ, Coll Sci & Arts Sarat Abida, Comp Sci Dept, Abha 64734, Saudi Arabia
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / No. 8
Keywords
speech emotion recognition; feature fusion; MFCCs; time-domain; convolutional neural networks; DEEP; CLASSIFICATION
DOI
10.3390/app13084750
CLC classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Speech emotion recognition (SER) is the process of predicting human emotions from audio signals using artificial intelligence (AI) techniques. SER technologies have a wide range of applications in areas such as psychology, medicine, education, and entertainment. Extracting relevant features from audio signals is a crucial step in the SER process for correctly identifying emotions. Several studies on SER have employed short-time features such as Mel-frequency cepstral coefficients (MFCCs), owing to their efficiency in capturing the periodic nature of audio signals. However, these features are limited in their ability to represent emotions correctly. To address this limitation, this research combined MFCCs and time-domain features (MFCCT) to enhance the performance of SER systems. The proposed hybrid features were fed into a convolutional neural network (CNN) to build the SER model. The hybrid MFCCT features combined with the CNN outperformed both MFCCs and time-domain (t-domain) features alone on the Emo-DB, SAVEE, and RAVDESS datasets, achieving accuracies of 97%, 93%, and 92%, respectively. Additionally, the CNN outperformed the machine learning (ML) classifiers recently used in SER. The proposed features have the potential to be widely applied to various types of SER datasets for identifying emotions.
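The abstract does not specify which time-domain features are used or exactly how they are combined with the MFCCs. The following is therefore only a minimal sketch of frame-level feature fusion under stated assumptions. It computes 13 MFCCs per frame with a plain NumPy pipeline (Hamming window, power spectrum, triangular mel filterbank, log, DCT-II), uses zero-crossing rate and RMS energy as stand-in time-domain features, and concatenates both column-wise into one hybrid matrix. All function names, frame sizes (25 ms frames with a 10 ms hop at 16 kHz), filter counts, and the choice of time-domain features are hypothetical and are not the authors' configuration. The CNN that consumes the fused features in the paper is not shown.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (drops any incomplete tail)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge of the triangle
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def dct_matrix(n_out, n_in):
    """Orthonormal DCT-II basis that decorrelates the log mel energies."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    basis[0] /= np.sqrt(2.0)
    return basis

def mfcc(frames, sr, n_fft=512, n_filters=26, n_mfcc=13):
    """Per-frame MFCCs, shape (n_frames, n_mfcc)."""
    window = np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, n=n_fft)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_mel = np.log(mel_energy + 1e-10)  # small offset avoids log(0)
    return log_mel @ dct_matrix(n_mfcc, n_filters).T

def time_domain_features(frames):
    """Assumed time-domain features: zero-crossing rate and RMS energy per frame."""
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return np.stack([zcr, rms], axis=1)

def hybrid_features(x, sr, frame_len=400, hop=160):
    """Hypothetical MFCC + time-domain fusion: columns 0-12 are MFCCs, 13 is ZCR, 14 is RMS."""
    frames = frame_signal(x, frame_len, hop)
    return np.hstack([mfcc(frames, sr), time_domain_features(frames)])

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    x = 0.5 * np.sin(2 * np.pi * 220 * t)  # 1 s synthetic 220 Hz tone
    feats = hybrid_features(x, sr)
    print(feats.shape)  # (98, 15): 98 frames, 13 MFCCs + 2 time-domain values
```

Concatenating per frame keeps the result a 2-D (frames × features) matrix, which is one convenient input layout for a convolutional model. Pooling statistics over frames into a single vector per utterance would be an equally plausible alternative.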
Pages: 15