Deep Convolutional Neural Network and Gray Wolf Optimization Algorithm for Speech Emotion Recognition

Cited by: 15
Authors
Falahzadeh, Mohammad Reza [1]
Farokhi, Fardad [2]
Harimi, Ali [3]
Sabbaghi-Nadooshan, Reza [1]
Affiliations
[1] Islamic Azad Univ, Dept Elect Engn, Cent Tehran Branch, Tehran, Iran
[2] Islamic Azad Univ, Dept Biomed Engn, Cent Tehran Branch, Tehran, Iran
[3] Islamic Azad Univ, Dept Elect Engn, Shahrood Branch, Shahrood, Iran
Keywords
Speech emotion recognition; 3D tensor speech representation; Chaogram; Deep convolutional neural network; Gray wolf optimization algorithm; Reconstructed phase space; Spectral features; Classification; CNN
DOI
10.1007/s00034-022-02130-3
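Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and telecommunications technology]
Discipline classification codes
0808; 0809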
Abstract
Speech emotion recognition (SER), an important part of emotional human-machine interaction, has been the focus of much research in recent years. Motivated by the ability of deep convolutional neural networks (DCNNs) to learn features and by their landmark success in image classification, this study prepares a pre-trained DCNN model for SER and provides compatible input to the network by converting a speech signal into a 3D tensor. First, speech samples are embedded in a 3D reconstructed phase space. Studies have shown that the patterns formed in this space contain meaningful emotional features of the speaker. To provide an input that is compatible with a DCNN, a new speech signal representation called Chaogram is introduced as the projection of these patterns, yielding three channels similar to those of an RGB image. Next, image enhancement techniques are used to highlight the details of the Chaogram images. Then, a Visual Geometry Group (VGG) DCNN pre-trained on the large ImageNet dataset is used to learn high-level Chaogram features and the corresponding emotion classes. Finally, transfer learning is applied, and the model is fine-tuned on the target datasets. To optimize the hyper-parameter configuration of CNNs whose architecture is fixed, a DCNN-GWO (gray wolf optimization) approach is also presented. Results on two public emotion datasets, EMO-DB and eNTERFACE05, show promising performance for the proposed model, which could substantially improve SER applications.
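The first two steps of the abstract (phase-space reconstruction, then projection to a three-channel image) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the embedding delay, the projection planes, or the image size. The sketch assumes a standard time-delay embedding and assumes the three channels are 2D histograms of the points projected onto the three coordinate planes. The function names `reconstruct_phase_space` and `chaogram_like_image`, and the parameters `delay` and `bins`, are hypothetical.

```python
import numpy as np

def reconstruct_phase_space(signal, delay=1, dim=3):
    """Time-delay embedding: row n is [s[n], s[n+delay], ..., s[n+(dim-1)*delay]]."""
    signal = np.asarray(signal, dtype=float)
    n_points = len(signal) - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("signal too short for the chosen delay and dimension")
    columns = [signal[i * delay : i * delay + n_points] for i in range(dim)]
    return np.stack(columns, axis=1)  # shape (n_points, dim)

def chaogram_like_image(points, bins=64):
    """Assumed projection: one 2D histogram per coordinate plane, stacked as channels."""
    lo, hi = points.min(), points.max()
    if hi == lo:  # constant signal: avoid a zero-width histogram range
        hi = lo + 1.0
    channels = []
    for a, b in [(0, 1), (0, 2), (1, 2)]:  # the three coordinate planes
        hist, _, _ = np.histogram2d(
            points[:, a], points[:, b], bins=bins, range=[[lo, hi], [lo, hi]]
        )
        peak = hist.max()
        channels.append(hist / peak if peak > 0 else hist)  # scale each channel to [0, 1]
    return np.stack(channels, axis=-1)  # shape (bins, bins, 3), like an RGB image

# Toy usage on a synthetic tone (not real speech data).
t = np.arange(2000) / 16000.0
toy_signal = np.sin(2 * np.pi * 220 * t)
points = reconstruct_phase_space(toy_signal, delay=8)
image = chaogram_like_image(points)
print(points.shape, image.shape)  # (1984, 3) (64, 64, 3)
```

The resulting array has the height-by-width-by-3 layout that ImageNet-pretrained networks such as VGG expect, which is the compatibility the abstract refers to. The image enhancement and fine-tuning steps are not shown.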
Pages: 449-492
Page count: 44
Source
Circuits, Systems, and Signal Processing, 2023, 42: 449-492