Deep Convolutional Neural Network and Gray Wolf Optimization Algorithm for Speech Emotion Recognition

Cited by: 15

Authors:

Falahzadeh, Mohammad Reza [1]

Farokhi, Fardad [2]

Harimi, Ali [3]

Sabbaghi-Nadooshan, Reza [1]

Affiliations:

[1] Islamic Azad Univ, Dept Elect Engn, Cent Tehran Branch, Tehran, Iran

[2] Islamic Azad Univ, Dept Biomed Engn, Cent Tehran Branch, Tehran, Iran

[3] Islamic Azad Univ, Dept Elect Engn, Shahrood Branch, Shahrood, Iran

Source:

CIRCUITS SYSTEMS AND SIGNAL PROCESSING | 2023 / Vol. 42 / Issue 01

Keywords:

Speech emotion recognition; 3D tensor speech representation; Chaogram; Deep convolutional neural network; Gray wolf optimization algorithm; RECONSTRUCTED PHASE-SPACE; SPECTRAL FEATURES; CLASSIFICATION; CNN

DOI:

10.1007/s00034-022-02130-3

Chinese Library Classification (CLC) code:

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline classification code:

0808; 0809

Abstract:

Speech emotion recognition (SER), an important method of emotional human-machine interaction, has been the focus of much research in recent years. Motivated by the feature-learning power of Deep Convolutional Neural Networks (DCNNs) and the landmark success of these networks in image classification, the present study aims to prepare a pre-trained DCNN model for SER and to provide compatible input to these networks by converting a speech signal into a 3D tensor. First, speech samples are reconstructed in a 3D phase space. Studies have shown that the patterns formed in this space contain meaningful emotional features of the speaker. To provide an input that is compatible with a DCNN, a new speech signal representation called Chaogram is introduced as the projection of these patterns, which yields three channels similar to those of an RGB image. In the next step, image enhancement techniques are used to highlight the details of the Chaogram images. Then, the Visual Geometry Group (VGG) DCNN pre-trained on the large ImageNet dataset is used to learn high-level Chaogram features and the corresponding emotion classes. Finally, transfer learning is performed on the proposed model, and the model is fine-tuned on our datasets. To optimize the hyper-parameter arrangement of architecture-determined CNNs, an innovative DCNN-GWO (gray wolf optimization) is also presented. The results on two public emotion datasets, EMO-DB and eNTERFACE05, show the promising performance of the proposed model, which can greatly improve SER applications.
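
The first two steps of the abstract (phase-space reconstruction, then projection into three image-like channels) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the embedding delay, the image resolution, or how the projections are computed, so the time-delay embedding, the count-based projection onto coordinate planes, and the names `delay_embed`, `project_to_image`, `chaogram_like`, `delay`, and `bins` are all assumptions made for illustration.

```python
import math


def delay_embed(signal, delay=1, dim=3):
    """Time-delay embedding: point n is (x[n], x[n+delay], ..., x[n+(dim-1)*delay])."""
    span = (dim - 1) * delay
    if len(signal) <= span:
        raise ValueError("signal too short for this delay and dimension")
    return [tuple(signal[n + k * delay] for k in range(dim))
            for n in range(len(signal) - span)]


def project_to_image(points, axis_a, axis_b, bins=8):
    """Count trajectory points per cell of a bins x bins grid on one coordinate plane."""
    lo = min(min(p) for p in points)
    hi = max(max(p) for p in points)
    scale = (hi - lo) or 1.0  # avoid division by zero for a constant signal
    image = [[0] * bins for _ in range(bins)]
    for p in points:
        # Clamp so that the maximum value lands in the last cell.
        row = min(int((p[axis_a] - lo) / scale * bins), bins - 1)
        col = min(int((p[axis_b] - lo) / scale * bins), bins - 1)
        image[row][col] += 1
    return image


def chaogram_like(signal, delay=1, bins=8):
    """Assumed layout: three 2D projections of the 3D trajectory, stacked like channels."""
    points = delay_embed(signal, delay=delay, dim=3)
    planes = ((0, 1), (0, 2), (1, 2))
    return [project_to_image(points, a, b, bins) for a, b in planes]


# Toy input: a sine wave standing in for a speech frame.
signal = [math.sin(0.2 * n) for n in range(200)]
channels = chaogram_like(signal, delay=5, bins=8)
print(len(channels), len(channels[0]), len(channels[0][0]))
```

The three count grids play the role of the RGB-like channels mentioned in the abstract; in a full pipeline they would be enhanced, resized to the network's input size, and passed to a pre-trained image classifier.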


Pages: 449-492

Page count: 44
