Attention guided 3D CNN-LSTM model for accurate speech based emotion recognition

Cited by: 59

Authors:

Atila, Orhan [1]

Sengur, Abdulkadir [1]

Affiliations:

[1] Firat Univ, Dept Elect & Elect Engn, Fac Technol, TR-23119 Elazig, Turkey

Source:

APPLIED ACOUSTICS | 2021 / Vol. 182

Keywords:

Speech emotion recognition; Attention; 3D CNN-LSTM model; FRACTAL DIMENSION; FEATURE-SELECTION;

DOI:

10.1016/j.apacoust.2021.108260

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

In this paper, a novel approach based on an attention-guided 3D convolutional neural network (CNN)-long short-term memory (LSTM) model is proposed for speech-based emotion recognition. The proposed attention-guided 3D CNN-LSTM model is trained in an end-to-end fashion. The input speech signals are first resampled and pre-processed to remove noise and emphasize the high frequencies. Then, spectrogram, Mel-frequency cepstral coefficient (MFCC), cochleagram and fractal dimension methods are used to convert the input speech signals into speech images. The obtained images are concatenated into four-dimensional volumes and used as input to the developed 28-layer attention-integrated 3D CNN-LSTM model. The 3D CNN-LSTM model contains six 3D convolutional layers, two batch normalization (BN) layers, five rectified linear unit (ReLU) layers, three 3D max pooling layers, one attention layer, one LSTM layer, one flatten layer, one dropout layer and two fully connected layers. The attention layer is connected to the 3D convolutional layers. Three datasets, namely the Ryerson Audio-Visual Database of Emotional Speech (RAVDESS), RML and SAVEE, are used in the experiments, together with a mixture of these datasets. Classification accuracy, sensitivity, specificity and F1-score are used to evaluate the developed method. The obtained results are also compared with some recently published results, and the proposed method is seen to outperform the compared methods. (C) 2021 Elsevier Ltd. All rights reserved.
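
The four evaluation measures named in the abstract have standard definitions in terms of confusion-matrix counts. The following is a minimal sketch of those definitions for a single emotion class treated one-vs-rest; it is not taken from the paper. The function name `binary_metrics` and the example counts are hypothetical, and how the authors aggregate the measures across emotion classes is not stated in the abstract.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard accuracy, sensitivity, specificity and F1 from raw counts.

    tp/fp/tn/fn are true/false positives/negatives for one class
    (one-vs-rest). Hypothetical helper for illustration only.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Sensitivity (recall): share of actual positives that were found.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of actual negatives that were rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 is the harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Made-up counts, not results from the paper.
example = binary_metrics(tp=8, fp=2, tn=18, fn=2)
```

With these made-up counts, sensitivity is 8/10, specificity is 18/20 and accuracy is 26/30; a multi-class report would repeat the calculation per emotion and then average, under whatever averaging scheme is chosen.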


Pages: 11
