Bangla Speech Emotion Recognition and Cross-Lingual Study Using Deep CNN and BLSTM Networks

Times cited: 21

Authors:

Sultana, Sadia [1]

Iqbal, M. Zafar [1]

Selim, M. Reza [1]

Rashid, Md. Mijanur [2]

Rahman, M. Shahidur [1]

Affiliations:

[1] Shahjalal University of Science and Technology, Department of Computer Science and Engineering, Sylhet 3114, Bangladesh

[2] Accenture, REPL Group, Henley-in-Arden B95 5QR, England

Source:

IEEE Access | 2022 / Vol. 10

Keywords:

Convolutional neural networks; Feature extraction; Spectrogram; Emotion recognition; Training; Speech recognition; Convolution; Bangla SER; deep CNN; RAVDESS; SUBESCO; time-distributed flatten; SHORT-TERM-MEMORY; CONVOLUTIONAL NEURAL-NETWORKS; RECURRENT; FEATURES; REPRESENTATION; ARCHITECTURES; IMPACT

DOI:

10.1109/ACCESS.2021.3136251

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

In this study, we present a deep learning-based implementation of speech emotion recognition (SER). The system combines a deep convolutional neural network (DCNN) and a bidirectional long short-term memory (BLSTM) network with a time-distributed flatten (TDF) layer. The proposed model was applied to the recently built audio-only Bangla emotional speech corpus SUBESCO. A series of experiments was carried out to analyze all the models discussed in this paper under baseline, cross-lingual, and multilingual training-testing setups. The experimental results show that the model with a TDF layer, which can work on both temporal and sequential representations of emotion, performs better than other state-of-the-art CNN-based SER models. For the cross-lingual experiments, cross-corpus training, multi-corpus training, and transfer learning were employed for the Bangla and English languages using the SUBESCO and RAVDESS datasets. The proposed model attained state-of-the-art perceptual efficiency, achieving weighted accuracies (WAs) of 86.9% and 82.7% on the SUBESCO and RAVDESS datasets, respectively.
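The abstract relies on two technical ingredients that are easy to misread: the time-distributed flatten (TDF) layer that hands CNN features to the BLSTM, and the weighted accuracy (WA) metric used to report results. The plain-Python sketch below is a hypothetical illustration, not the authors' implementation. It assumes the CNN output for one utterance is laid out as [time][frequency][channel], and that WA means overall accuracy, i.e. per-class recall weighted by each class's share of the samples (a common SER convention; the paper's exact tensor layout and metric definition may differ). The function names `time_distributed_flatten` and `weighted_accuracy` are invented for illustration.

```python
from collections import Counter
from typing import List, Sequence


def time_distributed_flatten(feature_map: List[List[List[float]]]) -> List[List[float]]:
    # Hypothetical helper. Input: one CNN feature map indexed as
    # [time][freq][channel] (assumed layout). Output: [time][freq * channel].
    # Each time step is flattened on its own, so the time axis is kept and a
    # recurrent layer such as a BLSTM can read the result as a sequence.
    return [
        [value for freq_bin in time_step for value in freq_bin]
        for time_step in feature_map
    ]


def weighted_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    # Hypothetical helper, assuming WA = per-class recall weighted by class
    # frequency. This weighted sum reduces to plain accuracy (correct / total).
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    total = len(y_true)
    support = Counter(y_true)  # number of samples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    return sum((support[c] / total) * (hits[c] / support[c]) for c in support)


# Toy feature map: 3 time steps, 2 frequency bins, 2 channels.
fmap = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
]
seq = time_distributed_flatten(fmap)
print(len(seq), len(seq[0]))  # 3 4
print(seq[0])  # [1, 2, 3, 4]

labels = ["angry", "happy", "sad", "sad"]
preds = ["angry", "sad", "sad", "sad"]
print(weighted_accuracy(labels, preds))  # 0.75
```

Flattening each time step separately, rather than the whole map at once, is what preserves the temporal ordering the recurrent layer needs; flattening everything into a single vector would leave the BLSTM nothing to iterate over.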

Pages: 564–578

Number of pages: 15