Bangla Speech Emotion Recognition and Cross-Lingual Study Using Deep CNN and BLSTM Networks

Cited by: 21
Authors
Sultana, Sadia [1]
Iqbal, M. Zafar [1]
Selim, M. Reza [1]
Rashid, Md. Mijanur [2]
Rahman, M. Shahidur [1]
Affiliations
[1] Shahjalal Univ Sci & Technol, Dept Comp Sci & Engn, Sylhet 3114, Bangladesh
[2] Accenture, REPL Grp, Henley In Arden B95 5QR, England
Keywords
Convolutional neural networks; Feature extraction; Spectrogram; Emotion recognition; Training; Speech recognition; Convolution; Bangla SER; deep CNN; RAVDESS; SUBESCO; time-distributed flatten; SHORT-TERM-MEMORY; CONVOLUTIONAL NEURAL-NETWORKS; RECURRENT; FEATURES; REPRESENTATION; ARCHITECTURES; IMPACT;
DOI
10.1109/ACCESS.2021.3136251
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This study presents a deep learning-based implementation for speech emotion recognition (SER). The system combines a deep convolutional neural network (DCNN) and a bidirectional long short-term memory (BLSTM) network with a time-distributed flatten (TDF) layer. The proposed model was applied to SUBESCO, a recently built audio-only Bangla emotional speech corpus. A series of experiments was carried out to analyze all the models discussed in the paper under baseline, cross-lingual, and multilingual training-testing setups. The experimental results show that the model with a TDF layer achieves better performance than other state-of-the-art CNN-based SER models that can work on both temporal and sequential representations of emotions. For the cross-lingual experiments, cross-corpus training, multi-corpus training, and transfer learning were employed for the Bangla and English languages using the SUBESCO and RAVDESS datasets. The proposed model attained state-of-the-art perceptual efficiency, achieving weighted accuracies (WAs) of 86.9% and 82.7% for the SUBESCO and RAVDESS datasets, respectively.
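As a rough illustration of what a time-distributed flatten step does, the sketch below reshapes a CNN-style feature map of shape (time, frequency, channels) into a sequence of per-time-step vectors that a recurrent layer such as a BLSTM could consume. This is a minimal pure-Python sketch under assumptions: the function name, the toy dimensions, and the axis ordering are hypothetical and are not taken from the paper, whose exact layer configuration is not given in this record.

```python
def time_distributed_flatten(feature_map):
    """Flatten every time step of a (time, freq, channels) nested list.

    Returns a (time, freq * channels) nested list: the time axis is kept,
    so the result is still a sequence that a recurrent layer can read.
    """
    return [
        [value for freq_bin in frame for value in freq_bin]
        for frame in feature_map
    ]


# Hypothetical CNN output: 4 time steps, 3 frequency bins, 2 channels.
# Values encode their own position (t * 100 + f * 10 + c) for readability.
T, F, C = 4, 3, 2
feature_map = [
    [[t * 100 + f * 10 + c for c in range(C)] for f in range(F)]
    for t in range(T)
]

sequence = time_distributed_flatten(feature_map)
print(len(sequence), len(sequence[0]))  # 4 6
```

The point of flattening per time step, as opposed to flattening the whole map into one vector, is that the temporal ordering survives, which is what lets a sequence model follow the convolutional stage.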
Pages: 564-578
Number of pages: 15
Related papers
50 in total
  • [1] Advancements in Bangla Speech Emotion Recognition: A Deep Learning Approach with Cross-Lingual Validation
    Alam, Khorshed
    Bhuiyan, Mahbubul Haq
    Hossain, Md Junayed
    Monir, Md Fahad
    Khaled, Md Asif Bin
    [J]. IEEE Vehicular Technology Conference, 2024,
  • [2] Speech Emotion Recognition with Cross-lingual Databases
    Chiou, Bo-Chang
    Chen, Chia-Ping
    [J]. 15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 558 - 561
  • [3] UNSUPERVISED CROSS-LINGUAL SPEECH EMOTION RECOGNITION USING PSEUDO MULTILABEL
    Li, Fin
    Yan, Nan
    Wang, Lan
    [J]. 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 366 - 373
  • [4] Speech Emotion Recognition using XGBoost and CNN BLSTM with Attention
    He, Jingru
    Ren, Liyong
    [J]. 2021 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, INTERNET OF PEOPLE, AND SMART CITY INNOVATIONS (SMARTWORLD/SCALCOM/UIC/ATC/IOP/SCI 2021), 2021, : 154 - 159
  • [5] CROSS-LINGUAL AND MULTILINGUAL SPEECH EMOTION RECOGNITION ON ENGLISH AND FRENCH
    Neumann, Michael
    Ngoc Thang Vu
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5769 - 5773
  • [6] Cross-lingual Speech Emotion Recognition through Factor Analysis
    Desplanques, Brecht
    Demuynck, Kris
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3648 - 3652
  • [7] Semi-supervised cross-lingual speech emotion recognition
    Agarla, Mirko
    Bianco, Simone
    Celona, Luigi
    Napoletano, Paolo
    Petrovsky, Alexey
    Piccoli, Flavio
    Schettini, Raimondo
    Shanin, Ivan
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 237
  • [8] BLSTM and CNN Stacking Architecture for Speech Emotion Recognition
    Dongdong Li
    Linyu Sun
    Xinlei Xu
    Zhe Wang
    Jing Zhang
    Wenli Du
    [J]. Neural Processing Letters, 2021, 53 : 4097 - 4115
  • [9] BLSTM and CNN Stacking Architecture for Speech Emotion Recognition
    Li, Dongdong
    Sun, Linyu
    Xu, Xinlei
    Wang, Zhe
    Zhang, Jing
    Du, Wenli
    [J]. NEURAL PROCESSING LETTERS, 2021, 53 (06) : 4097 - 4115
  • [10] Unsupervised Cross-Lingual Speech Emotion Recognition Using Domain Adversarial Neural Network
    Cai, Xiong
    Wu, Zhiyong
    Zhong, Kuo
    Su, Bin
    Dai, Dongyang
    Meng, Helen
    [J]. 2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,