Advancements in Bangla Speech Emotion Recognition: A Deep Learning Approach with Cross-Lingual Validation

Authors
Alam, Khorshed [1]
Bhuiyan, Mahbubul Haq [1]
Hossain, Md Junayed [1]
Monir, Md Fahad [1]
Bin Khaled, Md Asif [1]
Affiliations
[1] Independent Univ Bangladesh IUB, Dept Comp Sci & Engn, Dhaka, Bangladesh
Keywords
Bangla Speech Emotion Recognition (SER); Deep Neural Networks (DNN); Feature Extraction; Cross-Lingual Validation; Emotion Recognition; Data Augmentation; Spectrogram Analysis; Speech Processing
DOI
10.1109/VTC2024-SPRING62846.2024.10683404
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Speech Emotion Recognition (SER) is a method by which computers learn to recognize human emotions from speech in order to improve communication. In this study, we present a Bangla SER framework that combines data augmentation, feature extraction, and a deep learning model. We use SUBESCO, a publicly available Bangla SER dataset developed by Shahjalal University of Science and Technology (SUST) and one of the largest datasets in the Bangla SER domain. To increase the amount and variety of the data, we apply noise injection, time stretching, time shifting, and pitch augmentation. Previous Bangla SER models have suffered from severe misclassification in lower tonal classes, highlighting the need for more robust feature extraction. We therefore extract Zero Crossing Rate (ZCR), Chroma STFT, Mel-Frequency Cepstral Coefficients (MFCC), Root Mean Square (RMS), and Mel Spectrogram features to improve classification accuracy. We use a Deep Neural Network (DNN) to recognize emotion labels from Bangla speech data, achieving 99% accuracy on unseen data. Given the scarcity of Bangla SER datasets, and to validate the model's robustness, we also tested it on additional datasets, namely RAVDESS, TESS, EMOVO, EmoDB, and BanglaSER, on which it achieved accuracies of 90%, 99%, 96%, 95%, and 93%, respectively.
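Two of the augmentations (noise injection, time shifting) and two of the features (ZCR, RMS) named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the noise factor, the shift length, and the synthetic test tone are all assumptions chosen for illustration, and the code uses only the Python standard library. Time stretching, pitch augmentation, MFCC, Chroma STFT, and Mel Spectrogram are omitted because they need a short-time Fourier transform, which in practice would come from an audio library.

```python
import math
import random

def inject_noise(signal, noise_factor=0.005, seed=0):
    """Add zero-mean Gaussian noise scaled by the signal's peak amplitude.
    noise_factor and seed are illustrative values, not taken from the paper."""
    rng = random.Random(seed)
    peak = max(abs(x) for x in signal)
    return [x + noise_factor * peak * rng.gauss(0.0, 1.0) for x in signal]

def time_shift(signal, shift):
    """Circularly shift the signal by `shift` samples (positive = delay)."""
    shift %= len(signal)
    if shift == 0:
        return list(signal)
    return signal[-shift:] + signal[:-shift]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def rms(frame):
    """Root mean square energy of a frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

# Synthetic stand-in for a speech clip: a 440 Hz tone, 0.1 s at 16 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(1600)]

noisy = inject_noise(tone)        # augmented copy 1
shifted = time_shift(tone, 160)   # augmented copy 2, delayed by 10 ms

# One feature vector per clip; a real pipeline would compute these per frame.
features = [zero_crossing_rate(tone), rms(tone)]
```

Each augmented copy keeps the original emotion label, so a training set grows by one clip per augmentation applied. A classifier would then receive the per-clip (or per-frame) feature vectors rather than raw samples.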
Pages: 5
Related papers
50 in total
  • [1] Advancements in Bangla Speech Emotion Recognition: A Deep Learning Approach with Cross-Lingual Validation
    Alam, Khorshed
    Bhuiyan, Mahbubul Haq
    Hossain, Md Junayed
    Monir, Md Fahad
    Khaled, Md Asif Bin
    IEEE Vehicular Technology Conference, 2024
  • [2] Bangla Speech Emotion Recognition and Cross-Lingual Study Using Deep CNN and BLSTM Networks
    Sultana, Sadia
    Iqbal, M. Zafar
    Selim, M. Reza
    Rashid, Md. Mijanur
    Rahman, M. Shahidur
    IEEE ACCESS, 2022, 10 : 564 - 578
  • [3] Cross-Lingual Transfert Learning for Speech Emotion Recognition
    Baklouti, Imen
    Ben Ahmed, Olfa
    Baklouti, Raoudha
    Fernandez, Christine
    2024 IEEE 7TH INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES, SIGNAL AND IMAGE PROCESSING, ATSIP 2024, 2024, : 559 - 563
  • [4] Speech Emotion Recognition with Cross-lingual Databases
    Chiou, Bo-Chang
    Chen, Chia-Ping
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 558 - 561
  • [5] Cross-lingual Speech Emotion Recognition through Factor Analysis
    Desplanques, Brecht
    Demuynck, Kris
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3648 - 3652
  • [6] CROSS-LINGUAL AND MULTILINGUAL SPEECH EMOTION RECOGNITION ON ENGLISH AND FRENCH
    Neumann, Michael
    Ngoc Thang Vu
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5769 - 5773
  • [7] Semi-supervised cross-lingual speech emotion recognition
    Agarla, Mirko
    Bianco, Simone
    Celona, Luigi
    Napoletano, Paolo
    Petrovsky, Alexey
    Piccoli, Flavio
    Schettini, Raimondo
    Shanin, Ivan
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 237
  • [8] Unsupervised Cross-lingual Representation Learning for Speech Recognition
    Conneau, Alexis
    Baevski, Alexei
    Collobert, Ronan
    Mohamed, Abdelrahman
    Auli, Michael
    INTERSPEECH 2021, 2021, : 2426 - 2430
  • [9] UNSUPERVISED CROSS-LINGUAL SPEECH EMOTION RECOGNITION USING PSEUDO MULTILABEL
    Li, Fin
    Yan, Nan
    Wang, Lan
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 366 - 373
  • [10] Multilingual, Cross-lingual, and Monolingual Speech Emotion Recognition on EmoFilm Dataset
    Atmaja, Bagus Tris
    Sasou, Akira
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 1019 - 1025