Advancements in Bangla Speech Emotion Recognition: A Deep Learning Approach with Cross-Lingual Validation

Cited by: 0
Authors
Alam, Khorshed [1]
Bhuiyan, Mahbubul Haq [1]
Hossain, Md Junayed [1]
Monir, Md Fahad [1]
Bin Khaled, Md Asif [1]
Affiliation
[1] Independent University, Bangladesh (IUB), Department of Computer Science and Engineering, Dhaka, Bangladesh
Keywords
Bangla Speech Emotion Recognition (SER); Deep Neural Networks (DNN); Feature Extraction; Cross-Lingual Validation; Emotion Recognition; Data Augmentation; Spectrogram Analysis; Speech Processing
DOI
10.1109/VTC2024-SPRING62846.2024.10683404
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Speech Emotion Recognition (SER) is the task of enabling computers to recognize human emotions from speech in order to improve human-computer communication. In this study, we present a Bangla SER framework that combines data augmentation, feature extraction, and a deep learning model. We use SUBESCO, a publicly available Bangla SER dataset developed by Shahjalal University of Science and Technology (SUST) and one of the largest datasets in the Bangla SER domain. To increase the size and diversity of the training data, we apply noise injection, time stretching, time shifting, and pitch shifting. Previous Bangla SER models have suffered from severe misclassification in lower-tonal classes, which highlights the need for more robust feature extraction. We therefore extract Zero Crossing Rate (ZCR), Chroma STFT, Mel-Frequency Cepstral Coefficients (MFCC), Root Mean Square (RMS) energy, and Mel spectrogram features to improve classification accuracy. A Deep Neural Network (DNN) then recognizes emotion labels from spoken Bangla speech and achieves 99% accuracy on unseen data. Given the scarcity of Bangla SER datasets, and to validate the model's robustness, we also evaluate it on five additional datasets, RAVDESS, TESS, EMOVO, EmoDB, and BanglaSER, on which it achieves accuracies of 90%, 99%, 96%, 95%, and 93%, respectively.
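The abstract names the augmentation and feature steps but not their parameters. As a minimal sketch, assuming mono audio already loaded as a list of floats, the two simplest augmentations (noise injection and time shifting) and two of the listed frame-level features (ZCR and RMS) can be written in plain Python as below. The noise factor, the 2048-sample frame length, the 512-sample hop length, and the choice to average each feature over frames are hypothetical, not values reported by the authors. Time stretching, pitch shifting, MFCC, Chroma STFT, and Mel spectrogram require an STFT and are usually taken from an audio library; librosa, for example, provides `librosa.effects.time_stretch`, `librosa.effects.pitch_shift`, `librosa.feature.mfcc`, `librosa.feature.chroma_stft`, and `librosa.feature.melspectrogram`, although this record does not state which library the paper used.

```python
import math
import random
from typing import List, Sequence

# Hypothetical sketch: parameter values below are illustrative assumptions,
# not settings reported for the SUBESCO experiments.


def add_noise(signal: Sequence[float], noise_factor: float = 0.005,
              seed: int = 0) -> List[float]:
    """Noise injection: add zero-mean Gaussian noise scaled by noise_factor."""
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    return [x + noise_factor * rng.gauss(0.0, 1.0) for x in signal]


def time_shift(signal: Sequence[float], shift: int) -> List[float]:
    """Time shifting: circularly roll the waveform right by `shift` samples
    (a negative shift rolls it left)."""
    n = len(signal)
    if n == 0:
        return []
    shift %= n
    if shift == 0:
        return list(signal)
    return list(signal[-shift:]) + list(signal[:-shift])


def split_frames(signal: Sequence[float], frame_length: int,
                 hop_length: int) -> List[Sequence[float]]:
    """Split the signal into overlapping frames; trailing samples that do not
    fill a whole frame are dropped (no padding)."""
    last_start = len(signal) - frame_length
    return [signal[i:i + frame_length]
            for i in range(0, last_start + 1, hop_length)]


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms(frame: Sequence[float]) -> float:
    """Root Mean Square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def utterance_features(signal: Sequence[float], frame_length: int = 2048,
                       hop_length: int = 512) -> List[float]:
    """Average frame-level ZCR and RMS into one fixed-length vector per clip."""
    if frame_length < 2:
        raise ValueError("frame_length must be at least 2")
    frames = split_frames(signal, frame_length, hop_length)
    if not frames:
        raise ValueError("signal is shorter than one frame")
    mean_zcr = sum(zero_crossing_rate(f) for f in frames) / len(frames)
    mean_rms = sum(rms(f) for f in frames) / len(frames)
    return [mean_zcr, mean_rms]


if __name__ == "__main__":
    sr = 16000  # assumed sample rate
    tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]  # 1 s, 440 Hz
    augmented = time_shift(add_noise(tone), shift=sr // 10)
    print(utterance_features(tone))
    print(utterance_features(augmented))
```

For a one-second 440 Hz tone sampled at 16 kHz, the mean ZCR comes out close to 880/16000 ≈ 0.055 (two zero crossings per cycle) and the mean RMS close to 1/√2 ≈ 0.707, which is a quick sanity check before switching to real recordings. This sketch normalizes ZCR by the number of adjacent sample pairs; audio libraries may normalize slightly differently.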
Pages: 5