Advancements in Bangla Speech Emotion Recognition: A Deep Learning Approach with Cross-Lingual Validation

Authors
Alam, Khorshed [1]
Bhuiyan, Mahbubul Haq [1]
Hossain, Md Junayed [1]
Monir, Md Fahad [1]
Bin Khaled, Md Asif [1]
Affiliations
[1] Independent Univ Bangladesh IUB, Dept Comp Sci & Engn, Dhaka, Bangladesh
Keywords
Bangla Speech Emotion Recognition (SER); Deep Neural Networks (DNN); Feature Extraction; Cross-Lingual Validation; Emotion Recognition; Data Augmentation; Spectrogram Analysis; Speech Processing
DOI
10.1109/VTC2024-SPRING62846.2024.10683404
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Speech Emotion Recognition (SER) is the task of having computers recognize human emotions from speech in order to improve communication. In this study, we present a Bangla SER framework that combines data augmentation, feature extraction, and a deep learning model. We use SUBESCO, a publicly available Bangla SER dataset developed by Shahjalal University of Science and Technology (SUST) and one of the largest datasets in the Bangla SER domain. To increase the amount and variety of the data, we apply noise injection, time stretching, time shifting, and pitch augmentation. Previous Bangla SER models have struggled with severe misclassification in lower tonal classes, highlighting the need for more robust feature extraction. We therefore extract the Zero Crossing Rate (ZCR), Chroma STFT, Mel-Frequency Cepstral Coefficients (MFCC), Root Mean Square (RMS), and Mel Spectrogram features to improve classification accuracy. We use a Deep Neural Network (DNN) to recognize emotion labels from Bangla speech data and achieve 99% accuracy on unseen data. Given the scarcity of Bangla SER datasets, and to validate the model's robustness, we also tested it on additional datasets across languages, namely RAVDESS, TESS, EMOVO, EmoDB, and BanglaSER. The model reached accuracies of 90%, 99%, 96%, 95%, and 93% on these datasets, respectively.
Pages: 5
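
The abstract names four augmentations and five feature types but gives no parameters. As a rough illustration only, the sketch below implements two of the augmentations (noise injection and time shifting) and two of the features (ZCR and RMS) with plain NumPy. It is not the authors' implementation: all function names, the noise factor, the frame length, and the hop length are assumptions chosen for the example. Time stretching, pitch augmentation, Chroma STFT, MFCC, and the Mel Spectrogram are left out because they normally rely on an audio DSP library.

```python
import numpy as np


def inject_noise(signal, noise_factor=0.005, rng=None):
    """Add Gaussian noise scaled by the signal's peak amplitude.

    noise_factor is an assumed value, not one taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(signal.shape[0])
    return signal + noise_factor * np.max(np.abs(signal)) * noise


def time_shift(signal, shift):
    """Circularly shift the waveform by `shift` samples."""
    return np.roll(signal, shift)


def frame_signal(signal, frame_length=2048, hop_length=512):
    """Split a 1-D signal into overlapping frames (no padding)."""
    if signal.shape[0] < frame_length:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (signal.shape[0] - frame_length) // hop_length
    starts = hop_length * np.arange(n_frames)[:, None]
    offsets = np.arange(frame_length)[None, :]
    return signal[starts + offsets]


def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose sign differs, per frame."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)


def rms_energy(frames):
    """Root mean square amplitude of each frame."""
    return np.sqrt(np.mean(frames ** 2, axis=1))


def extract_features(signal):
    """Return [mean ZCR, mean RMS] as a small fixed-length feature vector."""
    frames = frame_signal(signal)
    return np.array([zero_crossing_rate(frames).mean(),
                     rms_energy(frames).mean()])


# Synthetic stand-in for a speech clip: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)

# Each original clip yields several training examples after augmentation.
augmented = [
    tone,
    inject_noise(tone, rng=np.random.default_rng(0)),
    time_shift(tone, 1600),
]
features = np.stack([extract_features(x) for x in augmented])
```

Each row of `features` corresponds to one augmented copy of the clip. In a fuller pipeline the remaining features would be concatenated onto each row before the vectors are passed to the classifier.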