Advancements in Bangla Speech Emotion Recognition: A Deep Learning Approach with Cross-Lingual Validation

Authors
Alam, Khorshed [1]
Bhuiyan, Mahbubul Haq [1]
Hossain, Md Junayed [1]
Monir, Md Fahad [1]
Bin Khaled, Md Asif [1]
Affiliations
[1] Independent Univ Bangladesh IUB, Dept Comp Sci & Engn, Dhaka, Bangladesh
Keywords
Bangla Speech Emotion Recognition (SER); Deep Neural Networks (DNN); Feature Extraction; Cross-Lingual Validation; Emotion Recognition; Data Augmentation; Spectrogram Analysis; Speech Processing
DOI
10.1109/VTC2024-SPRING62846.2024.10683404
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Speech Emotion Recognition (SER) is the task of having computers recognize human emotions from speech in order to improve communication. In this study, we present a Bangla SER framework that combines data augmentation, feature extraction, and a deep learning model. We use SUBESCO, a publicly available Bangla SER dataset developed by Shahjalal University of Science and Technology (SUST) and one of the largest datasets in the Bangla SER domain. To increase the amount and variety of the data, we apply noise injection, time stretching, time shifting, and pitch augmentation. Previous Bangla SER models have suffered from severe misclassification in lower tonal classes, which highlights the need for more robust feature extraction. We therefore extract Zero Crossing Rate (ZCR), Chroma STFT, Mel-Frequency Cepstral Coefficients (MFCC), Root Mean Square (RMS), and Mel Spectrogram features to improve classification accuracy. We use a Deep Neural Network (DNN) to recognize emotion labels from spoken Bangla and achieve 99% accuracy on unseen data. Given the scarcity of Bangla SER datasets, and to validate the model's robustness, we also tested it on additional datasets spanning several languages, namely RAVDESS, TESS, EMOVO, EmoDB, and BanglaSER. The model reached accuracies of 90%, 99%, 96%, 95%, and 93% on these datasets, respectively.
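The abstract names its augmentation and feature steps but does not say how they are implemented. As a rough illustration only, the sketch below shows two of the augmentations (noise injection, time shifting) and two of the features (ZCR, RMS) in plain Python. The function names, the `noise_level` default, the circular shift, and the toy sine-wave input are all assumptions made for this example and are not taken from the paper; time stretching, pitch augmentation, Chroma STFT, MFCC, and Mel Spectrogram are omitted because they would normally rely on an audio library.

```python
import math
import random


def inject_noise(signal, noise_level=0.005, seed=0):
    """Add zero-mean Gaussian noise scaled by the signal's peak amplitude.

    noise_level and seed are illustrative defaults, not values from the paper.
    """
    rng = random.Random(seed)
    peak = max((abs(s) for s in signal), default=0.0) or 1.0
    return [s + noise_level * peak * rng.gauss(0.0, 1.0) for s in signal]


def time_shift(signal, shift):
    """Circularly shift the signal by `shift` samples (positive = delay)."""
    n = len(signal)
    if n == 0:
        return []
    k = shift % n
    return list(signal[-k:]) + list(signal[:-k]) if k else list(signal)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms(frame):
    """Root mean square energy of a frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


# Toy input: one second of a 5 Hz sine sampled at 1000 Hz (not real speech).
sr = 1000
tone = [math.sin(2 * math.pi * 5 * t / sr) for t in range(sr)]

noisy = inject_noise(tone)        # augmented copy with added noise
shifted = time_shift(tone, 100)   # augmented copy delayed by 100 samples
features = [zero_crossing_rate(tone), rms(tone)]  # per-clip feature vector
```

In a fuller pipeline, each augmented copy would typically be framed, passed through the same feature functions, and the resulting vectors fed to the classifier; whether the authors frame or aggregate features this way is not stated in the abstract.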
Pages: 5