Speech Emotion Recognition Using Generative Adversarial Network and Deep Convolutional Neural Network

Cited by: 0
Authors
Bhangale, Kishor [1]
Kothandaraman, Mohanaprasad [1]
Affiliations
[1] VIT, SENSE, Chennai, India
Keywords
Data augmentation; Deep learning; Deep convolutional neural network; Generative adversarial network; Multi-taper Mel frequency spectrogram; Speech processing; Speech emotion recognition; Features; Classifiers
DOI
10.1007/s00034-023-02562-5
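Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract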
Interest in speech emotion recognition (SER) has grown recently because of major advances in human-computer interaction and affective computing. In recent years, numerous deep learning-based SER schemes have shown significant improvement over traditional machine learning approaches. Most deep learning-based SER systems, however, suffer from a data imbalance problem caused by unequal numbers of samples in the database. Two-dimensional CNNs for SER typically take traditional MFCC as input, which degrades the quality of the deep features because of the higher variance, frequency resolution problems, and spectral leakage of traditional MFCC. This paper proposes a novel Multi-taper Mel Frequency Logarithmic Spectrogram to improve the effectiveness of the Deep Convolutional Neural Network for SER. Further, a Generative Adversarial Network is used to augment speech emotion data during training to deal with data scarcity in SER. The proposed SER scheme is validated on the Berlin EmoDB and RAVDESS datasets. It achieves SER accuracy of 96.65% and 97.12% on the EmoDB and RAVDESS datasets, respectively, a significant improvement over recent techniques.
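The abstract names a multi-taper Mel frequency logarithmic spectrogram but gives no taper family or parameters. The sketch below is one possible reading, not the authors' implementation: it assumes orthonormal sine tapers, a triangular Mel filterbank, and illustrative values (400-sample frames, 160-sample hop, 4 tapers, 40 Mel bands). All function names are hypothetical. Averaging the power spectra obtained with several orthogonal tapers is what lowers the variance of the spectral estimate compared with a single window.

```python
import numpy as np

def sine_tapers(frame_len, n_tapers):
    # Orthonormal sine tapers (an assumed choice; the abstract does not name the taper family).
    n = np.arange(1, frame_len + 1)
    k = np.arange(1, n_tapers + 1)[:, None]
    return np.sqrt(2.0 / (frame_len + 1)) * np.sin(np.pi * k * n / (frame_len + 1))

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centers spaced evenly on the Mel scale.
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def multitaper_log_mel(signal, sr, frame_len=400, hop=160, n_fft=512,
                       n_tapers=4, n_mels=40):
    # All default parameter values are illustrative assumptions.
    tapers = sine_tapers(frame_len, n_tapers)
    fb = mel_filterbank(n_mels, n_fft, sr)
    n_frames = 1 + (len(signal) - frame_len) // hop
    out = np.empty((n_mels, n_frames))
    for t in range(n_frames):
        frame = signal[t * hop : t * hop + frame_len]
        # One power spectrum per taper, then average across tapers.
        spectra = np.abs(np.fft.rfft(tapers * frame, n=n_fft, axis=1)) ** 2
        power = spectra.mean(axis=0)
        # Mel-weighted energies on a log scale; the small constant avoids log(0).
        out[:, t] = np.log(fb @ power + 1e-10)
    return out
```

The resulting array has one row per Mel band and one column per frame, so it can be fed to a two-dimensional CNN as a single-channel image. The GAN-based augmentation step mentioned in the abstract is not covered by this sketch.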
Pages: 2341-2384
Page count: 44
Related papers
50 in total
  • [21] Modulation recognition method based on generative adversarial and convolutional neural network
    Shao, Kai
    Zhu, Miaomiao
    Wang, Guangyu
    [J]. Xi Tong Gong Cheng Yu Dian Zi Ji Shu/Systems Engineering and Electronics, 2022, 44 (03): 1036 - 1043
  • [22] Data Augmentation for Imbalanced HRRP Recognition Using Deep Convolutional Generative Adversarial Network
    Song, Yiheng
    Li, Yang
    Wang, Yanhua
    Hu, Cheng
    [J]. IEEE ACCESS, 2020, 8 : 201686 - 201695
  • [23] Imbalanced data fault diagnosis of hydrogen sensors using deep convolutional generative adversarial network with convolutional neural network
    Sun, Yongyi
    Zhao, Tingting
    Zou, Zhihui
    Chen, Yinsheng
    Zhang, Hongquan
    [J]. REVIEW OF SCIENTIFIC INSTRUMENTS, 2021, 92 (09):
  • [24] Deep Convolutional Neural Network and Gray Wolf Optimization Algorithm for Speech Emotion Recognition
    Falahzadeh, Mohammad Reza
    Farokhi, Fardad
    Harimi, Ali
    Sabbaghi-Nadooshan, Reza
    [J]. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2023, 42 (01) : 449 - 492
  • [25] DEEP CONVOLUTIONAL RECURRENT NEURAL NETWORK WITH ATTENTION MECHANISM FOR ROBUST SPEECH EMOTION RECOGNITION
    Huang, Che-Wei
    Narayanan, Shrikanth
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017: 583 - 588
  • [26] Speech Emotion Recognition Based on Multiple Acoustic Features and Deep Convolutional Neural Network
    Bhangale, Kishor
    Kothandaraman, Mohanaprasad
    [J]. ELECTRONICS, 2023, 12 (04)
  • [27] Deep Convolutional Neural Network and Gray Wolf Optimization Algorithm for Speech Emotion Recognition
    Mohammad Reza Falahzadeh
    Fardad Farokhi
    Ali Harimi
    Reza Sabbaghi-Nadooshan
    [J]. Circuits, Systems, and Signal Processing, 2023, 42 : 449 - 492
  • [28] Speech Emotion Recognition Based on Deep Neural Network
    Zhu, Zijiang
    Hu, Yi
    Li, Junshan
    Li, Jianjun
    Wang, Junhua
    [J]. BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2020, 126 : 154 - 154
  • [29] Active Learning for Speech Emotion Recognition Using Deep Neural Network
    Abdelwahab, Mohammed
    Busso, Carlos
    [J]. 2019 8TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2019,
  • [30] Multimodal speech emotion recognition and classification using convolutional neural network techniques
    A. Christy
    S. Vaithyasubramanian
    A. Jesudoss
    M. D. Anto Praveena
    [J]. International Journal of Speech Technology, 2020, 23 : 381 - 388