Speech-based emotion recognition using a hybrid RNN-CNN network

Cited: 0
Authors
Ning, Jingtao [1 ]
Zhang, Wenchuan [1 ]
Affiliations
[1] Lanzhou Petrochem Univ Vocat Technol, Coll Informat Engn, Lanzhou 730060, Gansu, Peoples R China
Keywords
Speech emotion recognition; Deep learning; Recurrent neural network; Convolutional neural network; Wide kernel; Classification
DOI
10.1007/s11760-024-03574-7
CLC Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline Codes
0808; 0809
Abstract
Speech emotion recognition (SER) is among the most active areas of speech signal analysis, aiming to estimate and classify the rich spectrum of emotions expressed by speakers. This paper develops a novel deep learning (DL)-based model for detecting speech emotion variation, addressing several weaknesses of existing intelligent data-driven approaches. A new DL architecture, referred to as the RNN-CNN, is proposed and applied to perform the SER task by operating directly on raw speech signals. A key design element is an initial convolution layer with a wide kernel, which efficiently mitigates the noise present in raw speech signals. In the experimental analysis, the proposed RNN-CNN model is evaluated on three databases: RML, RAVDESS, and SAVEE. The model achieves improved accuracy rates compared with previous works evaluated on the respective datasets. This assessment validates the robust performance and applicability of the proposed model across diverse speech databases and underlines its potential for further speech-based emotion recognition.
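The abstract does not give implementation details, so the following is only a minimal sketch of what a wide-kernel convolutional front end on raw speech followed by a recurrent layer could look like in PyTorch. The kernel size (512 samples), channel counts, GRU hidden size, and seven-class output are illustrative assumptions, not the authors' reported configuration.

```python
# Illustrative sketch only, NOT the authors' exact RNN-CNN architecture:
# a wide-kernel 1-D convolution applied to the raw waveform, narrower
# convolution blocks, and a bidirectional GRU for emotion classification.
import torch
import torch.nn as nn


class WideKernelRNNCNN(nn.Module):
    def __init__(self, num_emotions: int = 7):  # 7 classes is an assumption
        super().__init__()
        # Wide first kernel over the raw waveform to suppress
        # high-frequency noise and capture coarse spectral structure.
        self.front_end = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=512, stride=8, padding=256),
            nn.BatchNorm1d(32),
            nn.ReLU(),
            nn.MaxPool1d(4),
        )
        # Narrower convolutions refine local features.
        self.conv_blocks = nn.Sequential(
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128),
            nn.ReLU(),
            nn.MaxPool1d(4),
        )
        # Recurrent layer models the temporal dynamics of the feature sequence.
        self.rnn = nn.GRU(input_size=128, hidden_size=128,
                          batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * 128, num_emotions)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, 1, samples) raw speech signal
        x = self.front_end(waveform)
        x = self.conv_blocks(x)           # (batch, 128, frames)
        x = x.transpose(1, 2)             # (batch, frames, 128)
        _, h = self.rnn(x)                # h: (2, batch, 128)
        h = torch.cat([h[0], h[1]], dim=1)
        return self.classifier(h)         # emotion logits


if __name__ == "__main__":
    model = WideKernelRNNCNN(num_emotions=7)
    dummy = torch.randn(2, 1, 48000)      # two 3-second clips at 16 kHz
    print(model(dummy).shape)             # torch.Size([2, 7])
```

The wide first kernel plays the role described in the abstract of smoothing noise in the raw signal before the later, narrower layers and the recurrent stage; all specific hyperparameters above would need to be taken from the paper itself.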
Pages: 10
Related Papers (50 total)
  • [31] Survey on Machine Learning in Speech Emotion Recognition and Vision Systems Using a Recurrent Neural Network (RNN)
    Yadav, Satya Prakash
    Zaidi, Subiya
    Mishra, Annu
    Yadav, Vibhash
    ARCHIVES OF COMPUTATIONAL METHODS IN ENGINEERING, 2022, 29 (03) : 1753 - 1770
  • [32] 1D-CNN: Speech Emotion Recognition System Using a Stacked Network with Dilated CNN Features
    Mustaqeem
    Kwon, Soonil
    CMC-COMPUTERS MATERIALS & CONTINUA, 2021, 67 (03): : 4039 - 4059
  • [33] Speech-based Emotion Characterization using Postures and Gestures in CVEs
    Amarakeerthi, Senaka
    Ranaweera, Rasika
    Cohen, Michael
    2010 INTERNATIONAL CONFERENCE ON CYBERWORLDS (CW 2010), 2010, : 72 - 76
  • [34] RNN with Improved Temporal Modeling for Speech Emotion Recognition
    Lieskovska, Eva
    Jakubec, Maros
    Jarina, Roman
    2022 32ND INTERNATIONAL CONFERENCE RADIOELEKTRONIKA (RADIOELEKTRONIKA), 2022, : 5 - 9
  • [35] Learning Salient Features for Speech Emotion Recognition Using CNN
    Liu, Jiamu
    Han, Wenjing
    Ruan, Huabin
    Chen, Xiaomin
    Jiang, Dongmei
    Li, Haifeng
    2018 FIRST ASIAN CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII ASIA), 2018,
  • [36] Comparative Analysis of Windows for Speech Emotion Recognition Using CNN
    Teixeira, Felipe L.
    Soares, Salviano Pinto
    Abreu, J. L. Pio
    Oliveira, Paulo M.
    Teixeira, Joao P.
    OPTIMIZATION, LEARNING ALGORITHMS AND APPLICATIONS, PT I, OL2A 2023, 2024, 1981 : 233 - 248
  • [37] Speech Emotion Recognition using XGBoost and CNN BLSTM with Attention
    He, Jingru
    Ren, Liyong
    2021 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, INTERNET OF PEOPLE, AND SMART CITY INNOVATIONS (SMARTWORLD/SCALCOM/UIC/ATC/IOP/SCI 2021), 2021, : 154 - 159
  • [38] Contrastive Learning with Multi-level Embeddings for Speech-Based Emotion Recognition
    Si, Mei
    HCI INTERNATIONAL 2024-LATE BREAKING POSTERS, HCII 2024, PT I, 2025, 2319 : 312 - 321
  • [39] Emotion Recognition from Facial Expression Using Hybrid CNN-LSTM Network
    Mohana, M.
    Subashini, P.
    Krishnaveni, M.
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2023, 37 (08)
  • [40] Could speaker, gender or age awareness be beneficial in speech-based emotion recognition?
    Sidorov, Maxim
    Schmitt, Alexander
    Semenkin, Eugene
    Minker, Wolfgang
    PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2016, 2016, : 61 - 68