Speech-based emotion recognition using a hybrid RNN-CNN network

Citations: 0
Authors
Ning, Jingtao [1 ]
Zhang, Wenchuan [1 ]
Affiliations
[1] Lanzhou Petrochem Univ Vocat Technol, Coll Informat Engn, Lanzhou 730060, Gansu, Peoples R China
Keywords
Speech emotion recognition; Deep learning; Recurrent neural network; Convolutional neural network; Wide kernel; Classification; DEEP;
DOI
10.1007/s11760-024-03574-7
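Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification codes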
0808 ; 0809 ;
Abstract
Speech emotion recognition (SER) is among the most active areas of speech signal analysis; it aims to estimate and classify the wide range of emotions a speaker expresses. This paper develops a deep learning (DL) model for detecting emotion in speech that addresses several weaknesses of existing data-driven approaches. The proposed architecture, referred to as RNN-CNN, performs the SER task by operating directly on raw speech signals. The main challenge was to integrate an initial convolution layer with a wide kernel, which serves to mitigate the problems caused by noise in raw speech signals. The model is evaluated on three databases: RML, RAVDESS, and SAVEE. On each dataset, the reported accuracy improves on that of previous works evaluated on the same data. The evaluation supports the robustness and applicability of the proposed model across diverse speech databases and indicates its potential for further work on speech-based emotion recognition.
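The abstract's point about a wide first-layer kernel can be illustrated with a small sketch. This is not the authors' model: it assumes a fixed averaging kernel in place of learned weights, pure Gaussian noise as input, and hypothetical helper names (`conv1d`, `mean_kernel`, `rms`). It only shows that a kernel spanning more samples averages out more uncorrelated noise, and that a stride shortens the output sequence.

```python
import math
import random


def conv1d(signal, kernel, stride=1):
    """Valid-mode 1-D cross-correlation, the operation a conv layer applies."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def mean_kernel(width):
    """Averaging kernel; an assumed stand-in for learned filter weights."""
    return [1.0 / width] * width


def rms(values):
    """Root-mean-square amplitude of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


random.seed(0)  # fixed seed so the comparison is repeatable
noise = [random.gauss(0.0, 1.0) for _ in range(4000)]  # unit-variance noise

narrow = conv1d(noise, mean_kernel(3))             # narrow kernel, 3 samples
wide = conv1d(noise, mean_kernel(64))              # wide kernel, 64 samples
wide_strided = conv1d(noise, mean_kernel(64), 8)   # same kernel, stride 8

print("residual noise, narrow kernel:", round(rms(narrow), 3))
print("residual noise, wide kernel:  ", round(rms(wide), 3))
print("output lengths:", len(narrow), len(wide), len(wide_strided))
```

For uncorrelated noise, averaging over a window of width w reduces the noise amplitude by roughly a factor of the square root of w, which is why the wide kernel leaves less residual noise here. A trained layer learns its own weights, so the actual filtering behaviour in the paper may differ.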
Pages: 10
Related papers
50 in total
  • [1] A Hybrid RNN-CNN Encoder for Neural Conversation Model
    Ma, Zhiyuan
    Rong, Wenge
    Wang, Yanmeng
    Shi, Libin
    Xiong, Zhang
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2018, PT II, 2018, 11062 : 159 - 170
  • [2] Emotion recognition of speech based on RNN
    Park, CH
    Lee, DW
    Sim, KB
    2002 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-4, PROCEEDINGS, 2002, : 2210 - 2213
  • [3] Multicriteria Neural Network Design in the Speech-based Emotion Recognition Problem
    Brester, Christina
    Semenkin, Eugene
    Sidorov, Maxim
    Semenkina, Olga
    ICIMCO 2015 PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON INFORMATICS IN CONTROL, AUTOMATION AND ROBOTICS, VOL. 1, 2015, : 621 - 628
  • [4] Effect of Reverberation in Speech-based Emotion Recognition
    Zhao, Shujie
    Yang, Yan
    Chen, Jingdong
    2018 IEEE INTERNATIONAL CONFERENCE ON THE SCIENCE OF ELECTRICAL ENGINEERING IN ISRAEL (ICSEE), 2018,
  • [5] An investigation of speech-based human emotion recognition
    Wang, YJ
    Guan, L
    2004 IEEE 6TH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 2004, : 15 - 18
  • [6] Towards Robust Speech-Based Emotion Recognition
    Tabatabaei, Talieh S.
    Krishnan, Sridhar
    2010 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2010), 2010,
  • [7] Hyperspectral Image Classification Using a Hybrid RNN-CNN with Enhanced Attention Mechanisms
    Gunduz, Ali
    Orman, Zeynep
    JOURNAL OF THE INDIAN SOCIETY OF REMOTE SENSING, 2025, 53 (02) : 613 - 629
  • [8] Speech Emotion Recognition Using CNN
    Huang, Zhengwei
    Dong, Ming
    Mao, Qirong
    Zhan, Yongzhao
    PROCEEDINGS OF THE 2014 ACM CONFERENCE ON MULTIMEDIA (MM'14), 2014, : 801 - 804
  • [9] Speech-based Emotion Recognition and Next Reaction Prediction
    Noroozi, Fatemeh
    Akrami, Neda
    Anbarjafari, Gholamreza
    2017 25TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2017,
  • [10] The Application of Capsule Neural Network Based CNN for Speech Emotion Recognition
    Wen, Xin-Cheng
    Liu, Kun-Hong
    Zhang, Wei-Ming
    Jiang, Kai
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 9356 - 9362