EMOHRNET: HIGH-RESOLUTION NEURAL NETWORK BASED SPEECH EMOTION RECOGNITION

Cited by: 0
Authors
Muppidi, Akshay [1]
Radfar, Martin [1]
Affiliations
[1] SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY 11794 USA
Keywords
Speech emotion recognition; High Resolution Network; Frequency Masking; Time Masking;
DOI
10.1109/ICASSP48485.2024.10446976
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Speech emotion recognition (SER) is pivotal for enhancing human-machine interactions. This paper introduces "EmoHRNet", a novel adaptation of High-Resolution Networks (HRNet) tailored for SER. The HRNet structure is designed to maintain high-resolution representations from the initial to the final layers. By transforming audio samples into spectrograms, EmoHRNet leverages the HRNet architecture to extract high-level features. EmoHRNet's unique architecture maintains high-resolution representations throughout, capturing both granular and overarching emotional cues from speech signals. The model outperforms leading models, achieving accuracies of 92.45% on RAVDESS, 80.06% on IEMOCAP, and 92.77% on EMOVO. Thus, we show that EmoHRNet sets a new benchmark in the SER domain.
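The abstract describes converting audio into spectrograms, and the keywords list frequency masking and time masking. The following is a minimal sketch of that preprocessing idea, not the authors' implementation: the function names, frame sizes, and mask widths are all hypothetical choices made for illustration, and this record does not state the parameters used in the paper.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram (freq_bins x frames) from a Hann-windowed STFT.

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # rfft along each frame, then transpose so rows are frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T


def mask_spectrogram(spec, max_freq_width=8, max_time_width=10, rng=None):
    """Zero out one random band of frequency bins and one random span of frames.

    The mask widths are assumptions; the input array is left unmodified.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()
    n_freq, n_time = out.shape

    # Frequency masking: blank f consecutive frequency bins.
    f = int(rng.integers(0, max_freq_width + 1))
    f0 = int(rng.integers(0, n_freq - f + 1))
    out[f0 : f0 + f, :] = 0.0

    # Time masking: blank t consecutive frames.
    t = int(rng.integers(0, max_time_width + 1))
    t0 = int(rng.integers(0, n_time - t + 1))
    out[:, t0 : t0 + t] = 0.0
    return out


if __name__ == "__main__":
    # One second of a 440 Hz tone at a 16 kHz sample rate as stand-in audio.
    sr = 16000
    tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
    spec = spectrogram(tone)
    augmented = mask_spectrogram(spec, rng=np.random.default_rng(0))
    print(spec.shape, augmented.shape)
```

Masking of this kind is typically applied only during training, so that the network cannot rely on any single frequency band or time span of the spectrogram.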
Pages: 10881-10885
Number of pages: 5