Hierarchical convolutional neural networks with post-attention for speech emotion recognition

Citations: 0
Authors
Fan, Yonghong [1 ,2 ,3 ]
Huang, Heming [1 ,2 ,3 ]
Han, Henry [4 ]
Affiliations
[1] Qinghai Normal Univ, Sch Comp Sci, Xining 810008, Peoples R China
[2] State Key Lab Tibetan Intelligent Informat Proc &, Xining 810008, Peoples R China
[3] Minist Educ, Key Lab Tibetan Informat Proc, Xining 810008, Peoples R China
[4] Baylor Univ, Sch Engn & Comp Sci, Dept Comp Sci, Lab Data Sci & Artificial Intelligence Innovat, Waco, TX 76789 USA
Funding
National Natural Science Foundation of China;
Keywords
Speech emotion recognition; hc-former; Spatiotemporal information; Long-term dependence; Class-discriminative features; CNN;
DOI
10.1016/j.neucom.2024.128879
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech emotion recognition (SER) is a key prerequisite for natural human-computer interaction. However, existing SER systems still face great challenges, particularly in the extraction of discriminative and high-quality emotional features. To address this challenge, this study proposes hc-former, a hierarchical convolutional neural network (CNN) with post-attention. Unlike traditional CNNs and recurrent neural networks (RNNs), our model adeptly extracts potent class-discriminative features that integrate spatiotemporal information and long-term dependence. The class-discriminative features extracted by hc-former, which emphasize both interclass separation and intraclass compactness, can more effectively represent emotion classes that are often confused with one another, leading to superior classification results. Our experimental results further indicate the exceptional performance of hc-former for SER on benchmark datasets, surpassing peer models while using fewer parameters.
Pages: 15
Related papers
50 in total
  • [21] Speech Emotion Recognition using Convolution Neural Networks and Deep Stride Convolutional Neural Networks
    Wani, Taiba Majid
    Gunawan, Teddy Surya
    Qadri, Syed Asif Ahmad
    Mansor, Hasmah
    Kartiwi, Mira
    Ismail, Nanang
    PROCEEDING OF 2020 6TH INTERNATIONAL CONFERENCE ON WIRELESS AND TELEMATICS (ICWT), 2020,
  • [22] Graph Neural Network-Based Speech Emotion Recognition: A Fusion of Skip Graph Convolutional Networks and Graph Attention Networks
    Wang, Han
    Kim, Deok-Hwan
    ELECTRONICS, 2024, 13 (21)
  • [23] DEEP CONVOLUTIONAL RECURRENT NEURAL NETWORK WITH ATTENTION MECHANISM FOR ROBUST SPEECH EMOTION RECOGNITION
    Huang, Che-Wei
    Narayanan, Shrikanth
    2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 583 - 588
  • [24] Learning Salient Features for Speech Emotion Recognition Using Convolutional Neural Networks
    Mao, Qirong
    Dong, Ming
    Huang, Zhengwei
    Zhan, Yongzhao
    IEEE TRANSACTIONS ON MULTIMEDIA, 2014, 16 (08) : 2203 - 2213
  • [25] COMPACT CONVOLUTIONAL RECURRENT NEURAL NETWORKS VIA BINARIZATION FOR SPEECH EMOTION RECOGNITION
    Zhao, Huan
    Xiao, Yufeng
    Han, Jing
    Zhang, Zixing
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6690 - 6694
  • [26] An Experimental Study of Speech Emotion Recognition Based on Deep Convolutional Neural Networks
    Zheng, W. Q.
    Yu, J. S.
    Zou, Y. X.
    2015 INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2015, : 827 - 831
  • [27] Convolutional Neural Networks for Speech Recognition
    Abdel-Hamid, Ossama
    Mohamed, Abdel-Rahman
    Jiang, Hui
    Deng, Li
    Penn, Gerald
    Yu, Dong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (10) : 1533 - 1545
  • [28] Combining Gated Convolutional Networks and Self-Attention Mechanism for Speech Emotion Recognition
    Li, Chao
    Jiao, Jinlong
    Zhao, Yiqin
    Zhao, Ziping
    2019 8TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS (ACIIW), 2019, : 105 - 109
  • [29] Exploring Deep Spectrum Representations via Attention-Based Recurrent and Convolutional Neural Networks for Speech Emotion Recognition
    Zhao, Ziping
    Bao, Zhongtian
    Zhao, Yiqin
    Zhang, Zixing
    Cummins, Nicholas
    Ren, Zhao
    Schuller, Bjorn
    IEEE ACCESS, 2019, 7 : 97515 - 97525
  • [30] AUTOMATIC SPEECH EMOTION RECOGNITION USING RECURRENT NEURAL NETWORKS WITH LOCAL ATTENTION
    Mirsamadi, Seyedmahdad
    Barsoum, Emad
    Zhang, Cha
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 2227 - 2231