Hierarchical convolutional neural networks with post-attention for speech emotion recognition

Cited by: 0
Authors
Fan, Yonghong [1 ,2 ,3 ]
Huang, Heming [1 ,2 ,3 ]
Han, Henry [4 ]
Affiliations
[1] Qinghai Normal Univ, Sch Comp Sci, Xining 810008, Peoples R China
[2] State Key Lab Tibetan Intelligent Informat Proc &, Xining 810008, Peoples R China
[3] Minist Educ, Key Lab Tibetan Informat Proc, Xining 810008, Peoples R China
[4] Baylor Univ, Sch Engn & Comp Sci, Dept Comp Sci, Lab Data Sci & Artificial Intelligence Innovat, Waco, TX 76789 USA
Funding
National Natural Science Foundation of China;
Keywords
Speech emotion recognition; hc-former; Spatiotemporal information; Long-term dependence; Class-discriminative features; CNN;
DOI
10.1016/j.neucom.2024.128879
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Speech emotion recognition (SER) is a key prerequisite for natural human-computer interaction. However, existing SER systems still face great challenges, particularly in extracting discriminative, high-quality emotional features. To address this challenge, this study proposes hc-former, a hierarchical convolutional neural network (CNN) with post-attention. Unlike traditional CNNs and recurrent neural networks (RNNs), the model extracts powerful class-discriminative features that integrate spatiotemporal information and long-term dependence. The class-discriminative features extracted by hc-former, which emphasize both interclass separation and intraclass compactness, more effectively represent emotion classes that are often confused with one another, leading to superior classification results. Experimental results further show that hc-former achieves exceptional SER performance on benchmark datasets, surpassing peer models while using fewer parameters.
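The abstract's core idea, a hierarchy of convolutional blocks followed by an attention layer applied to the resulting frame-level features before classification, can be illustrated with a minimal sketch. The sketch below assumes PyTorch and log-Mel spectrogram input; the layer sizes, the four emotion classes, and the additive-attention pooling are illustrative assumptions and not the authors' hc-former implementation.

# Minimal illustrative sketch (not the authors' hc-former): stacked
# convolutional blocks over a log-Mel spectrogram, then an attention layer
# applied to the resulting frame-level features ("post-attention") that
# pools them into one utterance vector for emotion classification.
import torch
import torch.nn as nn

class HierarchicalCNNPostAttention(nn.Module):
    def __init__(self, n_mels=64, n_classes=4, channels=(16, 32, 64), attn_dim=128):
        super().__init__()
        blocks, in_ch = [], 1
        for out_ch in channels:                       # hierarchical conv blocks
            blocks += [
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(),
                nn.MaxPool2d(2),                      # halves frequency and time axes
            ]
            in_ch = out_ch
        self.cnn = nn.Sequential(*blocks)
        feat_dim = channels[-1] * (n_mels // 2 ** len(channels))
        self.proj = nn.Linear(feat_dim, attn_dim)
        self.attn_score = nn.Linear(attn_dim, 1)      # additive attention over time
        self.classifier = nn.Linear(attn_dim, n_classes)

    def forward(self, spec):                          # spec: (batch, 1, n_mels, time)
        h = self.cnn(spec)                            # (batch, C, n_mels', time')
        h = h.permute(0, 3, 1, 2).flatten(2)          # (batch, time', C * n_mels')
        h = torch.tanh(self.proj(h))                  # frame-level features
        w = torch.softmax(self.attn_score(h), dim=1)  # attention weights over frames
        utt = (w * h).sum(dim=1)                      # weighted pooling to utterance vector
        return self.classifier(utt)                   # emotion logits

# Example: a batch of 8 utterances, 64 Mel bands, 300 frames.
logits = HierarchicalCNNPostAttention()(torch.randn(8, 1, 64, 300))
print(logits.shape)  # torch.Size([8, 4])

In this sketch the attention weights are computed only after the convolutional hierarchy, so temporal pooling is learned on top of the CNN features rather than applied to the raw spectrogram; this placement is what the lead-in assumes "post-attention" to mean.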
Pages: 15
Related Papers
50 in total
  • [31] Design of a Convolutional Neural Network for Speech Emotion Recognition
    Lee, Kyong Hee
    Kim, Do Hyun
    11TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE: DATA, NETWORK, AND AI IN THE AGE OF UNTACT (ICTC 2020), 2020, : 1332 - 1335
  • [32] CONVOLUTIONAL NEURAL NETWORK TECHNIQUES FOR SPEECH EMOTION RECOGNITION
    Parthasarathy, Srinivas
    Tashev, Ivan
    2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018, : 121 - 125
  • [33] Emotion Recognition in Horses with Convolutional Neural Networks
    Corujo, Luis A.
    Kieson, Emily
    Schloesser, Timo
    Gloor, Peter A.
    FUTURE INTERNET, 2021, 13 (10):
  • [34] Attention Based Fully Convolutional Network for Speech Emotion Recognition
    Zhang, Yuanyuan
    Du, Jun
    Wang, Zirui
    Zhang, Jianshu
    Tu, Yanhui
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 1771 - 1775
  • [35] Emotion Recognition System from Speech and Visual Information based on Convolutional Neural Networks
    Ristea, Nicolae-Catalin
    Dutu, Liviu Cristian
    Radoi, Anamaria
    2019 10TH INTERNATIONAL CONFERENCE ON SPEECH TECHNOLOGY AND HUMAN-COMPUTER DIALOGUE (SPED), 2019,
  • [36] Speech Emotion Recognition based on Multi-Level Residual Convolutional Neural Networks
    Zheng, Kai
    Xia, ZhiGuang
    Zhang, Yi
    Xu, Xuan
    Fu, Yaqin
    ENGINEERING LETTERS, 2020, 28 (02) : 559 - 565
  • [37] Speech Emotion Recognition and Deep Learning: An Extensive Validation Using Convolutional Neural Networks
    Ri, Francesco Ardan Dal
    Ciardi, Fabio Cifariello
    Conci, Nicola
    IEEE ACCESS, 2023, 11 : 116638 - 116649
  • [38] HIERARCHICAL ATTENTION-BASED TEMPORAL CONVOLUTIONAL NETWORKS FOR EEG-BASED EMOTION RECOGNITION
    Li, Chao
    Chen, Boyang
    Zhao, Ziping
    Cummins, Nicholas
    Schuller, Bjorn W.
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 1240 - 1244
  • [39] Speech emotion recognition with embedded attention mechanism and hierarchical context
    Cheng Y.
    Chen Y.
    Chen Y.
    Yang Y.
Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology, 2019, 51 (11): 100 - 107
  • [40] Emotion recognition in speech using neural networks
    Nicholson, J
    Takahashi, K
    Nakatsu, R
    AFFECTIVE MINDS, 2000, : 215 - 220