Hierarchical convolutional neural networks with post-attention for speech emotion recognition

Cited by: 0
Authors
Fan, Yonghong [1, 2, 3]
Huang, Heming [1, 2, 3]
Han, Henry [4]
Affiliations
[1] Qinghai Normal Univ, Sch Comp Sci, Xining 810008, Peoples R China
[2] State Key Lab Tibetan Intelligent Informat Proc &, Xining 810008, Peoples R China
[3] Minist Educ, Key Lab Tibetan Informat Proc, Xining 810008, Peoples R China
[4] Baylor Univ, Sch Engn & Comp Sci, Dept Comp Sci, Lab Data Sci & Artificial Intelligence Innovat, Waco, TX 76789 USA
Funding
National Natural Science Foundation of China;
Keywords
Speech emotion recognition; hc-former; Spatiotemporal information; Long-term dependence; Class-discriminative features; CNN;
DOI
10.1016/j.neucom.2024.128879
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech emotion recognition (SER) is a key prerequisite for natural human-computer interaction. However, existing SER systems still face great challenges, particularly in extracting discriminative, high-quality emotional features. To address this challenge, this study proposes hc-former, a hierarchical convolutional neural network (CNN) with post-attention. Unlike traditional CNNs and recurrent neural networks (RNNs), the model extracts class-discriminative features that integrate spatiotemporal information and long-term dependence. These features emphasize both interclass separation and intraclass compactness, so they represent emotion classes that are often confused with one another more effectively, leading to better classification results. Experimental results on benchmark datasets indicate that hc-former outperforms peer models for SER while using fewer parameters.
Pages: 15