Hierarchical convolutional neural networks with post-attention for speech emotion recognition

被引:0
|
作者
Fan, Yonghong [1 ,2 ,3 ]
Huang, Heming [1 ,2 ,3 ]
Han, Henry [4 ]
机构
[1] Qinghai Normal Univ, Sch Comp Sci, Xining 810008, Peoples R China
[2] State Key Lab Tibetan Intelligent Informat Proc &, Xining 810008, Peoples R China
[3] Minist Educ, Key Lab Tibetan Informat Proc, Xining 810008, Peoples R China
[4] Baylor Univ, Sch Engn & Comp Sci, Dept Comp Sci, Lab Data Sci & Artificial Intelligence Innovat, Waco, TX 76789 USA
基金
中国国家自然科学基金;
关键词
Speech emotion recognition hc-former; Spatiotemporal information; Long-term dependence; Class-discriminative features; CNN;
D O I
10.1016/j.neucom.2024.128879
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Speech emotion recognition (SER) is a key prerequisite for natural human-computer interaction. However, existing SER systems still face great challenges, particularly in the extraction of discriminative and high-quality emotional features. To address this challenge, this study proposes hc-former, a hierarchical convolutional neural network (CNN) with post-attention. Unlike traditional CNNs and recurrent neural networks (RNNs), our model adeptly extracts potent class-discriminative features that integrate spatiotemporal information and longterm dependence. The class-discriminative features extracted by hc-former, which emphasize both interclass separation and intraclass compactness, can more effectively represent different class emotions often confused with one another, leading to superior classification results. Our experimental results further indicate the exceptional performance of hc-former for SER on benchmark datasets, surpassing other peer models in terms of performance while utilizing fewer parameters.
引用
收藏
页数:15
相关论文
共 50 条
  • [1] Speech Emotion Recognition Using Convolutional Neural Networks with Attention Mechanism
    Mountzouris, Konstantinos
    Perikos, Isidoros
    Hatzilygeroudis, Ioannis
    Corchado, Juan M.
    Iglesias, Carlos A.
    Kim, Byung-Gyu
    Mehmood, Rashid
    Ren, Fuji
    Lee, In
    ELECTRONICS, 2023, 12 (20)
  • [2] Multiple attention convolutional-recurrent neural networks for speech emotion recognition
    Zhang, Zhihao
    Wang, Kunxia
    2022 10TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS, ACIIW, 2022,
  • [3] Continuous Speech Emotion Recognition with Convolutional Neural Networks
    Vryzas, Nikolaos
    Vrysis, Lazaros
    Matsiola, Maria
    Kotsakis, Rigas
    Dimoulas, Charalampos
    Kalliris, George
    JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 2020, 68 (1-2): : 14 - 24
  • [4] Continuous speech emotion recognition with convolutional neural networks
    Vryzas, Nikolaos
    Vrysis, Lazaros
    Matsiola, Maria
    Kotsakis, Rigas
    Dimoulas, Charalampos
    Kalliris, George
    AES: Journal of the Audio Engineering Society, 2020, 68 (1-2): : 14 - 24
  • [5] Convolutional-Recurrent Neural Networks With Multiple Attention Mechanisms for Speech Emotion Recognition
    Jiang, Pengxu
    Xu, Xinzhou
    Tao, Huawei
    Zhao, Li
    Zou, Cairong
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2022, 14 (04) : 1564 - 1573
  • [6] Speech Emotion Recognition Using Convolutional-Recurrent Neural Networks with Attention Model
    Mu, Yawei
    Gomez, Hernandez
    Cano Montes, Antonio
    Alcaraz Martinez, Carlos
    Wang, Xuetian
    Gao, Hongmin
    2ND INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING, INFORMATION SCIENCE AND INTERNET TECHNOLOGY, CII 2017, 2017, : 341 - 350
  • [7] Speech emotion recognition with deep convolutional neural networks
    Issa, Dias
    Demirci, M. Fatih
    Yazici, Adnan
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2020, 59
  • [8] 3-D Convolutional Recurrent Neural Networks With Attention Model for Speech Emotion Recognition
    Chen, Mingyi
    He, Xuanji
    Yang, Jing
    Zhang, Han
    IEEE SIGNAL PROCESSING LETTERS, 2018, 25 (10) : 1440 - 1444
  • [9] IMPROVING CONVOLUTIONAL RECURRENT NEURAL NETWORKS FOR SPEECH EMOTION RECOGNITION
    Meyer, Patrick
    Xu, Ziyi
    Fingscheidt, Tim
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 365 - 372
  • [10] Gender Differentiated Convolutional Neural Networks for Speech Emotion Recognition
    Mishra, Puneet
    Sharma, Ruchir
    2020 12TH INTERNATIONAL CONGRESS ON ULTRA MODERN TELECOMMUNICATIONS AND CONTROL SYSTEMS AND WORKSHOPS (ICUMT 2020), 2020, : 142 - 148