Hierarchical convolutional neural networks with post-attention for speech emotion recognition

Cited by: 0

Authors:

Fan, Yonghong [1,2,3]

Huang, Heming [1,2,3]

Han, Henry [4]

Affiliations:

[1] Qinghai Normal Univ, Sch Comp Sci, Xining 810008, Peoples R China

[2] State Key Lab Tibetan Intelligent Informat Proc &, Xining 810008, Peoples R China

[3] Minist Educ, Key Lab Tibetan Informat Proc, Xining 810008, Peoples R China

[4] Baylor Univ, Sch Engn & Comp Sci, Dept Comp Sci, Lab Data Sci & Artificial Intelligence Innovat, Waco, TX 76789 USA

Source:

NEUROCOMPUTING | 2025 / Vol. 615

Funding:

National Natural Science Foundation of China;

Keywords:

Speech emotion recognition; hc-former; Spatiotemporal information; Long-term dependence; Class-discriminative features; CNN;

DOI:

10.1016/j.neucom.2024.128879

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speech emotion recognition (SER) is a key prerequisite for natural human-computer interaction. However, existing SER systems still face great challenges, particularly in the extraction of discriminative and high-quality emotional features. To address this challenge, this study proposes hc-former, a hierarchical convolutional neural network (CNN) with post-attention. Unlike traditional CNNs and recurrent neural networks (RNNs), our model extracts class-discriminative features that integrate spatiotemporal information and long-term dependence. The class-discriminative features extracted by hc-former, which emphasize both interclass separation and intraclass compactness, can more effectively represent emotion classes that are often confused with one another, leading to superior classification results. Our experimental results further indicate the exceptional performance of hc-former for SER on benchmark datasets, surpassing peer models while using fewer parameters.

Pages: 15
