Convolutional-Recurrent Neural Networks With Multiple Attention Mechanisms for Speech Emotion Recognition

Cited by: 10

Authors:

Jiang, Pengxu [1]

Xu, Xinzhou [2]

Tao, Huawei [3]

Zhao, Li [1]

Zou, Cairong [1]

Affiliations:

[1] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Peoples R China

[2] Nanjing Univ Posts & Telecommun, Sch Internet Things, Nanjing 210023, Peoples R China

[3] Henan Univ Technol, Coll Informat Sci & Technol, Zhengzhou 450001, Peoples R China

Source:

IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS | 2022 / Vol. 14 / No. 4

Keywords:

Convolutional neural networks (CNNs); long short-term memory (LSTM); multiple attention mechanisms; speech emotion recognition (SER); FEATURES; REPRESENTATIONS; MODEL;

DOI:

10.1109/TCDS.2021.3123979

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speech emotion recognition (SER) aims to endow machines with the ability to perceive latent affective components in speech. However, the structures of existing deep-learning-based SER models make it difficult to jointly consider time-frequency and sequential information in speech, which may lead to deficiencies in learning reasonable local emotional representations. To address this, this article proposes a convolutional-recurrent neural network with multiple attention mechanisms (CRNN-MA) for SER. The model comprises parallel convolutional neural network (CNN) and long short-term memory (LSTM) modules, which take extracted Mel spectrograms and frame-level features, respectively, as input so that time-frequency and sequential information are acquired simultaneously. Furthermore, the proposed CRNN-MA adopts three strategies: 1) a multiple self-attention layer in the CNN module based on frame-level weights; 2) a multidimensional attention layer whose output serves as the input features of the LSTM; and 3) a fusion layer that summarizes the features of the two modules. Experimental results on three conventional SER corpora demonstrate the effectiveness of the convolutional-recurrent and multiple-attention modules, compared with other related models and existing state-of-the-art approaches.
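The abstract describes the architecture only at block level. Two of its generic ingredients are attention weights computed over frames and fusion of the two branch outputs. These can be illustrated with a minimal pure-Python sketch built on several assumptions:

- Dot-product scoring against one query vector per attention head.
- Concatenation as the fusion step.
- A linear softmax classifier.

These are common textbook choices, not the layer definitions from the article, and every function name below is hypothetical. The CNN and LSTM feature extractors and the multidimensional attention layer are omitted. Toy inputs stand in for their outputs.

```python
# Hypothetical sketch of frame-level attention pooling and branch fusion;
# not the CRNN-MA authors' code. Scoring, fusion, and classifier are assumptions.
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]  # one row per frame: T rows of d features


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_pool(frames: Matrix, query: Vector) -> Vector:
    """One attention head: score each frame against a query vector,
    turn the scores into frame-level weights with softmax, and return
    the weighted sum of the frames."""
    weights = softmax([dot(query, h) for h in frames])
    dim = len(frames[0])
    return [sum(w * h[j] for w, h in zip(weights, frames)) for j in range(dim)]


def multi_head_pool(frames: Matrix, queries: Matrix) -> Vector:
    """Hypothetical 'multiple' attention: one pooled vector per query
    (head), concatenated into a single utterance-level vector."""
    pooled: Vector = []
    for q in queries:
        pooled.extend(attention_pool(frames, q))
    return pooled


def fuse(cnn_vec: Vector, lstm_vec: Vector) -> Vector:
    """Assumed fusion: plain concatenation of the two branch embeddings."""
    return cnn_vec + lstm_vec


def classify(features: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Linear layer followed by softmax over emotion classes."""
    logits = [dot(w, features) + b for w, b in zip(weights, bias)]
    return softmax(logits)


if __name__ == "__main__":
    # Toy inputs only; a real CNN/LSTM would produce these embeddings.
    cnn_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # T=3 frames, d=2
    queries = [[1.0, 0.0], [0.0, 1.0]]                 # two attention heads
    cnn_vec = multi_head_pool(cnn_frames, queries)     # 2 heads x 2 dims = 4 values
    lstm_vec = [0.5, -0.5]                             # stand-in for the LSTM output
    fused = fuse(cnn_vec, lstm_vec)                    # 4 + 2 = 6 values
    probs = classify(fused, [[0.1] * 6, [-0.1] * 6], [0.0, 0.0])
    print(len(fused), [round(p, 3) for p in probs])
```

Concatenating one pooled vector per head lets each head emphasize different frames. This is one common reading of "multiple" attention; the article may define its layer, its fusion step, or both differently.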


Pages: 1564-1573

Number of pages: 10
