Siamese Attention-Based LSTM for Speech Emotion Recognition

Times cited: 0

Authors:

Nizamidin, Tashpolat [1]

Zhao, Li [1]

Liang, Ruiyu [2]

Xie, Yue [1]

Hamdulla, Askar [3]

Affiliations:

[1] Southeast Univ, Minist Educ, Key Lab Underwater Acoust Signal Proc, Nanjing 210096, Jiangsu, Peoples R China

[2] Nanjing Inst Technol, Sch Commun Engn, Nanjing 211167, Jiangsu, Peoples R China

[3] Xinjiang Univ, Sch Informat Sci & Engn, Urumqi 830046, Peoples R China

Source:

IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES | 2020 / Vol. E103A / No. 07

Keywords:

Siamese networks; pairwise training; attention-based long short-term memory; speech emotion recognition

DOI:

10.1587/transfun.2019EAL2156

Chinese Library Classification (CLC) number:

TP3 [Computing technology; computer technology]

Subject classification code:

0812

Abstract:

Speech Emotion Recognition (SER), a popular topic in human-computer interaction, aims to classify the emotional state expressed in speakers' utterances. Existing deep learning methods can achieve high accuracy when a large amount of training data is available. Unfortunately, building an emotional speech database that is large enough and applicable universally is a difficult and time-consuming job. The Siamese Neural Network (SNN) discussed in this paper, however, can yield highly precise results from only a limited amount of training data. It does this through pairwise training, which mitigates the impact of sample scarcity and provides enough training iterations. To obtain sufficient training for SER, this study proposes a novel method based on Siamese attention-based Long Short-Term Memory (LSTM) networks. In this framework, we design two attention-based LSTM networks that share the same weights. We feed frame-level acoustic emotional features into the Siamese network instead of utterance-level emotional features. The proposed method was evaluated on the EMODB, ABC, and UYGSEDB corpora. It showed significant improvements in SER performance compared with conventional deep learning methods.
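The abstract describes three ideas: pairwise training, two attention-based LSTM branches that share weights, and frame-level input features. The minimal sketch below illustrates these ideas with NumPy. It is not the authors' implementation. The abstract does not specify the hidden size, the attention form, the feature dimension, or the training loss, so every one of these is a hypothetical choice here: an 8-unit LSTM, additive attention pooling over frames, random 13-dimensional stand-ins for frame features, and a margin-based contrastive loss. Weights are random and untrained, and no optimizer is shown. The helper names `make_pairs`, `AttentionLSTMEncoder`, and `contrastive_loss` are illustrative only.

```python
import itertools

import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def make_pairs(labels):
    """All unordered utterance pairs; target is 1 if both share an emotion label, else 0.
    N utterances yield N*(N-1)/2 pairs, which is how pairwise training multiplies
    the number of training examples drawn from a small corpus."""
    return [(i, j, int(labels[i] == labels[j]))
            for i, j in itertools.combinations(range(len(labels)), 2)]


class AttentionLSTMEncoder:
    """One LSTM layer over frame-level features, then attention pooling over frames.
    Hypothetical architecture with random, untrained weights."""

    def __init__(self, n_in, n_hid, rng):
        s = 1.0 / np.sqrt(n_hid)
        self.n_hid = n_hid
        # Input, forget, cell and output gate weights stacked into one matrix.
        self.W = rng.uniform(-s, s, (4 * n_hid, n_in + n_hid))
        self.b = np.zeros(4 * n_hid)
        # Assumed additive attention: score_t = v . tanh(Wa h_t + ba)
        self.Wa = rng.uniform(-s, s, (n_hid, n_hid))
        self.ba = np.zeros(n_hid)
        self.v = rng.uniform(-s, s, n_hid)

    def __call__(self, frames):
        """frames: (T, n_in) array; returns (embedding of shape (n_hid,), weights of shape (T,))."""
        H = self.n_hid
        h, c = np.zeros(H), np.zeros(H)
        states = []
        for x in frames:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[3 * H:])
            g = np.tanh(z[2 * H:3 * H])
            c = f * c + i * g
            h = o * np.tanh(c)
            states.append(h)
        hs = np.stack(states)                                  # (T, H)
        scores = np.tanh(hs @ self.Wa.T + self.ba) @ self.v    # (T,)
        alpha = np.exp(scores - scores.max())
        alpha /= alpha.sum()                                   # softmax over frames
        return alpha @ hs, alpha                               # attention-weighted sum of states


def contrastive_loss(e1, e2, same, margin=1.0):
    """Assumed loss: pull same-emotion pairs together, push others at least `margin` apart."""
    d = np.linalg.norm(e1 - e2)
    return same * d ** 2 + (1 - same) * max(0.0, margin - d) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = ["angry", "happy", "angry", "sad"]
    # Random stand-ins for variable-length frame-level acoustic features.
    utts = [rng.standard_normal((int(rng.integers(20, 40)), 13)) for _ in labels]
    encoder = AttentionLSTMEncoder(n_in=13, n_hid=8, rng=rng)
    for i, j, same in make_pairs(labels):
        e1, _ = encoder(utts[i])  # both branches call the same object,
        e2, _ = encoder(utts[j])  # so the two "towers" share every weight
        print(i, j, same, round(float(contrastive_loss(e1, e2, same)), 4))
```

Weight sharing is expressed here by passing both utterances through one encoder object, which is the simplest way to guarantee identical weights in the two branches. With four labelled utterances the script scores six pairs, so a corpus of N utterances supplies N(N-1)/2 training pairs. Because the weights are untrained, the printed loss values carry no meaning. The abstract also does not say how the trained embeddings are mapped to emotion labels, so that step is left out.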

Pages: 937-941

Page count: 5
