Siamese Attention-Based LSTM for Speech Emotion Recognition

被引:0
|
作者
Nizamidin, Tashpolat [1 ]
Zhao, Li [1 ]
Liang, Ruiyu [2 ]
Xie, Yue [1 ]
Hamdulla, Askar [3 ]
机构
[1] Southeast Univ, Minist Educ, Key Lab Underwater Acoust Signal Proc, Nanjing 210096, Jiangsu, Peoples R China
[2] Nanjing Inst Technol, Sch Commun Engn, Nanjing 211167, Jiangsu, Peoples R China
[3] Xinjiang Univ, Sch Informat Sci & Engn, Urumqi 830046, Peoples R China
关键词
Siamese networks; pairwise training; attention-based long short-term memory; speech emotion recognition;
D O I
10.1587/transfun.2019EAL2156
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
As one of the popular topics in the field of human-computer interaction, the Speech Emotion Recognition (SER) aims to classify the emotional tendency from the speakers' utterances. Using the existing deep learning methods, and with a large amount of training data, we can achieve a highly accurate performance result. Unfortunately, it's time consuming and difficult job to build such a huge emotional speech database that can be applicable universally. However, the Siamese Neural Network (SNN), which we discuss in this paper, can yield extremely precise results with just a limited amount of training data through pairwise training which mitigates the impacts of sample deficiency and provides enough iterations. To obtain enough SER training, this study proposes a novel method which uses Siamese Attention-based Long Short-Term Memory Networks. In this framework, we designed two Attention-based Long Short-Term Memory Networks which shares the same weights, and we input frame level acoustic emotional features to the Siamese network rather than utterance level emotional features. The proposed solution has been evaluated on EMODB, ABC and UYGSEDB corpora, and showed significant improvement on SER results, compared to conventional deep learning methods.
引用
收藏
页码:937 / 941
页数:5
相关论文
共 50 条
  • [1] Attention-Based Dense LSTM for Speech Emotion Recognition
    Xie, Yue
    Liang, Ruiyu
    Liang, Zhenlin
    Zhao, Li
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2019, E102D (07): : 1426 - 1429
  • [2] Speech Emotion Classification Using Attention-Based LSTM
    Xie, Yue
    Liang, Ruiyu
    Liang, Zhenlin
    Huang, Chengwei
    Zou, Cairong
    Schuller, Bjoern
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (11) : 1675 - 1685
  • [3] Speech Emotion Recognition Based on Speech Segment Using LSTM with Attention Model
    Atmaja, Bagus Tris
    Akagi, Masato
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON SIGNALS AND SYSTEMS (ICSIGSYS), 2019, : 40 - 44
  • [4] Attention-based Spatio-Temporal Graphic LSTM for EEG Emotion Recognition
    Li, Xiaoxu
    Zheng, Wenming
    Zong, Yuan
    Chang, Hongli
    Lu, Cheng
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [5] Attention-based LSTM with Multi-task Learning for Distant Speech Recognition
    Zhang, Yu
    Zhang, Pengyuan
    Yan, Yonghong
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3857 - 3861
  • [6] Attention-Based Models for Speech Recognition
    Chorowski, Jan
    Bahdanau, Dzmitry
    Serdyuk, Dmitriy
    Cho, Kyunghyun
    Bengio, Yoshua
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [7] Upgraded Attention-Based Local Feature Learning Block for Speech Emotion Recognition
    Zhao, Huan
    Gao, Yingxue
    Xiao, Yufeng
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2021, PT II, 2021, 12713 : 118 - 130
  • [8] A novel dual attention-based BLSTM with hybrid features in speech emotion recognition
    Chen, Qiupu
    Huang, Guimin
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2021, 102
  • [9] A novel dual attention-based BLSTM with hybrid features in speech emotion recognition
    Chen, Qiupu
    Huang, Guimin
    [J]. Engineering Applications of Artificial Intelligence, 2021, 102
  • [10] Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition
    Zhao, Ziping
    Zheng, Yu
    Zhang, Zixing
    Wang, Haishuai
    Zhao, Yiqin
    Li, Chao
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 272 - 276