Improve Accuracy of Speech Emotion Recognition with Attention Head Fusion

Cited by: 36
Authors
Xu, Mingke [1]
Zhang, Fan [2]
Khan, Samee U. [3]
Affiliations
[1] Nanjing Tech Univ, Comp Sci & Technol, Nanjing, Jiangsu, Peoples R China
[2] IBM Massachusetts Lab, IBM Watson Grp, Littleton, MA USA
[3] North Dakota State Univ, Elect & Comp Eng, Fargo, ND USA
Funding
U.S. National Science Foundation
Keywords
speech emotion recognition; convolutional neural network; attention mechanism; pattern recognition; machine learning; classification; model
DOI
10.1109/ccwc47524.2020.9031207
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Speech Emotion Recognition (SER) refers to the use of machines to recognize a speaker's emotions from their speech. SER has broad application prospects in fields such as criminal investigation and medical care. However, the complexity of emotion makes it hard to recognize, and current SER models still cannot recognize human emotions accurately. In this paper, we propose an attention method based on multi-head self-attention to improve the recognition accuracy of SER. We call this method head fusion. With this method, an attention layer can generate attention maps with multiple attention points instead of the common attention maps with a single attention point. We implemented an attention-based convolutional neural network (ACNN) model with this method and evaluated it on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus. On the improvised data, the model obtained 76.18% weighted accuracy (WA) and 76.36% unweighted accuracy (UA), an improvement of about 6% over the previous state-of-the-art SER model.
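The abstract names head fusion and the WA/UA metrics but gives no formulas. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes head fusion averages the attention maps of several heads (simplified so that each head is just its own query vector, with no learned projections), and it uses the definitions of WA (overall accuracy) and UA (mean per-class recall) that are common in the SER literature. The function names `fuse_heads`, `weighted_accuracy`, and `unweighted_accuracy` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    # Numerically stable softmax: subtract the max score before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def head_attention(query: Sequence[float], keys: Sequence[Sequence[float]]) -> List[float]:
    # One head: scaled dot-product scores of the query against every key, then softmax.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)


def fuse_heads(queries: Sequence[Sequence[float]], keys: Sequence[Sequence[float]]) -> List[float]:
    # Hypothetical fusion step (assumption): average the attention maps of all heads,
    # so positions favored by different heads all keep high weight in one map.
    maps = [head_attention(q, keys) for q in queries]
    return [sum(m[i] for m in maps) / len(maps) for i in range(len(keys))]


def weighted_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    # WA: fraction of all utterances classified correctly (overall accuracy).
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    # UA: mean of per-class recalls, so each emotion class counts equally.
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


if __name__ == "__main__":
    # Two heads, each focusing on a different position of a 5-step sequence.
    keys = [[4.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 4.0]]
    fused = fuse_heads([[4.0, 0.0], [0.0, 4.0]], keys)
    print([round(w, 3) for w in fused])

    y_true = ["neu"] * 6 + ["sad"] * 2
    y_pred = ["neu"] * 6 + ["neu", "sad"]
    print(weighted_accuracy(y_true, y_pred))    # 0.875
    print(unweighted_accuracy(y_true, y_pred))  # 0.75
```

Each single head puts almost all of its weight on one position, while the averaged map keeps roughly half the weight on each of the two positions, giving one map with multiple attention points. In the toy labels, UA falls below WA because the minority class ("sad") is recognized less well, which is why SER papers usually report both figures.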
Pages: 1058-1064
Page count: 7
Related papers
50 in total
  • [21] Speech Emotion Recognition based on Multiple Feature Fusion
    Jiang, Changjiang
    Mao, Rong
    Liu, Geng
    Wang, Mingyi
    [J]. 2019 CHINESE AUTOMATION CONGRESS (CAC2019), 2019, : 907 - 912
  • [22] Multimodal emotion recognition for the fusion of speech and EEG signals
    Ma, Jianghe
    Sun, Ying
    Zhang, Xueying
    [J]. Xi'an Dianzi Keji Daxue Xuebao/Journal of Xidian University, 2019, 46 (01): : 143 - 150
  • [23] ANN based Decision Fusion for Speech Emotion Recognition
    Xu, Lu
    Xu, Mingxing
    Yang, Dali
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2003 - +
  • [24] Multimodal transformer augmented fusion for speech emotion recognition
    Wang, Yuanyuan
    Gu, Yu
    Yin, Yifei
    Han, Yingping
    Zhang, He
    Wang, Shuang
    Li, Chenyu
    Quan, Dou
    [J]. FRONTIERS IN NEUROROBOTICS, 2023, 17
  • [25] Multi-algorithm Fusion for Speech Emotion Recognition
    Verma, Gyanendra K.
    Tiwary, U. S.
    Agrawal, Shaishav
    [J]. ADVANCES IN COMPUTING AND COMMUNICATIONS, PT III, 2011, 192 : 452 - 459
  • [26] Speech Emotion Recognition using SVM with thresholding fusion
    Gupta, Shilpi
    Mehra, Anu
    Vinay
    [J]. 2ND INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN) 2015, 2015, : 570 - 574
  • [27] Feature fusion: research on emotion recognition in English speech
    Yongyan Yang
    [J]. International Journal of Speech Technology, 2024, 27 (2) : 319 - 327
  • [28] Design of Hierarchical Classifier to Improve Speech Emotion Recognition
    Vasuki, P.
    [J]. COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 2023, 44 (01): : 19 - 33
  • [29] A Study on the Combination of Emotion Keywords to Improve the Negative Emotion Recognition Accuracy
    Huang, Wen-Yi
    Pao, Tsang-Long
    [J]. 2012 6TH INTERNATIONAL CONFERENCE ON NEW TRENDS IN INFORMATION SCIENCE, SERVICE SCIENCE AND DATA MINING (ISSDM2012), 2012, : 499 - 503
  • [30] Speech Emotion Recognition Based on Speech Segment Using LSTM with Attention Model
    Atmaja, Bagus Tris
    Akagi, Masato
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON SIGNALS AND SYSTEMS (ICSIGSYS), 2019, : 40 - 44