Removing Bias with Residual Mixture of Multi-View Attention for Speech Emotion Recognition

Cited: 5
Authors
Jalal, Md Asif [1 ]
Milner, Rosanna [1 ]
Hain, Thomas [1 ]
Moore, Roger K. [1 ]
Affiliations
[1] Univ Sheffield, Speech & Hearing Grp SPandH, Sheffield, S Yorkshire, England
Source
INTERSPEECH 2020
Keywords
speech emotion recognition; attention networks; computational paralinguistics
DOI
10.21437/Interspeech.2020-3005
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject Classification
100104; 100213
Abstract
Speech emotion recognition is essential for obtaining emotional intelligence, which affects the understanding of the context and meaning of speech. The fundamental challenges of speech emotion recognition from a machine learning standpoint are to extract patterns that carry maximum correlation with the emotion information encoded in the signal and to remain as insensitive as possible to other types of information carried by speech. In this paper, a novel recurrent residual temporal context modelling framework is proposed. The framework includes a mixture of multi-view attention smoothing and high-dimensional feature projection for context expansion and learning feature representations. The framework is designed to be robust to changes in speaker and other distortions, and it provides state-of-the-art results for speech emotion recognition. Performance of the proposed approach is compared with a wide range of current architectures on a standard 4-class classification task on the widely used IEMOCAP corpus. A significant improvement of 4% unweighted accuracy over state-of-the-art systems is observed. Additionally, the attention vectors are aligned with the input segments and plotted at two different attention levels to demonstrate their effectiveness.
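
The abstract describes the model only at a high level (recurrent encoding, a mixture of multi-view attention, and a residual path). As an illustration of the general idea rather than the paper's exact architecture, the minimal PyTorch sketch below pools frame-level features with several attention views, mixes the views with learnable weights, and adds a residual mean-pooled path; all class names, feature dimensions, and the number of views are assumptions.

# Illustrative sketch only: a residual mixture-of-multi-view attention pooling
# layer in the spirit of the abstract. Layer names, dimensions and the number
# of views are assumptions, not taken from the paper.
import torch
import torch.nn as nn

class MultiViewAttentionPool(nn.Module):
    def __init__(self, feat_dim: int = 128, n_views: int = 4):
        super().__init__()
        # One scoring head per view; each view learns its own attention weights.
        self.score_heads = nn.ModuleList(
            [nn.Linear(feat_dim, 1) for _ in range(n_views)]
        )
        # Learnable mixture weights over the views.
        self.view_mix = nn.Parameter(torch.zeros(n_views))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feat_dim) frame-level features, e.g. from a recurrent encoder.
        pooled_views = []
        for head in self.score_heads:
            scores = head(x)                        # (batch, time, 1)
            weights = torch.softmax(scores, dim=1)  # attention over time
            pooled_views.append((weights * x).sum(dim=1))  # (batch, feat_dim)
        views = torch.stack(pooled_views, dim=1)    # (batch, n_views, feat_dim)
        mix = torch.softmax(self.view_mix, dim=0)   # convex mixture of the views
        mixed = (mix.unsqueeze(0).unsqueeze(-1) * views).sum(dim=1)
        # Residual path: simple mean pooling over time.
        return mixed + x.mean(dim=1)

# Usage: pool an utterance of 300 frames into a single embedding that a
# downstream 4-class emotion classifier could consume.
frames = torch.randn(8, 300, 128)
pool = MultiViewAttentionPool(feat_dim=128, n_views=4)
utterance_embedding = pool(frames)  # shape: (8, 128)
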
Pages: 4084-4088
Page count: 5
Related papers (50 in total)
  • [31] Multi-view and multi-scale behavior recognition algorithm based on attention mechanism. Zhang, Di; Chen, Chen; Tan, Fa; Qian, Beibei; Li, Wei; He, Xuan; Lei, Susan. FRONTIERS IN NEUROROBOTICS, 2023, 17.
  • [32] Joint spatial and scale attention network for multi-view facial expression recognition. Liu, Yuanyuan; Peng, Jiyao; Dai, Wei; Zeng, Jiabei; Shan, Shiguang. PATTERN RECOGNITION, 2023, 139.
  • [33] Multi-view dual attention network for 3D object recognition. Wang, Wenju; Cai, Yu; Wang, Tao. NEURAL COMPUTING & APPLICATIONS, 2022, 34 (04): 3201-3212.
  • [35] Efficient speech emotion recognition using multi-scale CNN and attention. Peng, Zixuan; Lu, Yu; Pan, Shengfeng; Liu, Yunfeng. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 3020-3024.
  • [36] The Impact of Attention Mechanisms on Speech Emotion Recognition. Chen, Shouyan; Zhang, Mingyan; Yang, Xiaofen; Zhao, Zhijia; Zou, Tao; Sun, Xinqi. SENSORS, 2021, 21 (22).
  • [37] Self-attention for Speech Emotion Recognition. Tarantino, Lorenzo; Garner, Philip N.; Lazaridis, Alexandros. INTERSPEECH 2019, 2019: 2578-2582.
  • [38] DAR-MVSNet: a novel dual attention residual network for multi-view stereo. Li, Tingshuai; Liang, Hu; Wen, Changchun; Qu, Jiacheng; Zhao, Shengrong; Zhang, Qingmeng. SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (8-9): 5857-5866.
  • [39] Multi-View Mammographic Density Classification by Dilated and Attention-Guided Residual Learning. Li, Cheng; Xu, Jingxu; Liu, Qiegen; Zhou, Yongjin; Mou, Lisha; Pu, Zuhui; Xia, Yong; Zheng, Hairong; Wang, Shanshan. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2021, 18 (03): 1003-1013.
  • [40] MVANet: Multi-Task Guided Multi-View Attention Network for Chinese Food Recognition. Liang, Haozan; Wen, Guihua; Hu, Yang; Luo, Mingnan; Yang, Pei; Xu, Yingxue. IEEE Transactions on Multimedia, 2021, 23: 3551-3561.