Removing Bias with Residual Mixture of Multi-View Attention for Speech Emotion Recognition

Cited by: 5
Authors
Jalal, Md Asif [1]
Milner, Rosanna [1]
Hain, Thomas [1]
Moore, Roger K. [1]
Affiliation
[1] Univ Sheffield, Speech & Hearing Grp SPandH, Sheffield, S Yorkshire, England
Source
Keywords
speech emotion recognition; attention networks; computational paralinguistics
DOI
10.21437/Interspeech.2020-3005
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification code
100104; 100213
Abstract
Speech emotion recognition is essential for obtaining emotional intelligence, which affects the understanding of the context and meaning of speech. The fundamental challenge of speech emotion recognition from a machine learning standpoint is to extract patterns that carry maximum correlation with the emotion information encoded in the speech signal while being as insensitive as possible to other types of information carried by speech. In this paper, a novel recurrent residual temporal context modelling framework is proposed. The framework includes a mixture of multi-view attention smoothing and high-dimensional feature projection for context expansion and learning feature representations. The framework is designed to be robust to changes in speaker and other distortions, and it provides state-of-the-art results for speech emotion recognition. The performance of the proposed approach is compared with a wide range of current architectures in a standard 4-class classification task on the widely used IEMOCAP corpus. A significant improvement of 4% in unweighted accuracy over state-of-the-art systems is observed. Additionally, the attention vectors have been aligned with the input segments and plotted at two different attention levels to demonstrate the effectiveness of the approach.
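The abstract reports its result as unweighted accuracy on a 4-class task. In speech emotion recognition this metric is commonly taken to be the mean of the per-class recalls, so that rare emotions count as much as frequent ones. The following is a minimal sketch of that definition, not the authors' evaluation code; the function names, the class labels, and the toy predictions are all hypothetical.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls: each class contributes equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def weighted_accuracy(y_true, y_pred):
    """Plain accuracy: each utterance contributes equally."""
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Hypothetical, deliberately imbalanced toy data (assumed label set).
y_true = ["neutral"] * 6 + ["happy"] * 2 + ["sad"] + ["angry"]
y_pred = ["neutral"] * 6 + ["happy", "neutral"] + ["neutral"] + ["angry"]

ua = unweighted_accuracy(y_true, y_pred)
wa = weighted_accuracy(y_true, y_pred)
print(f"UA = {ua:.3f}, WA = {wa:.3f}")
```

On imbalanced data the two metrics diverge: a model that favours the majority class keeps a high weighted accuracy while its unweighted accuracy drops, which is why the unweighted figure is the more demanding one to report.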
Pages: 4084-4088
Page count: 5