Removing Bias with Residual Mixture of Multi-View Attention for Speech Emotion Recognition

Cited by: 5
Authors
Jalal, Md Asif [1 ]
Milner, Rosanna [1 ]
Hain, Thomas [1 ]
Moore, Roger K. [1 ]
Affiliations
[1] Univ Sheffield, Speech & Hearing Grp SPandH, Sheffield, S Yorkshire, England
Keywords
speech emotion recognition; attention networks; computational paralinguistics
DOI
10.21437/Interspeech.2020-3005
CLC classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline codes
100104; 100213
Abstract
Speech emotion recognition is essential for obtaining emotional intelligence, which affects the understanding of the context and meaning of speech. From a machine learning standpoint, the fundamental challenge of speech emotion recognition is to extract patterns that carry maximum correlation with the emotion information encoded in the signal while remaining as insensitive as possible to the other types of information that speech carries. In this paper, a novel recurrent residual temporal context modelling framework is proposed. The framework combines a mixture of multi-view attention smoothing with high-dimensional feature projection for context expansion and for learning feature representations. It is designed to be robust to changes in speaker and to other distortions, and it provides state-of-the-art results for speech emotion recognition. Performance of the proposed approach is compared with a wide range of current architectures on a standard 4-class classification task on the widely used IEMOCAP corpus, where a significant improvement of 4% unweighted accuracy over state-of-the-art systems is observed. Additionally, the attention vectors are aligned with the input segments and plotted at two different attention levels to demonstrate their effectiveness.
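The pooling idea the abstract describes, several attention "views" scoring the same frame sequence, their context vectors mixed together, plus a residual path, can be sketched roughly as below. This is a minimal NumPy illustration under stated assumptions: the per-view scoring vectors, the softmax gating over views, and the mean-pooled residual term are illustrative choices, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_view_attention_pool(H, W_views, mix_logits):
    """Pool a (T, d) frame sequence into one d-dim utterance vector.

    Each of the K views scores every frame with its own weight vector,
    giving K attention distributions over time; the resulting K context
    vectors are mixed by softmax(mix_logits), and a mean-pooled residual
    term is added (an assumed stand-in for the paper's residual path).
    """
    contexts = []
    for w in W_views:               # w: (d,) scoring vector for one view
        scores = H @ w              # (T,) per-frame relevance scores
        alpha = softmax(scores)     # attention weights over frames
        contexts.append(alpha @ H)  # (d,) context vector for this view
    C = np.stack(contexts)          # (K, d) one context vector per view
    mix = softmax(mix_logits)       # (K,) mixture weights over the views
    pooled = mix @ C                # mixture of multi-view attention
    return pooled + H.mean(axis=0)  # residual connection

rng = np.random.default_rng(0)
T, d, K = 50, 8, 3                  # frames, feature dim, attention views
H = rng.normal(size=(T, d))         # hypothetical recurrent-layer outputs
W = rng.normal(size=(K, d))
utt_vec = multi_view_attention_pool(H, W, rng.normal(size=K))
print(utt_vec.shape)                # one fixed-size vector per utterance
```

In a trained system the scoring vectors and mixture logits would be learned jointly with the classifier; here they are random, which is enough to show the data flow from variable-length frame features to a single utterance-level representation.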
Pages: 4084-4088
Page count: 5
Related papers
50 records in total
  • [1] Multimodal speech emotion recognition based on multi-scale MFCCs and multi-view attention mechanism
    Feng, Lin
    Liu, Lu-Yao
    Liu, Sheng-Lan
    Zhou, Jian
    Yang, Han-Qing
    Yang, Jie
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (19) : 28917 - 28935
  • [2] Multi-View Speech Emotion Recognition Via Collective Relation Construction
    Hou, Mixiao
    Zhang, Zheng
    Cao, Qi
    Zhang, David
    Lu, Guangming
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 218 - 229
  • [3] Discriminative feature learning based on multi-view attention network with diffusion joint loss for speech emotion recognition
    Liu, Yang
    Chen, Xin
    Song, Yuan
    Li, Yarong
    Wang, Shengbei
    Yuan, Weitao
    Li, Yongwei
    Zhao, Zhen
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 137
  • [4] Multimodal and Multi-view Models for Emotion Recognition
    Aguilar, Gustavo
    Rozgic, Viktor
    Wang, Weiran
    Wang, Chao
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 991 - 1002
  • [5] Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition
    Alastruey, Belen
    Drude, Lukas
    Heymann, Jahn
    Wiesler, Simon
    INTERSPEECH 2023, 2023, : 4973 - 4977
  • [6] EMOTION RECOGNITION BASED ON MULTI-VIEW BODY GESTURES
    Shen, Zhijuan
    Cheng, Jun
    Hu, Xiping
    Dong, Qian
    2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 3317 - 3321
  • [7] Multi-View Attention Transfer for Efficient Speech Enhancement
    Shin, Wooseok
    Park, Hyun Joon
    Kim, Jin Sob
    Lee, Byung Hoon
    Han, Sung Won
    INTERSPEECH 2022, 2022, : 1198 - 1202
  • [8] Multi-View Hierarchical Attention Graph Convolutional Network with Domain Adaptation for EEG Emotion Recognition
    Li, Chao
    Wang, Feng
    Bian, Ning
    PROCEEDINGS OF 2024 3RD INTERNATIONAL CONFERENCE ON CRYPTOGRAPHY, NETWORK SECURITY AND COMMUNICATION TECHNOLOGY, CNSCT 2024, 2024, : 624 - 630
  • [9] Action Recognition with a Multi-View Temporal Attention Network
    Sun, Dengdi
    Su, Zhixiang
    Ding, Zhuanlian
    Luo, Bin
    COGNITIVE COMPUTATION, 2022, 14 : 1082 - 1095