Removing Bias with Residual Mixture of Multi-View Attention for Speech Emotion Recognition

Cited by: 5
Authors
Jalal, Md Asif [1 ]
Milner, Rosanna [1 ]
Hain, Thomas [1 ]
Moore, Roger K. [1 ]
Affiliations
[1] Univ Sheffield, Speech & Hearing Grp SPandH, Sheffield, S Yorkshire, England
Keywords
speech emotion recognition; attention networks; computational paralinguistics
DOI
10.21437/Interspeech.2020-3005
CLC classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline codes
100104; 100213
Abstract
Speech emotion recognition is essential for obtaining emotional intelligence, which affects the understanding of the context and meaning of speech. From a machine learning standpoint, the fundamental challenge of speech emotion recognition is to extract patterns that carry maximum correlation with the emotion information encoded in the signal while remaining as insensitive as possible to the other types of information that speech carries. In this paper, a novel recurrent residual temporal context modelling framework is proposed. The framework combines a mixture of multi-view attention smoothing with high-dimensional feature projection for context expansion and for learning feature representations. It is designed to be robust to changes in speaker and to other distortions, and it provides state-of-the-art results for speech emotion recognition. Performance of the proposed approach is compared with a wide range of current architectures on a standard 4-class classification task on the widely used IEMOCAP corpus, where a significant improvement of 4% unweighted accuracy over state-of-the-art systems is observed. Additionally, the attention vectors are aligned with the input segments and plotted at two different attention levels to demonstrate their effectiveness.
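The pooling idea the abstract describes, several attention "views" scoring the same frame sequence, their context vectors mixed together, plus a residual path, can be sketched roughly as below. This is a minimal NumPy illustration under stated assumptions: the per-view scoring vectors, the softmax gating over views, and the mean-pooled residual term are illustrative choices, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_view_attention_pool(H, W_views, mix_logits):
    """Pool a (T, d) frame sequence into one d-dim utterance vector.

    Each of the K views scores every frame with its own weight vector,
    giving K attention distributions over time; the resulting K context
    vectors are mixed by softmax(mix_logits), and a mean-pooled residual
    term is added (an assumed stand-in for the paper's residual path).
    """
    contexts = []
    for w in W_views:               # w: (d,) scoring vector for one view
        scores = H @ w              # (T,) per-frame relevance scores
        alpha = softmax(scores)     # attention weights over frames
        contexts.append(alpha @ H)  # (d,) context vector for this view
    C = np.stack(contexts)          # (K, d) one context vector per view
    mix = softmax(mix_logits)       # (K,) mixture weights over the views
    pooled = mix @ C                # mixture of multi-view attention
    return pooled + H.mean(axis=0)  # residual connection

rng = np.random.default_rng(0)
T, d, K = 50, 8, 3                  # frames, feature dim, attention views
H = rng.normal(size=(T, d))         # hypothetical recurrent-layer outputs
W = rng.normal(size=(K, d))
utt_vec = multi_view_attention_pool(H, W, rng.normal(size=K))
print(utt_vec.shape)                # one fixed-size vector per utterance
```

In a trained system the scoring vectors and mixture logits would be learned jointly with the classifier; here they are random, which is enough to show the data flow from variable-length frame features to a single utterance-level representation.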
Pages: 4084-4088
Page count: 5
Related papers
50 records in total
  • [1] Multimodal speech emotion recognition based on multi-scale MFCCs and multi-view attention mechanism
    Feng, Lin
    Liu, Lu-Yao
    Liu, Sheng-Lan
    Zhou, Jian
    Yang, Han-Qing
    Yang, Jie
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (19) : 28917 - 28935
  • [2] Multi-View Speech Emotion Recognition Via Collective Relation Construction
    Hou, Mixiao
    Zhang, Zheng
    Cao, Qi
    Zhang, David
    Lu, Guangming
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 218 - 229
  • [3] Discriminative feature learning based on multi-view attention network with diffusion joint loss for speech emotion recognition
    Liu, Yang
    Chen, Xin
    Song, Yuan
    Li, Yarong
    Wang, Shengbei
    Yuan, Weitao
    Li, Yongwei
    Zhao, Zhen
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 137
  • [4] Multimodal and Multi-view Models for Emotion Recognition
    Aguilar, Gustavo
    Rozgic, Viktor
    Wang, Weiran
    Wang, Chao
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 991 - 1002
  • [5] Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition
    Alastruey, Belen
    Drude, Lukas
    Heymann, Jahn
    Wiesler, Simon
    INTERSPEECH 2023, 2023, : 4973 - 4977
  • [6] EMOTION RECOGNITION BASED ON MULTI-VIEW BODY GESTURES
    Shen, Zhijuan
    Cheng, Jun
    Hu, Xiping
    Dong, Qian
    2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 3317 - 3321
  • [7] Multi-View Attention Transfer for Efficient Speech Enhancement
    Shin, Wooseok
    Park, Hyun Joon
    Kim, Jin Sob
    Lee, Byung Hoon
    Han, Sung Won
    INTERSPEECH 2022, 2022, : 1198 - 1202
  • [8] Multi-View Hierarchical Attention Graph Convolutional Network with Domain Adaptation for EEG Emotion Recognition
    Li, Chao
    Wang, Feng
    Bian, Ning
    PROCEEDINGS OF 2024 3RD INTERNATIONAL CONFERENCE ON CRYPTOGRAPHY, NETWORK SECURITY AND COMMUNICATION TECHNOLOGY, CNSCT 2024, 2024, : 624 - 630
  • [9] Action Recognition with a Multi-View Temporal Attention Network
    Sun, Dengdi
    Su, Zhixiang
    Ding, Zhuanlian
    Luo, Bin
    COGNITIVE COMPUTATION, 2022, 14 : 1082 - 1095