Removing Bias with Residual Mixture of Multi-View Attention for Speech Emotion Recognition

Cited by: 5

Authors:

Jalal, Md Asif [1]

Milner, Rosanna [1]

Hain, Thomas [1]

Moore, Roger K. [1]

Affiliations:

[1] Univ Sheffield, Speech & Hearing Grp SPandH, Sheffield, S Yorkshire, England

Source:

INTERSPEECH 2020 | 2020

Keywords:

speech emotion recognition; attention networks; computational paralinguistics

DOI:

10.21437/Interspeech.2020-3005

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

Speech emotion recognition is essential for obtaining emotional intelligence, which affects the understanding of the context and meaning of speech. The fundamental challenge of speech emotion recognition from a machine learning standpoint is to extract patterns which carry maximum correlation with the emotion information encoded in the signal, while being as insensitive as possible to other types of information carried by speech. In this paper, a novel recurrent residual temporal context modelling framework is proposed. The framework includes a mixture of multi-view attention smoothing and high-dimensional feature projection for context expansion and learning feature representations. The framework is designed to be robust to changes in speaker and other distortions, and it provides state-of-the-art results for speech emotion recognition. The performance of the proposed approach is compared with a wide range of current architectures in a standard 4-class classification task on the widely used IEMOCAP corpus. A significant improvement of 4% unweighted accuracy over state-of-the-art systems is observed. Additionally, the attention vectors have been aligned with the input segments and plotted at two different attention levels to demonstrate the effectiveness.

Pages: 4084 - 4088

Page count: 5

50 items in total

[41] MVANet: Multi-Task Guided Multi-View Attention Network for Chinese Food Recognition
Liang, Haozan
Wen, Guihua
Hu, Yang
Luo, Mingnan
Yang, Pei
Xu, Yingxue
IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 3551 - 3561
[42] Multi-view Neural Networks for Raw Audio-based Music Emotion Recognition
He, Na
Ferguson, Sam
2020 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM 2020), 2020, : 168 - 172
[43] A Multi-View Face Recognition System
张永越
彭振云
游素亚
徐光佑
Journal of Computer Science and Technology, 1997, (05) : 400 - 407
[44] MULTI-VIEW NORMALIZATION FOR FACE RECOGNITION
Tang, Chia-Hao
Chou, Yi-Mei
Hsu, Gee-Sera Jison
2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 2343 - 2347
[45] A Survey of Multi-view Gait Recognition
Wang K.-J.
Ding X.-N.
Xing X.-L.
Liu M.-C.
Zidonghua Xuebao/Acta Automatica Sinica, 2019, 45 (05) : 841 - 852
[46] A multi-view face recognition system
Yongyue Zhang
Zhenyun Peng
Suya You
Guangyou Xu
Journal of Computer Science and Technology, 1997, 12 (5) : 400 - 407
[47] Multi-view face recognition system
Zhang, Yongyue
Peng, Zhenyun
You, Suya
Xu, Guangyou
Journal of Computer Science and Technology, 1997, 12 (05) : 400 - 407
[48] Integrating Multi-view Analysis: Multi-view Mixture-of-Expert for Textual Personality Detection
Zhu, Haohao
Zhang, Xiaokun
Lu, Junyu
Yang, Liang
Lin, Hongfei
NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT IV, NLPCC 2024, 2025, 15362 : 359 - 371
[49] Multi-head attention fusion networks for multi-modal speech emotion recognition
Zhang, Junfeng
Xing, Lining
Tan, Zhen
Wang, Hongsen
Wang, Kesheng
COMPUTERS & INDUSTRIAL ENGINEERING, 2022, 168
[50] MapReduce for Multi-view Object Recognition
Noor, Shaheena
Uddin, Vali
2016 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS 2016), 2016, : 575 - 582