MULTIMODAL CROSS- AND SELF-ATTENTION NETWORK FOR SPEECH EMOTION RECOGNITION

Cited by: 30
Authors
Sun, Licai [1 ,2 ]
Liu, Bin [2 ]
Tao, Jianhua [1 ,2 ,3 ]
Lian, Zheng [1 ,2 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing, Peoples R China
[3] CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
speech emotion recognition; multimodal fusion; self-attention; cross-attention;
DOI
10.1109/ICASSP39728.2021.9414654
CLC Number
O42 [Acoustics];
Subject Classification Code
070206; 082403;
Abstract
Speech Emotion Recognition (SER) requires a thorough understanding of both the linguistic content of an utterance (i.e., textual information) and how the speaker utters it (i.e., acoustic information). One vital challenge in SER is how to effectively fuse these two kinds of information. In this paper, we propose a novel Multimodal Cross- and Self-Attention Network (MCSAN) to tackle this problem. The core of MCSAN is to employ parallel cross- and self-attention modules to explicitly model both inter- and intra-modal interactions between audio and text. Specifically, the cross-attention module utilizes the cross-attention mechanism to guide one modality to attend to the other and update the features accordingly. Similarly, the self-attention module employs the self-attention mechanism to propagate information within each modality. We evaluate MCSAN on two benchmark datasets, IEMOCAP and MELD. Experimental results demonstrate that our proposed model achieves state-of-the-art performance on both datasets.
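To make the fusion idea concrete, below is a minimal PyTorch sketch of parallel cross- and self-attention fusion as described in the abstract. The module names (CrossSelfAttentionBlock, MCSANSketch), feature dimensions, number of heads and blocks, residual combination, and the mean-pooling plus concatenation before the classifier are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn as nn


class CrossSelfAttentionBlock(nn.Module):
    """One fusion block: each modality attends to the other (cross-attention)
    and to itself (self-attention); the two views are added residually and normalized."""

    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.cross_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.self_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.self_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_audio = nn.LayerNorm(dim)
        self.norm_text = nn.LayerNorm(dim)

    def forward(self, audio, text):
        # Cross-attention: audio queries attend to text keys/values, and vice versa
        # (inter-modal interaction).
        a_cross, _ = self.cross_audio(audio, text, text)
        t_cross, _ = self.cross_text(text, audio, audio)
        # Self-attention: propagate information within each modality
        # (intra-modal interaction).
        a_self, _ = self.self_audio(audio, audio, audio)
        t_self, _ = self.self_text(text, text, text)
        # Combine inter- and intra-modal views with residual connections.
        audio = self.norm_audio(audio + a_cross + a_self)
        text = self.norm_text(text + t_cross + t_self)
        return audio, text


class MCSANSketch(nn.Module):
    """Stacks fusion blocks, pools each modality over time, and classifies the emotion."""

    def __init__(self, dim=256, num_heads=4, num_blocks=2, num_classes=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            [CrossSelfAttentionBlock(dim, num_heads) for _ in range(num_blocks)]
        )
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, audio, text):
        for block in self.blocks:
            audio, text = block(audio, text)
        # Mean-pool over frames/tokens and concatenate the two modalities.
        fused = torch.cat([audio.mean(dim=1), text.mean(dim=1)], dim=-1)
        return self.classifier(fused)


if __name__ == "__main__":
    audio_feats = torch.randn(8, 120, 256)  # (batch, audio frames, feature dim)
    text_feats = torch.randn(8, 30, 256)    # (batch, word tokens, feature dim)
    logits = MCSANSketch()(audio_feats, text_feats)
    print(logits.shape)  # torch.Size([8, 4])

Using nn.MultiheadAttention for both paths keeps the two modules symmetric: they differ only in whether the keys and values come from the other modality or from the modality itself, which mirrors the inter- versus intra-modal distinction in the abstract.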
Pages: 4275-4279
Page count: 5
Related Papers
50 records in total
  • [41] Zhang, Yuzhe; Liu, Huan; Zhang, Dalin; Chen, Xuxu; Qin, Tao; Zheng, Qinghua. EEG-Based Emotion Recognition With Emotion Localization via Hierarchical Self-Attention. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14(03): 2458-2469
  • [42] Li, Dongdong; Yang, Zhuo; Liu, Jinlin; Yang, Hai; Wang, Zhe. Emotion embedding framework with emotional self-attention mechanism for speaker recognition. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 238
  • [43] Zhao, Ziping; Li, Qifei; Zhang, Zixing; Cummins, Nicholas; Wang, Haishuai; Tao, Jianhua; Schuller, Bjoern W. Combining a parallel 2D CNN with a self-attention Dilated Residual Network for CTC-based discrete speech emotion recognition. NEURAL NETWORKS, 2021, 141: 52-60
  • [44] Yang, Jingda; Wang, Ying. Toward Auto-Modeling of Formal Verification for NextG Protocols: A Multimodal Cross- and Self-Attention Large Language Model Approach. IEEE ACCESS, 2024, 12: 27858-27869
  • [45] Zhang, Yuanyuan; Du, Jun; Wang, Zirui; Zhang, Jianshu; Tu, Yanhui. Attention Based Fully Convolutional Network for Speech Emotion Recognition. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018: 1771-1775
  • [46] Hu, Ying; Hou, Shijing; Yang, Huamin; Huang, Hao; He, Liang. A Joint Network Based on Interactive Attention for Speech Emotion Recognition. 2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023: 1715-1720
  • [47] Li, Nan; Ge, Meng; Wang, Longbiao; Dang, Jianwu. A Fast Convolutional Self-attention Based Speech Dereverberation Method for Robust Speech Recognition. NEURAL INFORMATION PROCESSING (ICONIP 2019), PT III, 2019, 11955: 295-305
  • [48] Guan, Yulu; Cui, Hui; Xu, Yiyue; Jin, Qiangguo; Feng, Tian; Tu, Huawei; Xuan, Ping; Li, Wanlong; Wang, Linlin; Duh, Been-Lirn. Predicting Esophageal Fistula Risks Using a Multimodal Self-attention Network. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2021, PT V, 2021, 12905: 721-730
  • [49] Lee, Chan Woo; Song, Kyu Ye; Jeong, Jihoon; Choi, Woo Yong. Convolutional Attention Networks for Multimodal Emotion Recognition from Speech and Text Data. FIRST GRAND CHALLENGE AND WORKSHOP ON HUMAN MULTIMODAL LANGUAGE (CHALLENGE-HML), 2018: 28-34
  • [50] Ge, Yiming; Liu, Hui; Du, Junzhao; Li, Zehua; Wei, Yuheng. Masked face recognition with convolutional visual self-attention network. NEUROCOMPUTING, 2023, 518: 496-506