A Two-Stage Attention Based Modality Fusion Framework for Multi-Modal Speech Emotion Recognition

Cited by: 1
Authors
Hu, Dongni [1 ,2 ]
Chen, Chengxin [1 ,2 ]
Zhang, Pengyuan [1 ,2 ]
Li, Junfeng [1 ,2 ]
Yan, Yonghong [1 ,2 ]
Zhao, Qingwei [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Key Lab Speech Acoust & Content Understanding, Beijing 100864, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100864, Peoples R China
Keywords
speech emotion recognition; multi-modal fusion; attention mechanism; end-to-end;
DOI
10.1587/transinf.2021EDL8002
CLC Number
TP [Automation and Computer Technology]
Discipline Code
0812
Abstract
Recently, the automated recognition and analysis of human emotion have attracted increasing attention from multidisciplinary communities. However, it remains challenging to exploit emotional information from multiple modalities simultaneously. Previous studies have explored different fusion methods, but they mainly focused on either inter-modality or intra-modality interaction. In this letter, we propose a novel two-stage fusion strategy, named modality attention flow (MAF), to model the intra- and inter-modality interactions simultaneously in a unified end-to-end framework. Experimental results show that the proposed approach outperforms widely used late fusion methods, and achieves even better performance as the number of stacked MAF blocks increases.
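The letter gives no implementation detail in this record, but the abstract's description suggests a block that first applies intra-modality self-attention to each stream, then inter-modality cross-attention between streams, with several such blocks stacked. The sketch below is a minimal PyTorch illustration under those assumptions; the class name MAFBlock, the audio/text modality pair, the dimensions, and the residual wiring are all illustrative and not taken from the paper.

```python
# A hedged sketch of a two-stage attention fusion block in the spirit of the
# abstract's modality attention flow (MAF): stage 1 models intra-modality
# interactions with self-attention, stage 2 models inter-modality interactions
# with cross-attention. All names and hyperparameters are assumptions.
import torch
import torch.nn as nn

class MAFBlock(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        # Stage 1: intra-modality self-attention, one module per modality.
        self.self_attn_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn_t = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Stage 2: inter-modality cross-attention, each modality attends to the other.
        self.cross_attn_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn_t = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)

    def forward(self, audio: torch.Tensor, text: torch.Tensor):
        # audio: (batch, frames, dim); text: (batch, tokens, dim)
        a, _ = self.self_attn_a(audio, audio, audio)   # intra-modality stage
        t, _ = self.self_attn_t(text, text, text)
        a = self.norm_a(audio + a)                     # residual + layer norm
        t = self.norm_t(text + t)
        ca, _ = self.cross_attn_a(a, t, t)             # audio queries text
        ct, _ = self.cross_attn_t(t, a, a)             # text queries audio
        return a + ca, t + ct

# Blocks can be stacked, matching the abstract's observation that performance
# improves as the number of MAF blocks increases.
blocks = nn.ModuleList([MAFBlock() for _ in range(3)])
audio = torch.randn(8, 100, 256)   # e.g., 100 acoustic frames per utterance
text = torch.randn(8, 20, 256)     # e.g., 20 word embeddings per utterance
for blk in blocks:
    audio, text = blk(audio, text)
# Utterance-level fusion by mean pooling each stream and concatenating.
fused = torch.cat([audio.mean(dim=1), text.mean(dim=1)], dim=-1)
```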
Pages: 1391-1394
Page count: 4
Related Papers
(50 records in total)
  • [1] Multi-modal Attention for Speech Emotion Recognition
    Pan, Zexu
    Luo, Zhaojie
    Yang, Jichen
    Li, Haizhou
    [J]. INTERSPEECH 2020, 2020, : 364 - 368
  • [2] Multi-head attention fusion networks for multi-modal speech emotion recognition
    Zhang, Junfeng
    Xing, Lining
    Tan, Zhen
    Wang, Hongsen
    Wang, Kesheng
    [J]. COMPUTERS & INDUSTRIAL ENGINEERING, 2022, 168
  • [3] ATTENTION DRIVEN FUSION FOR MULTI-MODAL EMOTION RECOGNITION
    Priyasad, Darshana
    Fernando, Tharindu
    Denman, Simon
    Sridharan, Sridha
    Fookes, Clinton
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3227 - 3231
  • [4] Multi-modal speech emotion recognition using self-attention mechanism and multi-scale fusion framework
    Liu, Yang
    Sun, Haoqin
    Guan, Wenbo
    Xia, Yuqi
    Zhao, Zhen
    [J]. SPEECH COMMUNICATION, 2022, 139 : 1 - 9
  • [5] A Two-Stage Multi-Modal Multi-Label Emotion Recognition Decision System Based on GCN
    Wu, Weiwei
    Chen, Daomin
    Li, Qingping
    [J]. INTERNATIONAL JOURNAL OF DECISION SUPPORT SYSTEM TECHNOLOGY, 2024, 16 (01)
  • [6] Multi-modal Emotion Recognition Based on Speech and Image
    Li, Yongqiang
    He, Qi
    Zhao, Yongping
    Yao, Hongxun
    [J]. ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2017, PT I, 2018, 10735 : 844 - 853
  • [7] Multi-Modal Fusion Emotion Recognition Method of Speech Expression Based on Deep Learning
    Liu, Dong
    Wang, Zhiyong
    Wang, Lifeng
    Chen, Longxi
    [J]. FRONTIERS IN NEUROROBOTICS, 2021, 15
  • [8] A multi-modal emotion fusion classification method combined expression and speech based on attention mechanism
    Liu, Dong
    Chen, Longxi
    Wang, Lifeng
    Wang, Zhiyong
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (29) : 41677 - 41695