A Two-Stage Attention Based Modality Fusion Framework for Multi-Modal Speech Emotion Recognition

Cited by: 1
Authors
Hu, Dongni [1 ,2 ]
Chen, Chengxin [1 ,2 ]
Zhang, Pengyuan [1 ,2 ]
Li, Junfeng [1 ,2 ]
Yan, Yonghong [1 ,2 ]
Zhao, Qingwei [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Key Lab Speech Acoust & Content Understanding, Beijing 100864, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100864, Peoples R China
Keywords
speech emotion recognition; multi-modal fusion; attention mechanism; end-to-end;
DOI
10.1587/transinf.2021EDL8002
CLC Number
TP [Automation and Computer Technology]
Discipline Code
0812
Abstract
Recently, the automated recognition and analysis of human emotion have attracted increasing attention from multidisciplinary communities. However, it remains challenging to exploit emotional information from multiple modalities simultaneously. Previous studies have explored different fusion methods, but they mainly focused on either inter-modality or intra-modality interaction. In this letter, we propose a novel two-stage fusion strategy, named modality attention flow (MAF), to model the intra- and inter-modality interactions simultaneously in a unified end-to-end framework. Experimental results show that the proposed approach outperforms widely used late fusion methods, and achieves even better performance as the number of stacked MAF blocks increases.
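The letter gives no implementation detail in this record, but the abstract's description suggests a block that first applies intra-modality self-attention to each stream, then inter-modality cross-attention between streams, with several such blocks stacked. The sketch below is a minimal PyTorch illustration under those assumptions; the class name MAFBlock, the audio/text modality pair, the dimensions, and the residual wiring are all illustrative and not taken from the paper.

```python
# A hedged sketch of a two-stage attention fusion block in the spirit of the
# abstract's modality attention flow (MAF): stage 1 models intra-modality
# interactions with self-attention, stage 2 models inter-modality interactions
# with cross-attention. All names and hyperparameters are assumptions.
import torch
import torch.nn as nn

class MAFBlock(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        # Stage 1: intra-modality self-attention, one module per modality.
        self.self_attn_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn_t = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Stage 2: inter-modality cross-attention, each modality attends to the other.
        self.cross_attn_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn_t = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)

    def forward(self, audio: torch.Tensor, text: torch.Tensor):
        # audio: (batch, frames, dim); text: (batch, tokens, dim)
        a, _ = self.self_attn_a(audio, audio, audio)   # intra-modality stage
        t, _ = self.self_attn_t(text, text, text)
        a = self.norm_a(audio + a)                     # residual + layer norm
        t = self.norm_t(text + t)
        ca, _ = self.cross_attn_a(a, t, t)             # audio queries text
        ct, _ = self.cross_attn_t(t, a, a)             # text queries audio
        return a + ca, t + ct

# Blocks can be stacked, matching the abstract's observation that performance
# improves as the number of MAF blocks increases.
blocks = nn.ModuleList([MAFBlock() for _ in range(3)])
audio = torch.randn(8, 100, 256)   # e.g., 100 acoustic frames per utterance
text = torch.randn(8, 20, 256)     # e.g., 20 word embeddings per utterance
for blk in blocks:
    audio, text = blk(audio, text)
# Utterance-level fusion by mean pooling each stream and concatenating.
fused = torch.cat([audio.mean(dim=1), text.mean(dim=1)], dim=-1)
```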
Pages: 1391-1394
Page count: 4
Related Papers
(50 records in total)
  • [1] Multi-modal Attention for Speech Emotion Recognition
    Pan, Zexu
    Luo, Zhaojie
    Yang, Jichen
    Li, Haizhou
    [J]. INTERSPEECH 2020, 2020, : 364 - 368
  • [2] Multi-head attention fusion networks for multi-modal speech emotion recognition
    Zhang, Junfeng
    Xing, Lining
    Tan, Zhen
    Wang, Hongsen
    Wang, Kesheng
    [J]. COMPUTERS & INDUSTRIAL ENGINEERING, 2022, 168
  • [3] ATTENTION DRIVEN FUSION FOR MULTI-MODAL EMOTION RECOGNITION
    Priyasad, Darshana
    Fernando, Tharindu
    Denman, Simon
    Sridharan, Sridha
    Fookes, Clinton
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3227 - 3231
  • [4] Multi-modal speech emotion recognition using self-attention mechanism and multi-scale fusion framework
    Liu, Yang
    Sun, Haoqin
    Guan, Wenbo
    Xia, Yuqi
    Zhao, Zhen
    [J]. SPEECH COMMUNICATION, 2022, 139 : 1 - 9
  • [5] A Two-Stage Multi-Modal Multi-Label Emotion Recognition Decision System Based on GCN
    Wu, Weiwei
    Chen, Daomin
    Li, Qingping
    [J]. INTERNATIONAL JOURNAL OF DECISION SUPPORT SYSTEM TECHNOLOGY, 2024, 16 (01)
  • [6] Multi-modal Emotion Recognition Based on Speech and Image
    Li, Yongqiang
    He, Qi
    Zhao, Yongping
    Yao, Hongxun
    [J]. ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2017, PT I, 2018, 10735 : 844 - 853
  • [7] Multi-Modal Fusion Emotion Recognition Method of Speech Expression Based on Deep Learning
    Liu, Dong
    Wang, Zhiyong
    Wang, Lifeng
    Chen, Longxi
    [J]. FRONTIERS IN NEUROROBOTICS, 2021, 15
  • [8] A multi-modal emotion fusion classification method combined expression and speech based on attention mechanism
    Liu, Dong
    Chen, Longxi
    Wang, Lifeng
    Wang, Zhiyong
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (29) : 41677 - 41695