A Two-Stage Attention Based Modality Fusion Framework for Multi-Modal Speech Emotion Recognition

Cited by: 1
Authors
Hu, Dongni [1 ,2 ]
Chen, Chengxin [1 ,2 ]
Zhang, Pengyuan [1 ,2 ]
Li, Junfeng [1 ,2 ]
Yan, Yonghong [1 ,2 ]
Zhao, Qingwei [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Key Lab Speech Acoust & Content Understanding, Beijing 100864, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100864, Peoples R China
Keywords
speech emotion recognition; multi-modal fusion; attention mechanism; end-to-end;
DOI
10.1587/transinf.2021EDL8002
CLC Classification: TP [Automation Technology, Computer Technology]
Discipline Code: 0812
Abstract
Recently, automated recognition and analysis of human emotion have attracted increasing attention from multidisciplinary communities. However, it is challenging to simultaneously utilize emotional information from multiple modalities. Previous studies have explored different fusion methods, but they mainly focused on either inter-modality interaction or intra-modality interaction. In this letter, we propose a novel two-stage fusion strategy named modality attention flow (MAF) to model the intra- and inter-modality interactions simultaneously in a unified end-to-end framework. Experimental results show that the proposed approach outperforms the widely used late fusion methods, and achieves even better performance when the number of stacked MAF blocks increases.
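The record does not reproduce the MAF block's equations, so the sketch below is only an illustrative assumption of what a two-stage (intra- then inter-modality) attention fusion block could look like, not the authors' implementation. It assumes scaled dot-product attention, residual connections, and audio/text as the two modalities; all function and variable names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: (Tq, d) x (Tk, d) -> (Tq, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def maf_block(audio, text):
    # Stage 1: intra-modality interaction (self-attention within each modality)
    audio_intra = attention(audio, audio, audio)
    text_intra = attention(text, text, text)
    # Stage 2: inter-modality interaction (cross-attention between modalities),
    # with residual connections so blocks can be stacked
    audio_out = attention(audio_intra, text_intra, text_intra) + audio_intra
    text_out = attention(text_intra, audio_intra, audio_intra) + text_intra
    return audio_out, text_out

# Stacking several blocks, echoing the letter's finding that performance
# improves as the number of stacked MAF blocks increases
rng = np.random.default_rng(0)
audio = rng.standard_normal((50, 64))  # e.g. 50 acoustic frames, 64-dim features
text = rng.standard_normal((12, 64))   # e.g. 12 word tokens, 64-dim features
for _ in range(3):
    audio, text = maf_block(audio, text)
print(audio.shape, text.shape)  # shapes preserved: (50, 64) (12, 64)
```

Because each block preserves the per-modality sequence shapes, the two output streams can be pooled and concatenated for a final emotion classifier; the actual fusion head used in the letter may differ.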
Pages: 1391-1394 (4 pages)
Related Papers (50 total)
  • [41] Multi-Modal Emotion Recognition Using Speech Features and Text-Embedding
    Byun, Sung-Woo
    Kim, Ju-Hee
    Lee, Seok-Pil
    [J]. APPLIED SCIENCES-BASEL, 2021, 11 (17):
  • [42] Multi-Modal Emotion Recognition by Fusing Correlation Features of Speech-Visual
    Chen Guanghui
    Zeng Xiaoping
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 533 - 537
  • [43] Scene text recognition based on two-stage attention and multi-branch feature fusion module
    Xia, Shifeng
    Kou, Jinqiao
    Liu, Ningzhong
    Yin, Tianxiang
    [J]. APPLIED INTELLIGENCE, 2023, 53 (11) : 14219 - 14232
  • [45] Listening and speaking knowledge fusion network for multi-modal emotion recognition in conversation
    Liu, Qin
    Xie, Jun
    Hu, Yong
    Hao, Shu-Feng
    Hao, Ya-Hui
    [J]. Kongzhi yu Juece/Control and Decision, 2024, 39 (06): : 2031 - 2040
  • [46] Branch-Fusion-Net for Multi-Modal Continuous Dimensional Emotion Recognition
    Li, Chiqin
    Xie, Lun
    Pan, Hang
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 942 - 946
  • [47] Multi-Modal CNN Features Fusion for Emotion Recognition: A Modified Xception Model
    Shahzad, H. M.
    Bhatti, Sohail Masood
    Jaffar, Arfan
    Rashid, Muhammad
    Akram, Sheeraz
    [J]. IEEE ACCESS, 2023, 11 : 94281 - 94289
  • [48] Towards Efficient Multi-Modal Emotion Recognition
    Dobrisek, Simon
    Gajsek, Rok
    Mihelic, France
    Pavesic, Nikola
    Struc, Vitomir
    [J]. INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2013, 10
  • [49] Emotion Recognition from Multi-Modal Information
    Wu, Chung-Hsien
    Lin, Jen-Chun
    Wei, Wen-Li
    Cheng, Kuan-Chun
    [J]. 2013 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2013,
  • [50] Two-level attention with two-stage multi-task learning for facial emotion recognition
    Wang Xiaohua
    Peng Muzi
    Pan Lijuan
    Hu Min
    Jin Chunhua
    Ren Fuji
    [J]. JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2019, 62 : 217 - 225