A multimodal fusion emotion recognition method based on multitask learning and attention mechanism

Cited by: 4
Authors
Xie, Jinbao [1 ]
Wang, Jiyu [2 ]
Wang, Qingyan [2 ]
Yang, Dali [1 ]
Gu, Jinming [2 ]
Tang, Yongqiang [2 ]
Varatnitski, Yury I. [3 ]
Affiliations
[1] Hainan Normal Univ, Coll Phys & Elect Engn, Haikou 571158, Peoples R China
[2] Harbin Univ Sci & Technol, Sch Measurement & Control Technol & Commun Engn, Harbin 150000, Peoples R China
[3] Belarusian State Univ, Fac Radiophys & Comp Technol, Minsk 220030, BELARUS
Keywords
Multitask learning; Attention mechanism; Multimodal; Emotion recognition; Sentiment analysis
DOI
10.1016/j.neucom.2023.126649
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
With new developments in human-computer interaction, researchers are paying increasing attention to emotion recognition, especially multimodal emotion recognition, because emotion is a multidimensional expression. In this study, we propose a multimodal fusion emotion recognition method (MTL-BAM) based on multitask learning and the attention mechanism to address two major problems in multimodal emotion recognition: the neglect of emotional interactions among modalities, and the focus on emotional similarity across modalities while their differences are ignored. An improved attention mechanism analyzes the emotional contribution of each modality so that the emotional representations of the modalities can learn from and complement one another, yielding a better interactive fusion and forming the basis of a multitask learning framework. By introducing three monomodal emotion recognition tasks as auxiliary tasks, the model can detect emotional differences among modalities. A label generation unit is also introduced into the auxiliary tasks, so that monomodal emotion label values can be obtained more accurately through two proportional formulas while the zero-value problem is prevented. Our results show that the proposed method outperforms selected state-of-the-art methods on four evaluation indexes of emotion classification (accuracy, F1 score, MAE, and Pearson correlation coefficient). The proposed method achieves accuracy rates of 85.36% and 84.61% on the public multimodal datasets CMU-MOSI and CMU-MOSEI, respectively, 2-6% higher than existing state-of-the-art models, demonstrating strong multimodal emotion recognition performance and good generalizability.
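To make the abstract's architecture concrete, below is a minimal PyTorch sketch of the general pattern it describes: cross-modal attention fusion in which each modality attends to the others, plus auxiliary unimodal prediction heads trained jointly with the fused head. All layer sizes, module names, pooling choices, and the auxiliary loss weight are illustrative assumptions, not the authors' MTL-BAM implementation; in particular, the unimodal labels here are placeholders where the paper's label generation unit would supply values computed from its two proportional formulas.

```python
# Minimal sketch (assumed design, not the paper's code): multitask
# multimodal fusion with cross-modal attention and auxiliary unimodal heads.
import torch
import torch.nn as nn

MODALITIES = ("text", "audio", "video")

class CrossModalAttentionFusion(nn.Module):
    """Each modality queries an attention layer over the other two modalities."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.ModuleDict({
            m: nn.MultiheadAttention(dim, heads, batch_first=True)
            for m in MODALITIES
        })
        self.fusion_head = nn.Linear(3 * dim, 1)      # main multimodal task
        self.unimodal_heads = nn.ModuleDict({         # auxiliary monomodal tasks
            m: nn.Linear(dim, 1) for m in MODALITIES
        })

    def forward(self, feats):  # feats: {modality: (batch, seq, dim)}
        enriched, uni_preds = [], {}
        for m, x in feats.items():
            # Query with this modality; attend over the other two concatenated.
            others = torch.cat([v for k, v in feats.items() if k != m], dim=1)
            attended, _ = self.attn[m](x, others, others)
            pooled = attended.mean(dim=1)              # simple mean pooling
            enriched.append(pooled)
            uni_preds[m] = self.unimodal_heads[m](pooled)
        fused = self.fusion_head(torch.cat(enriched, dim=-1))
        return fused, uni_preds

def multitask_loss(fused, uni_preds, y_main, uni_labels, aux_weight=0.3):
    """Main regression loss plus weighted auxiliary unimodal losses."""
    l1 = nn.L1Loss()
    loss = l1(fused.squeeze(-1), y_main)
    for m, pred in uni_preds.items():
        loss = loss + aux_weight * l1(pred.squeeze(-1), uni_labels[m])
    return loss

# Toy usage: random features stand in for pretrained encoder outputs, and the
# main label stands in for each unimodal label (the paper generates these).
model = CrossModalAttentionFusion()
feats = {m: torch.randn(8, 20, 128) for m in MODALITIES}
y = torch.randn(8)
fused, uni = model(feats)
loss = multitask_loss(fused, uni, y, {m: y for m in MODALITIES})
loss.backward()
```

The key design point this sketch illustrates is that the auxiliary heads share the attended representations with the fused head, so gradients from the monomodal tasks shape the same features used for fusion, which is how a multitask setup can expose inter-modality emotional differences.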
Pages: 13