A multimodal fusion emotion recognition method based on multitask learning and attention mechanism

Cited by: 4
Authors
Xie, Jinbao [1 ]
Wang, Jiyu [2 ]
Wang, Qingyan [2 ]
Yang, Dali [1 ]
Gu, Jinming [2 ]
Tang, Yongqiang [2 ]
Varatnitski, Yury I. [3 ]
Affiliations
[1] Hainan Normal Univ, Coll Phys & Elect Engn, Haikou 571158, Peoples R China
[2] Harbin Univ Sci & Technol, Sch Measurement & Control Technol & Commun Engn, Harbin 150000, Peoples R China
[3] Belarusian State Univ, Fac Radiophys & Comp Technol, Minsk 220030, BELARUS
Keywords
Multitasking learning; Attention mechanism; Multimodal; Emotion recognition; SENTIMENT ANALYSIS;
DOI
10.1016/j.neucom.2023.126649
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With new developments in the field of human-computer interaction, researchers are paying increasing attention to emotion recognition, especially multimodal emotion recognition, because emotion is a multidimensional expression. In this study, we propose a multimodal fusion emotion recognition method based on multitask learning and the attention mechanism (MTL-BAM) to address two major problems in multimodal emotion recognition: the lack of consideration of emotional interactions among modalities, and the focus on emotional similarity among modalities while ignoring their differences. By improving the attention mechanism, the emotional contribution of each modality is analyzed further so that the emotional representations of the modalities can learn from and complement one another, achieving a better interactive fusion effect within a multitask learning framework. By introducing three monomodal emotion recognition tasks as auxiliary tasks, the model can detect emotional differences among modalities. A label generation unit is also introduced into the auxiliary tasks, through which monomodal emotion label values can be obtained more accurately via two proportional formulas while avoiding the zero-value problem. Our results show that the proposed method outperforms selected state-of-the-art methods on four evaluation indexes of emotion classification (accuracy, F1 score, MAE, and Pearson correlation coefficient). The proposed method achieved accuracy rates of 85.36% and 84.61% on the public multimodal datasets CMU-MOSI and CMU-MOSEI, respectively, which are 2-6% higher than those of existing state-of-the-art models, demonstrating good multimodal emotion recognition performance and strong generalizability.
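The abstract describes two core ideas: an attention mechanism that weighs each modality's emotional contribution before fusion, and a multitask setup in which three monomodal auxiliary tasks run alongside the main multimodal task. The following is a minimal, illustrative sketch of that general pattern only; the dimensions, scoring function, and head definitions are assumptions for illustration and are not the paper's actual MTL-BAM architecture or label generation formulas.

```python
import math
import random

random.seed(0)
D = 8  # shared feature dimension (illustrative assumption)

def randvec(n):
    # Stand-in for learned parameters / extracted features (hypothetical).
    return [random.gauss(0, 1) for _ in range(n)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

# Per-modality feature vectors (text, audio, video), here random placeholders.
modalities = ["text", "audio", "video"]
feats = {m: randvec(D) for m in modalities}

# Modality-level attention: score each modality with a shared scoring vector,
# then softmax the scores into contribution weights that sum to 1.
w_score = randvec(D)
alpha = softmax([dot(feats[m], w_score) for m in modalities])

# Fusion: attention-weighted sum of the modality features.
fused = [sum(a * feats[m][i] for a, m in zip(alpha, modalities))
         for i in range(D)]

# Multitask heads: one head on the fused representation (main task) plus
# three monomodal auxiliary heads, each predicting a score in (-1, 1).
w_fuse = randvec(D)
y_main = math.tanh(dot(fused, w_fuse))
y_aux = {m: math.tanh(dot(feats[m], randvec(D))) for m in modalities}
```

In a trained model the auxiliary heads would receive monomodal labels (produced by the paper's label generation unit) and their losses would be combined with the main-task loss; here they only show where the three auxiliary predictions sit relative to the fused one.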
Pages: 13
Related Papers
(50 records)
  • [1] Toward Mathematical Representation of Emotion: A Deep Multitask Learning Method Based On Multimodal Recognition
    Harata, Seiichi
    Sakuma, Takuto
    Kato, Shohei
COMPANION PUBLICATION OF THE 2020 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION (ICMI '20 COMPANION), 2020, : 47 - 51
  • [2] A speech emotion recognition method for the elderly based on feature fusion and attention mechanism
    Jian, Qijian
    Xiang, Min
    Huang, Wei
    THIRD INTERNATIONAL CONFERENCE ON ELECTRONICS AND COMMUNICATION; NETWORK AND COMPUTER TECHNOLOGY (ECNCT 2021), 2022, 12167
  • [3] Expression EEG Multimodal Emotion Recognition Method Based on the Bidirectional LSTM and Attention Mechanism
    Zhao, Yifeng
    Chen, Deyun
    COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE, 2021, 2021
  • [4] MULTITASK LEARNING AND MULTISTAGE FUSION FOR DIMENSIONAL AUDIOVISUAL EMOTION RECOGNITION
    Atmaja, Bagus Tris
    Akagi, Masato
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 4482 - 4486
  • [5] A Multitask learning model for multimodal sarcasm, sentiment and emotion recognition in conversations
    Zhang, Yazhou
    Wang, Jinglin
    Liu, Yaochen
    Rong, Lu
    Zheng, Qian
    Song, Dawei
    Tiwari, Prayag
    Qin, Jing
    INFORMATION FUSION, 2023, 93 : 282 - 301
  • [6] Video multimodal emotion recognition based on Bi-GRU and attention fusion
    Huan, Ruo-Hong
    Shu, Jia
    Bao, Sheng-Lin
    Liang, Rong-Hua
    Chen, Peng
    Chi, Kai-Kai
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (06) : 8213 - 8240
  • [7] Hierarchical Attention-Based Multimodal Fusion Network for Video Emotion Recognition
    Liu, Xiaodong
    Li, Songyang
    Wang, Miao
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2021, 2021
  • [8] MULTIMODAL ATTENTION-MECHANISM FOR TEMPORAL EMOTION RECOGNITION
    Ghaleb, Esam
    Niehues, Jan
    Asteriadis, Stylianos
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 251 - 255
  • [9] Deep Feature Extraction and Attention Fusion for Multimodal Emotion Recognition
    Yang, Zhiyi
    Li, Dahua
    Hou, Fazheng
    Song, Yu
    Gao, Qiang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2024, 71 (03) : 1526 - 1530