A multimodal fusion emotion recognition method based on multitask learning and attention mechanism

Cited by: 4
Authors
Xie, Jinbao [1 ]
Wang, Jiyu [2 ]
Wang, Qingyan [2 ]
Yang, Dali [1 ]
Gu, Jinming [2 ]
Tang, Yongqiang [2 ]
Varatnitski, Yury I. [3 ]
Affiliations
[1] Hainan Normal Univ, Coll Phys & Elect Engn, Haikou 571158, Peoples R China
[2] Harbin Univ Sci & Technol, Sch Measurement & Control Technol & Commun Engn, Harbin 150000, Peoples R China
[3] Belarusian State Univ, Fac Radiophys & Comp Technol, Minsk 220030, BELARUS
Keywords
Multitask learning; Attention mechanism; Multimodal; Emotion recognition; SENTIMENT ANALYSIS;
DOI
10.1016/j.neucom.2023.126649
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With new developments in the field of human-computer interaction, researchers are paying increasing attention to emotion recognition, especially multimodal emotion recognition, since emotion is a multidimensional expression. In this study, we propose a multimodal fusion emotion recognition method (MTL-BAM) based on multitask learning and an attention mechanism to address two major problems in multimodal emotion recognition: the lack of consideration of emotional interactions among modalities, and the focus on emotional similarity among modalities while ignoring their differences. By improving the attention mechanism, the emotional contribution of each modality is analyzed further so that the emotional representations of the modalities can learn from and complement one another, achieving a better interactive fusion effect and forming a multitask learning framework. By introducing three monomodal emotion recognition tasks as auxiliary tasks, the model can detect emotional differences among modalities. A label generation unit is also introduced into the auxiliary tasks, so that the monomodal emotion label value can be obtained more accurately through two proportional formulas while avoiding the zero-value problem. Our results show that the proposed method outperforms selected state-of-the-art methods on four evaluation indexes of emotion classification (accuracy, F1 score, MAE, and Pearson correlation coefficient). The proposed method achieved accuracy rates of 85.36% and 84.61% on the public multimodal datasets CMU-MOSI and CMU-MOSEI, respectively, which are 2-6% higher than those of existing state-of-the-art models, demonstrating good multimodal emotion recognition performance and strong generalizability.
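The abstract's core idea — weighting each modality's "emotional contribution" via attention, fusing the weighted representations, and training with auxiliary monomodal losses alongside the main fused loss — can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual MTL-BAM implementation: the shared query vector, the toy embeddings, the mean-pooled "predictions", the target labels, and the auxiliary loss weight of 0.3 are all assumptions made for illustration.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fuse(modal_reprs):
    """Score each modality's representation against a shared query
    (a stand-in for a learned attention projection), normalize the
    scores into contribution weights, and return the weighted sum."""
    dim = modal_reprs.shape[1]
    query = np.ones(dim) / np.sqrt(dim)      # hypothetical shared query
    scores = modal_reprs @ query             # one score per modality
    weights = softmax(scores)                # per-modality contribution
    fused = weights @ modal_reprs            # attention-weighted fusion
    return fused, weights

def mse(pred, target):
    return float(np.mean((np.asarray(pred) - target) ** 2))

# toy text/audio/video embeddings: 3 modalities x 4 feature dims
reprs = np.array([[0.9, 0.1, 0.3, 0.5],
                  [0.2, 0.8, 0.4, 0.1],
                  [0.4, 0.3, 0.7, 0.6]])
fused, weights = attention_fuse(reprs)

# multitask objective: main loss on the fused representation plus
# three auxiliary monomodal losses (one per modality), mirroring the
# paper's use of monomodal recognition as auxiliary tasks
main_target = 0.6                            # hypothetical fused label
main_loss = mse(fused.mean(), main_target)
aux_losses = [mse(r.mean(), main_target) for r in reprs]
total_loss = main_loss + 0.3 * sum(aux_losses)   # 0.3: assumed weight
```

In a trainable version, the query and fusion layers would be learned parameters and the monomodal targets would come from the label generation unit the abstract mentions; here they are fixed constants so the arithmetic is self-contained.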
Pages: 13
Related papers (50 records in total)
  • [31] Music Emotion Recognition Based on Feature Fusion Broad Learning Method
    Yu, Jinming
    Zhang, Chenguang
    Hai, Han
    Journal of Donghua University (English Edition), 2023, 40 (03) : 343 - 350
  • [32] MuMu: Cooperative Multitask Learning-Based Guided Multimodal Fusion
    Islam, Md Mofijul
    Iqbal, Tariq
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 1043 - 1051
  • [33] Special video classification based on multitask learning and multimodal feature fusion
    Wu X.-Y.
    Gu C.-N.
    Wang S.-J.
    Guangxue Jingmi Gongcheng/Optics and Precision Engineering, 2020, 28 (05): : 1177 - 1186
  • [34] Multimodal emotion recognition using cross modal audio-video fusion with attention and deep metric learning
    Mocanu, Bogdan
    Tapu, Ruxandra
    Zaharia, Titus
    IMAGE AND VISION COMPUTING, 2023, 133
  • [35] MultiEMO: An Attention-Based Correlation-Aware Multimodal Fusion Framework for Emotion Recognition in Conversations
    Shi, Tao
    Huang, Shao-Lun
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 14752 - 14766
  • [36] A Multitask Learning Approach Based on Cascaded Attention Network and Self-Adaption Loss for Speech Emotion Recognition
    Liu, Yang
    Xia, Yuqi
    Sun, Haoqin
    Meng, Xiaolei
    Bai, Jianxiong
    Guan, Wenbo
    Zhao, Zhen
    LI, Yongwei
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2023, E106A (06) : 876 - 885
  • [37] Research on Multimodal Emotion Recognition Based on Fusion of Electroencephalogram and Electrooculography
    Yin, Jialai
    Wu, Minchao
    Yang, Yan
    Li, Ping
    Li, Fan
    Liang, Wen
    Lv, Zhao
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2024, 73 : 1 - 12
  • [38] Speech emotion recognition based on multimodal and multiscale feature fusion
    Huangshui Hu
    Jie Wei
    Hongyu Sun
    Chuhang Wang
    Shuo Tao
    Signal, Image and Video Processing, 2025, 19 (2)
  • [39] Multimodal Fusion based on Information Gain for Emotion Recognition in the Wild
    Ghaleb, Esam
    Popa, Mirela
    Hortal, Enrique
    Asteriadis, Stylianos
    PROCEEDINGS OF THE 2017 INTELLIGENT SYSTEMS CONFERENCE (INTELLISYS), 2017, : 814 - 823
  • [40] Video Emotion Recognition in the Wild Based on Fusion of Multimodal Features
    Chen, Shizhe
    Li, Xinrui
    Jin, Qin
    Zhang, Shilei
    Qin, Yong
    ICMI'16: PROCEEDINGS OF THE 18TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2016, : 494 - 500