Multi-modal Emotion Recognition Based on Hypergraph

Cited by: 0
Authors
Zong, Lin-Lin [1 ]
Zhou, Jia-Hui [1 ]
Xie, Qiu-Jie [2 ]
Zhang, Xian-Chao [1 ]
Xu, Bo [3 ]
Affiliations
[1] Department of Software, Dalian University of Technology, Dalian 116000, Liaoning, China
[2] School of Computer Science and Technology, Fudan University, Shanghai 200433, China
[3] School of Computer Science and Technology, Dalian University of Technology, Dalian 116000, Liaoning, China
Keywords
Artificial intelligence; Character recognition; Convolution; Data handling; Graph theory; Graphic methods; Human computer interaction; Learning systems; Modal analysis; Speech recognition
DOI
10.11897/SP.J.1016.2023.02520
Abstract
With the rapid progress of artificial intelligence technology, machines need to recognize users' emotions to provide a better human-computer interaction experience, and emotion recognition has therefore become an active field of artificial intelligence. Traditional emotion recognition is mostly based on the text modality. Compared with a single modality, multi-modal emotion recognition offers data complementarity and greater model robustness. In multi-modal emotion recognition, feature fusion between modalities determines the quality of recognition. Recently, graph-based fusion between modalities has attracted much attention in related research; such methods use graphs that capture binary relations between pairs of modalities. When processing data from three or more modalities, however, an ordinary graph can hardly establish feature fusion among all modalities without introducing redundant information, which limits the performance of multi-modal emotion recognition. A more effective method for modeling and fusing multi-modal emotion features is therefore needed. To solve this problem, this paper proposes Multi-modal Emotion Recognition Based on Hypergraph (MORAH), a model that introduces hypergraphs to establish multivariate relations among multi-modal data instead of binary relations, achieving efficient multi-modal feature fusion. Specifically, the model divides multi-modal feature fusion into two stages: hyperedge construction and hypergraph learning. In the hyperedge construction stage, we aggregate the information of each time step in a sequence through a capsule network and build a graph for each single modality; we then apply graph convolution for a second round of aggregation, which serves as the basis for building the hypergraph in the next stage. Benefiting from this capsule-based graph aggregation, the model handles aligned and unaligned data alike, with no manual alignment of unaligned data. In the hypergraph learning stage, we establish associations not only between nodes of different modalities of the same sample but also among all modalities of the same sample. We use hierarchical multi-level hyperedges to avoid over-smoothed node embeddings, and a simplified hypergraph convolution to fuse high-level features across modalities, ensuring that node features are updated only when necessary during hypergraph convolution. The simplified convolution requires neither nonlinear activation nor a convolution filter matrix, so it maintains recognition accuracy while improving training speed. Comprehensive experiments on two benchmark datasets show that the proposed model makes full use of the multivariate relations among multi-modal data through the hypergraph. Compared with existing advanced methods, MORAH improves binary accuracy by 1.3% and F1-score by 1.1% on the unaligned data of the CMU-MOSI dataset, and improves both binary accuracy and F1-score by 0.2% on the unaligned data of the CMU-MOSEI dataset. To demonstrate the generality of the hypergraph learning stage across multi-modal tasks, we also apply the hierarchical multi-level hyperedges to emotion recognition in conversation (ERC); the results indicate that MORAH improves ERC performance to a certain extent, suggesting that MORAH can serve as a general tool to assist downstream natural language processing tasks.
© 2023 Science Press. All rights reserved.
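
The core fusion step described in the abstract, a hypergraph convolution simplified by dropping the nonlinear activation and the learnable filter matrix, can be illustrated with a short sketch. The following is a minimal illustration built on the standard normalized hypergraph propagation operator D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}; it is not the authors' released implementation, and the function name, toy incidence matrix, and node features are all hypothetical.

```python
import numpy as np

def simplified_hypergraph_conv(X, H, w=None, steps=1):
    """Parameter-free hypergraph propagation (no activation, no filter matrix).

    X : (n_nodes, d)       node features, e.g. per-modality embeddings
    H : (n_nodes, n_edges) incidence matrix; H[v, e] = 1 iff node v lies on
                           hyperedge e (e.g. an edge linking the text, audio,
                           and video nodes of one sample)
    w : (n_edges,)         optional hyperedge weights, defaults to all-ones
    """
    n_nodes, n_edges = H.shape
    if w is None:
        w = np.ones(n_edges)
    Dv = H @ w                          # weighted node degrees
    De = H.sum(axis=0)                  # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(Dv, 1e-12)))
    De_inv = np.diag(1.0 / np.maximum(De, 1e-12))
    # Normalized propagation operator: Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
    P = Dv_inv_sqrt @ H @ np.diag(w) @ De_inv @ H.T @ Dv_inv_sqrt
    for _ in range(steps):              # stacking steps plays the role of
        X = P @ X                       # stacked layers, with no nonlinearity
    return X

# Toy example: 2 samples x 3 modalities = 6 nodes, one hyperedge per sample
# connecting all three of its modality nodes.
H = np.zeros((6, 2))
H[0:3, 0] = 1                           # sample 0: text, audio, video nodes
H[3:6, 1] = 1                           # sample 1
X = np.random.randn(6, 4)
print(simplified_hypergraph_conv(X, H, steps=2).shape)  # (6, 4)
```

In MORAH's setting the incidence matrix would additionally carry the hierarchical multi-level hyperedges described in the abstract; each application of the propagation operator mixes features only among nodes that share a hyperedge, which is what keeps node updates limited to where fusion is actually needed.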
Pages: 2520-2534
Related Papers
50 records
  • [1] Multi-modal Emotion Recognition Based on Speech and Image
    Li, Yongqiang
    He, Qi
    Zhao, Yongping
    Yao, Hongxun
    [J]. ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2017, PT I, 2018, 10735 : 844 - 853
  • [2] Lightweight multi-modal emotion recognition model based on modal generation
    Liu, Peisong
    Che, Manqiang
    Luo, Jiangchuan
    [J]. 2022 9TH INTERNATIONAL FORUM ON ELECTRICAL ENGINEERING AND AUTOMATION, IFEEA, 2022, : 430 - 435
  • [3] Multi-Modal Fusion Emotion Recognition Based on HMM and ANN
    Xu, Chao
    Cao, Tianyi
    Feng, Zhiyong
    Dong, Caichao
    [J]. CONTEMPORARY RESEARCH ON E-BUSINESS TECHNOLOGY AND STRATEGY, 2012, 332 : 541 - 550
  • [4] Multi-modal Attention for Speech Emotion Recognition
    Pan, Zexu
    Luo, Zhaojie
    Yang, Jichen
    Li, Haizhou
    [J]. INTERSPEECH 2020, 2020, : 364 - 368
  • [5] Towards Efficient Multi-Modal Emotion Recognition
    Dobrisek, Simon
    Gajsek, Rok
    Mihelic, France
    Pavesic, Nikola
    Struc, Vitomir
    [J]. INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2013, 10
  • [6] Emotion Recognition from Multi-Modal Information
    Wu, Chung-Hsien
    Lin, Jen-Chun
    Wei, Wen-Li
    Cheng, Kuan-Chun
    [J]. 2013 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2013,
  • [7] Evaluation and Discussion of Multi-modal Emotion Recognition
    Rabie, Ahmad
    Wrede, Britta
    Vogt, Thurid
    Hanheide, Marc
    [J]. SECOND INTERNATIONAL CONFERENCE ON COMPUTER AND ELECTRICAL ENGINEERING, VOL 1, PROCEEDINGS, 2009, : 598 - +
  • [8] Research of Multi-modal Emotion Recognition Based on Voice and Video Images
    Wang, Chuanyu
    Li, Weixiang
    Chen, Zhenhuan
    [J]. Computer Engineering and Applications, 2024, 57 (23) : 163 - 170
  • [9] Emotion recognition based on multi-modal physiological signals and transfer learning
    Fu, Zhongzheng
    Zhang, Boning
    He, Xinrun
    Li, Yixuan
    Wang, Haoyuan
    Huang, Jian
    [J]. FRONTIERS IN NEUROSCIENCE, 2022, 16
  • [10] Multi-Modal Emotion Recognition Fusing Video and Audio
    Xu, Chao
    Du, Pufeng
    Feng, Zhiyong
    Meng, Zhaopeng
    Cao, Tianyi
    Dong, Caichao
    [J]. APPLIED MATHEMATICS & INFORMATION SCIENCES, 2013, 7 (02): : 455 - 462