Multi-modal Emotion Recognition Based on Hypergraph

Cited by: 0
Authors
Zong, Lin-Lin [1 ]
Zhou, Jia-Hui [1 ]
Xie, Qiu-Jie [2 ]
Zhang, Xian-Chao [1 ]
Xu, Bo [3 ]
Affiliations
[1] Department of Software, Dalian University of Technology, Dalian 116000, Liaoning, China
[2] School of Computer Science and Technology, Fudan University, Shanghai 200433, China
[3] School of Computer Science and Technology, Dalian University of Technology, Dalian 116000, Liaoning, China
Keywords
Artificial intelligence; Character recognition; Convolution; Data handling; Graph theory; Graphic methods; Human computer interaction; Learning systems; Modal analysis; Speech recognition
DOI
10.11897/SP.J.1016.2023.02520
Abstract
With the rapid progress of artificial intelligence technology, machines need to recognize users' emotions to provide a better human-computer interaction experience, and emotion recognition has therefore become an active field of artificial intelligence. Traditional emotion recognition is mostly based on the text modality. Compared with a single modality, multi-modal emotion recognition offers data complementarity and greater model robustness. In multi-modal emotion recognition, feature fusion between modalities determines the quality of recognition. Recently, graph-based fusion between modalities has attracted much attention in related research; such methods use graphs that capture binary relations between pairs of modalities. When processing data from three or more modalities, however, an ordinary graph can hardly establish feature fusion among all modalities without introducing redundant information, which limits the performance of multi-modal emotion recognition. A more effective method for modeling and fusing multi-modal emotion features is therefore needed. To solve this problem, this paper proposes Multi-modal Emotion Recognition Based on Hypergraph (MORAH), a model that introduces hypergraphs to establish multivariate relations among multi-modal data instead of binary relations, achieving efficient multi-modal feature fusion. Specifically, the model divides multi-modal feature fusion into two stages: hyperedge construction and hypergraph learning. In the hyperedge construction stage, we aggregate the information of each time step in a sequence through a capsule network and build a graph for each single modality; we then apply graph convolution for a second round of aggregation, which serves as the basis for building the hypergraph in the next stage. Benefiting from this capsule-based graph aggregation, the model handles aligned and unaligned data alike, with no manual alignment of unaligned data. In the hypergraph learning stage, we establish associations not only between nodes of different modalities of the same sample but also among all modalities of the same sample. We use hierarchical multi-level hyperedges to avoid over-smoothed node embeddings, and a simplified hypergraph convolution to fuse high-level features across modalities, ensuring that node features are updated only when necessary during hypergraph convolution. The simplified convolution requires neither nonlinear activation nor a convolution filter matrix, so it maintains recognition accuracy while improving training speed. Comprehensive experiments on two benchmark datasets show that the proposed model makes full use of the multivariate relations among multi-modal data through the hypergraph. Compared with existing advanced methods, MORAH improves binary accuracy by 1.3% and F1-score by 1.1% on the unaligned data of the CMU-MOSI dataset, and improves both binary accuracy and F1-score by 0.2% on the unaligned data of the CMU-MOSEI dataset. To demonstrate the generality of the hypergraph learning stage across multi-modal tasks, we also apply the hierarchical multi-level hyperedges to emotion recognition in conversation (ERC); the results indicate that MORAH improves ERC performance to a certain extent, suggesting that MORAH can serve as a general tool to assist downstream natural language processing tasks.
© 2023 Science Press. All rights reserved.
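
The core fusion step described in the abstract, a hypergraph convolution simplified by dropping the nonlinear activation and the learnable filter matrix, can be illustrated with a short sketch. The following is a minimal illustration built on the standard normalized hypergraph propagation operator D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}; it is not the authors' released implementation, and the function name, toy incidence matrix, and node features are all hypothetical.

```python
import numpy as np

def simplified_hypergraph_conv(X, H, w=None, steps=1):
    """Parameter-free hypergraph propagation (no activation, no filter matrix).

    X : (n_nodes, d)       node features, e.g. per-modality embeddings
    H : (n_nodes, n_edges) incidence matrix; H[v, e] = 1 iff node v lies on
                           hyperedge e (e.g. an edge linking the text, audio,
                           and video nodes of one sample)
    w : (n_edges,)         optional hyperedge weights, defaults to all-ones
    """
    n_nodes, n_edges = H.shape
    if w is None:
        w = np.ones(n_edges)
    Dv = H @ w                          # weighted node degrees
    De = H.sum(axis=0)                  # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(Dv, 1e-12)))
    De_inv = np.diag(1.0 / np.maximum(De, 1e-12))
    # Normalized propagation operator: Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
    P = Dv_inv_sqrt @ H @ np.diag(w) @ De_inv @ H.T @ Dv_inv_sqrt
    for _ in range(steps):              # stacking steps plays the role of
        X = P @ X                       # stacked layers, with no nonlinearity
    return X

# Toy example: 2 samples x 3 modalities = 6 nodes, one hyperedge per sample
# connecting all three of its modality nodes.
H = np.zeros((6, 2))
H[0:3, 0] = 1                           # sample 0: text, audio, video nodes
H[3:6, 1] = 1                           # sample 1
X = np.random.randn(6, 4)
print(simplified_hypergraph_conv(X, H, steps=2).shape)  # (6, 4)
```

In MORAH's setting the incidence matrix would additionally carry the hierarchical multi-level hyperedges described in the abstract; each application of the propagation operator mixes features only among nodes that share a hyperedge, which is what keeps node updates limited to where fusion is actually needed.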
Pages: 2520-2534
Related Papers
50 records
  • [1] Multi-modal Emotion Recognition Based on Speech and Image
    Li, Yongqiang
    He, Qi
    Zhao, Yongping
    Yao, Hongxun
    [J]. ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2017, PT I, 2018, 10735 : 844 - 853
  • [2] Lightweight multi-modal emotion recognition model based on modal generation
    Liu, Peisong
    Che, Manqiang
    Luo, Jiangchuan
    [J]. 2022 9TH INTERNATIONAL FORUM ON ELECTRICAL ENGINEERING AND AUTOMATION, IFEEA, 2022, : 430 - 435
  • [3] Multi-Modal Fusion Emotion Recognition Based on HMM and ANN
    Xu, Chao
    Cao, Tianyi
    Feng, Zhiyong
    Dong, Caichao
    [J]. CONTEMPORARY RESEARCH ON E-BUSINESS TECHNOLOGY AND STRATEGY, 2012, 332 : 541 - 550
  • [4] Multi-modal Attention for Speech Emotion Recognition
    Pan, Zexu
    Luo, Zhaojie
    Yang, Jichen
    Li, Haizhou
    [J]. INTERSPEECH 2020, 2020, : 364 - 368
  • [5] Towards Efficient Multi-Modal Emotion Recognition
    Dobrisek, Simon
    Gajsek, Rok
    Mihelic, France
    Pavesic, Nikola
    Struc, Vitomir
    [J]. INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2013, 10
  • [6] Emotion Recognition from Multi-Modal Information
    Wu, Chung-Hsien
    Lin, Jen-Chun
    Wei, Wen-Li
    Cheng, Kuan-Chun
    [J]. 2013 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2013,
  • [7] Evaluation and Discussion of Multi-modal Emotion Recognition
    Rabie, Ahmad
    Wrede, Britta
    Vogt, Thurid
    Hanheide, Marc
    [J]. SECOND INTERNATIONAL CONFERENCE ON COMPUTER AND ELECTRICAL ENGINEERING, VOL 1, PROCEEDINGS, 2009, : 598 - +
  • [8] Research of Multi-modal Emotion Recognition Based on Voice and Video Images
    Wang, Chuanyu
    Li, Weixiang
    Chen, Zhenhuan
    [J]. Computer Engineering and Applications, 2024, 57 (23) : 163 - 170
  • [9] Emotion recognition based on multi-modal physiological signals and transfer learning
    Fu, Zhongzheng
    Zhang, Boning
    He, Xinrun
    Li, Yixuan
    Wang, Haoyuan
    Huang, Jian
    [J]. FRONTIERS IN NEUROSCIENCE, 2022, 16
  • [10] Multi-Modal Emotion Recognition Fusing Video and Audio
    Xu, Chao
    Du, Pufeng
    Feng, Zhiyong
    Meng, Zhaopeng
    Cao, Tianyi
    Dong, Caichao
    [J]. APPLIED MATHEMATICS & INFORMATION SCIENCES, 2013, 7 (02): : 455 - 462