Multi-modal and Multi-label Emotion Detection for Comics Based on Two-Stream Network

Cited: 0
Authors
Lin Z. [1 ]
Zeng B. [1 ]
Pan Z. [1 ]
Wen S. [1 ]
Affiliations
[1] School of Computers, Guangdong University of Technology, Guangzhou
Source
Zeng, Bi (zb9215@gdut.edu.cn) | 2021 / Science Press / Vol. 34
Funding
National Natural Science Foundation of China
Keywords
Comic Emotion Detection; Cosine Similarity; Multi-head Self-Attention Mechanism; Multi-modal Fusion;
DOI
10.16451/j.cnki.issn1003-6059.202111005
Abstract
Comics are widely used in social media to metaphorize social phenomena and express emotion. To address the label-ambiguity problem in multi-modal, multi-label emotion detection for comic scenes, a multi-modal and multi-label emotion detection model for comics based on a two-stream network is proposed. The backbone of the method is a two-stream structure: a Transformer model serves as the image backbone network to extract image features, and the RoBERTa pre-trained model serves as the text backbone network to extract text features. Inter-modal information is compared with cosine similarity and combined with a self-attention mechanism to merge the image and text features. An improved cosine similarity, combining a cosine self-attention mechanism with a multi-head self-attention mechanism (COS-MHSA), extracts the high-level features of the image. Finally, the high-level features and the COS-MHSA features are fused. The effectiveness of the proposed method is verified on the EmoRecCom dataset, and the emotion detection results are presented visually. © 2021, Science Press. All rights reserved.
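The abstract names the components but not their equations, so the sketch below is one plausible PyTorch reading, not the authors' implementation: a small ViT-style patch Transformer stands in for the image backbone, HuggingFace's roberta-base for the text backbone, COS-MHSA is rendered as multi-head self-attention with cosine-similarity attention scores and a learnable temperature, and the cross-modal fusion is a simple cosine-similarity gate. All layer sizes, pooling choices, and the gate itself are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import RobertaModel  # text backbone

class CosMHSA(nn.Module):
    """Multi-head self-attention whose attention scores are cosine
    similarities between projected queries and keys (an assumed
    reading of the paper's COS-MHSA)."""
    def __init__(self, dim=768, num_heads=8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.tau = nn.Parameter(torch.tensor(10.0))  # learnable temperature

    def forward(self, x):                              # x: (B, N, dim)
        B, N, _ = x.shape
        qkv = self.qkv(x).view(B, N, 3, self.num_heads, -1)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each (B, H, N, d)
        q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1) * self.tau).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, -1)
        return self.proj(out)

class TwoStreamComicEmotion(nn.Module):
    """Two-stream sketch: a ViT-style Transformer encoder over image
    patches (positional embeddings omitted for brevity), RoBERTa over
    the dialogue text, COS-MHSA on the image tokens, and a cosine-
    similarity gate before fusion. 8 labels as in EmoRecCom."""
    def __init__(self, dim=768, num_labels=8, patch=16):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.image_encoder = nn.TransformerEncoder(enc_layer, num_layers=4)
        self.text_backbone = RobertaModel.from_pretrained("roberta-base")
        self.cos_mhsa = CosMHSA(dim)
        self.classifier = nn.Linear(dim * 2, num_labels)

    def forward(self, images, input_ids, attention_mask):
        # image stream: (B, 3, H, W) -> (B, N, dim) patch tokens
        tokens = self.patch_embed(images).flatten(2).transpose(1, 2)
        img = self.image_encoder(tokens)
        img_hi = self.cos_mhsa(img).mean(dim=1)        # high-level image feature
        # text stream: first (<s>) token as the sentence feature
        txt = self.text_backbone(input_ids,
                                 attention_mask=attention_mask).last_hidden_state
        txt_feat = txt[:, 0]
        # weight the text stream by its cosine agreement with the image stream
        gate = F.cosine_similarity(img_hi, txt_feat, dim=-1).unsqueeze(-1)
        fused = torch.cat([img_hi, gate * txt_feat], dim=-1)
        return self.classifier(fused)                  # multi-label logits
```

Since the task is multi-label, the logits would be trained with BCEWithLogitsLoss and thresholded per label at inference rather than passed through a softmax.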
Pages: 1017-1027
Number of pages: 10