Attention-based multi-modal fusion sarcasm detection

Cited by: 1

Authors:

Liu, Jing [1]

Tian, Shengwei [1]

Yu, Long [2]

Long, Jun [3,4]

Zhou, Tiejun [5]

Wang, Bo [1]

Affiliations:

[1] Xinjiang Univ, Sch Software, Urumqi, Xinjiang, Peoples R China

[2] Xinjiang Univ, Network & Informat Ctr, Urumqi, Xinjiang, Peoples R China

[3] Cent South Univ, Sch Informat Sci & Engn, Changsha, Peoples R China

[4] Cent South Univ, Big Data & Knowledge Engn Inst, Changsha, Peoples R China

[5] Xinjiang Internet Informat Ctr, Urumqi, Xinjiang, Peoples R China

Source:

JOURNAL OF INTELLIGENT & FUZZY SYSTEMS | 2023 / Vol. 44 / Issue 02

Funding:

National Natural Science Foundation of China;

Keywords:

Multi-modal; sarcasm detection; Attention; ViT; D-BiGRU;

DOI:

10.3233/JIFS-213501

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Sarcasm is a way of expressing a person's thoughts in which the intended meaning is often the opposite of the apparent meaning. Previous work on sarcasm detection focused mainly on text, but most information today is multi-modal, combining text and images. Multi-modal sarcasm detection is therefore becoming an increasingly active research topic. To better detect the true meaning of multi-modal sarcastic content, this paper proposes a multi-modal fusion sarcasm detection model based on the attention mechanism. The model introduces the Vision Transformer (ViT) to extract image features and uses a purpose-designed Double-Layer Bi-Directional Gated Recurrent Unit (D-BiGRU) to extract text features. The features of the two modalities are fused into a single feature vector, which is enhanced by attention and then used for prediction. On the baseline datasets, the model exceeds the best baseline model by 0.71% in F1-score and 0.38% in accuracy.
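The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not give the exact attention formulation, so the sketch assumes scaled dot-product attention pooling over each modality's feature vectors, followed by concatenation and a logistic classifier. The random vectors stand in for ViT patch embeddings and D-BiGRU token states, and the names `attention_pool`, `fuse_and_predict`, `query`, `w`, and `b` are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_pool(features, query):
    """Pool a list of feature vectors into one vector.

    Assumption: scaled dot-product attention against a single query vector.
    Returns the pooled vector and the attention weights.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(f, query) / scale for f in features])
    dim = len(features[0])
    pooled = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return pooled, weights


def fuse_and_predict(image_feats, text_feats, query, w, b):
    """Fuse both modalities into one vector and return P(sarcastic).

    Assumption: fusion is plain concatenation of the two pooled vectors,
    and the classifier is a single logistic unit.
    """
    img_vec, _ = attention_pool(image_feats, query)
    txt_vec, _ = attention_pool(text_feats, query)
    fused = img_vec + txt_vec  # list concatenation, length 2 * dim
    logit = dot(fused, w) + b
    return 1.0 / (1.0 + math.exp(-logit))


random.seed(0)
dim = 4
# Stand-ins for ViT patch embeddings (3 patches) and D-BiGRU states (5 tokens).
image_feats = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
text_feats = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
query = [random.gauss(0, 1) for _ in range(dim)]
w = [random.gauss(0, 1) for _ in range(2 * dim)]
b = 0.0

prob = fuse_and_predict(image_feats, text_feats, query, w, b)
print(f"P(sarcastic) = {prob:.3f}")
```

In a trained model the query, weights, and bias would be learned parameters, and the feature vectors would come from the actual image and text encoders rather than from a random generator.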

Citation

Pages: 2097-2108

Page count: 12

