Cross-modality representation learning from transformer for hashtag prediction

Cited by: 0
Authors
Mian Muhammad Yasir Khalil
Qingxian Wang
Bo Chen
Weidong Wang
Affiliations
[1] School of Information and Software Engineering, University of Electronic Science and Technology of China
Source
JOURNAL OF BIG DATA, 2023, 10 (01)
Keywords
Attention mechanism; Hashtag recommendation; Multimodal data; Transfer learning;
DOI
None available
Abstract
Hashtags are keywords that describe the theme of social media content and have become very popular in influencer marketing and trending topics. In recent years, hashtag prediction has become a hot topic in AI research, helping users through automatic hashtag recommendation by capturing the theme of a post. Most previous work focused only on textual information, but many microblog posts contain not only text but also corresponding images. This work exploits both the image and text features of a microblog post. Pre-trained visual-linguistic models, which adopt the transformer's self-attention mechanism from natural language processing and leverage transfer learning, also perform strongly on many downstream tasks that take image and text inputs. However, most existing models for multimodal hashtag recommendation are still based on the traditional co-attention mechanism. This paper investigates the cross-modality transformer LXMERT for multimodal hashtag prediction and develops LXMERT4Hashtag, a cross-modality representation learning transformer for hashtag prediction. It is a large-scale transformer model consisting of three encoders: a language encoder, an object encoder, and a cross-modality encoder. We evaluate the presented approach on the InstaNY100K dataset. Experimental results show that our model is competitive and achieves impressive results, improving precision from 46.12% to 50.5%, recall from 38.93% to 44.02%, and F1-score from 42.22% to 47.04% compared to the existing state-of-the-art baseline model.
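The cross-modality encoder described in the abstract lets each modality attend to the other, so token features are contextualized by object-region features and vice versa. Below is a minimal NumPy sketch of that cross-attention idea (illustrative shapes and names only; this is not the authors' LXMERT4Hashtag implementation, which uses learned projections and multiple heads):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, d_k):
    """One simplified cross-modality attention step: one modality's
    features query the other modality's features."""
    scores = queries @ keys_values.T / np.sqrt(d_k)   # (n_q, n_kv)
    weights = softmax(scores, axis=-1)                # each row sums to 1
    return weights @ keys_values                      # (n_q, d_k)

rng = np.random.default_rng(0)
d = 8
text_feats  = rng.standard_normal((5, d))   # e.g. 5 token embeddings
image_feats = rng.standard_normal((3, d))   # e.g. 3 object-region embeddings

# Each modality attends to the other, as in a cross-modality layer.
text_ctx  = cross_attention(text_feats,  image_feats, d)
image_ctx = cross_attention(image_feats, text_feats,  d)
print(text_ctx.shape, image_ctx.shape)  # (5, 8) (3, 8)
```

In the full model, stacks of such layers (with learned query/key/value projections) produce the joint representation that the hashtag classifier is trained on.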