Cross-modality representation learning from transformer for hashtag prediction

Cited by: 0
Authors
Mian Muhammad Yasir Khalil
Qingxian Wang
Bo Chen
Weidong Wang
Affiliations
[1] School of Information and Software Engineering, University of Electronic Science and Technology of China
Source
JOURNAL OF BIG DATA, 2023, 10 (01)
Keywords
Attention mechanism; Hashtag recommendation; Multimodal data; Transfer learning;
DOI
None available
Abstract
Hashtags are keywords that describe the theme of social media content and have become very popular in influencer marketing and trending topics. In recent years, hashtag prediction has become a hot topic in AI research, helping users through automatic hashtag recommendation by capturing the theme of a post. Most previous work focused only on textual information, but many microblog posts contain not only text but also corresponding images. This work exploits both the image and text features of a microblog post. Pre-trained visual-linguistic models, which adopt the transformer's self-attention mechanism from natural language processing and leverage transfer learning, also perform strongly on many downstream tasks that take image and text inputs. However, most existing models for multimodal hashtag recommendation are still based on the traditional co-attention mechanism. This paper investigates the cross-modality transformer LXMERT for multimodal hashtag prediction and develops LXMERT4Hashtag, a cross-modality representation learning transformer for hashtag prediction. It is a large-scale transformer model consisting of three encoders: a language encoder, an object encoder, and a cross-modality encoder. We evaluate the presented approach on the InstaNY100K dataset. Experimental results show that our model is competitive and achieves impressive results, improving precision from 46.12% to 50.5%, recall from 38.93% to 44.02%, and F1-score from 42.22% to 47.04% compared to the existing state-of-the-art baseline model.
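The cross-modality encoder described in the abstract lets each modality attend to the other, so token features are contextualized by object-region features and vice versa. Below is a minimal NumPy sketch of that cross-attention idea (illustrative shapes and names only; this is not the authors' LXMERT4Hashtag implementation, which uses learned projections and multiple heads):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, d_k):
    """One simplified cross-modality attention step: one modality's
    features query the other modality's features."""
    scores = queries @ keys_values.T / np.sqrt(d_k)   # (n_q, n_kv)
    weights = softmax(scores, axis=-1)                # each row sums to 1
    return weights @ keys_values                      # (n_q, d_k)

rng = np.random.default_rng(0)
d = 8
text_feats  = rng.standard_normal((5, d))   # e.g. 5 token embeddings
image_feats = rng.standard_normal((3, d))   # e.g. 3 object-region embeddings

# Each modality attends to the other, as in a cross-modality layer.
text_ctx  = cross_attention(text_feats,  image_feats, d)
image_ctx = cross_attention(image_feats, text_feats,  d)
print(text_ctx.shape, image_ctx.shape)  # (5, 8) (3, 8)
```

In the full model, stacks of such layers (with learned query/key/value projections) produce the joint representation that the hashtag classifier is trained on.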