Retweet Prediction Based on Heterogeneous Data Sources: The Combination of Text and Multilayer Network Features

Cited by: 1
Authors
Mestrovic, Ana [1, 2]
Petrovic, Milan [1, 2]
Beliga, Slobodan [1, 2]
Affiliations
[1] Univ Rijeka, Fac Informat & Digital Technol, Rijeka 51000, Croatia
[2] Univ Rijeka, Ctr Artificial Intelligence & Cybersecur, Rijeka 51000, Croatia
Source
APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / Issue 21
Keywords
retweet prediction; multilayer network; natural language processing; text features; Twitter data; SPREADING PROCESSES; STRUCTURAL-ANALYSIS
DOI
10.3390/app122111216
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Retweet prediction is an important task in the context of various problems, such as information spreading analysis, automatic fake news detection, and social media monitoring. In this study, we explore retweet prediction based on heterogeneous data sources. To classify a tweet according to the number of retweets, we combine features extracted from the multilayer network and text. More specifically, we introduce a multilayer framework for the multilayer network representation of Twitter. This formalism captures different users' actions and complex relationships, as well as other key properties of communication on Twitter. Next, we select a set of local network measures from each layer and construct a set of multilayer network features. We also adopt a BERT-based language model, namely Cro-CoV-cseBERT, to capture the high-level semantics and structure of tweets as a set of text features. We then trained six machine learning (ML) algorithms for the retweet-prediction task: random forest, multilayer perceptron, light gradient boosting machine, category-embedding model, neural oblivious decision ensembles, and an attentive interpretable tabular learning model. We compared the performance of all six algorithms in three different setups: with text features only, with multilayer network features only, and with both feature sets. We evaluated all the setups in terms of standard evaluation measures. For this task, we first prepared an empirical dataset of 199,431 tweets in Croatian posted between 1 January 2020 and 31 May 2021. Our results indicate that the prediction model performs better by integrating multilayer network features with text features than by using only one set of features.
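The three feature setups described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the layer names (`retweet`, `reply`, `mention`), the choice of in-degree and out-degree as the local network measures, the toy edge lists, and the two-dimensional text vector standing in for a BERT embedding are all assumptions made for illustration only.

```python
# Hypothetical multilayer network: each layer maps a name to directed
# (source_user, target_user) edges. Layer names and edges are assumptions.
LAYERS = {
    "retweet": [("a", "b"), ("c", "b")],
    "reply": [("b", "a")],
    "mention": [("a", "c"), ("b", "c")],
}


def degree_features(layers, user):
    """Return [in_degree, out_degree] for each layer, in sorted layer-name order.

    In-degree and out-degree stand in for the "local network measures";
    the abstract does not say which measures were actually selected.
    """
    feats = []
    for name in sorted(layers):
        in_deg = sum(1 for _, dst in layers[name] if dst == user)
        out_deg = sum(1 for src, _ in layers[name] if src == user)
        feats.extend([in_deg, out_deg])
    return feats


def build_features(layers, user, text_vec, setup):
    """Assemble the feature vector for one tweet under one of three setups.

    setup is "text", "network", or "both"; text_vec is a placeholder for
    an embedding produced by a language model.
    """
    network = degree_features(layers, user)
    if setup == "text":
        return list(text_vec)
    if setup == "network":
        return network
    if setup == "both":
        return list(text_vec) + network
    raise ValueError(f"unknown setup: {setup}")


# Feature vector for a tweet posted by user "b" with a toy text embedding.
combined = build_features(LAYERS, "b", [0.1, 0.2], "both")
```

Each resulting vector would then be passed to a classifier of choice; the sketch stops at feature construction because the abstract does not give the training details.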
Pages: 21
Related papers
50 in total
  • [1] Retweet Prediction Based on Multidimensional Features
    Fu, Xiaomeng
    Cheng, Suyan
    Zhao, Li
    Lv, Jiaguo
    [J]. WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2022, 2022
  • [2] Heterogeneous Modular Traffic Prediction Based on Multilayer Graph Convolutional Network
    Chang, Mengmeng
    Ding, Zhiming
    Zhao, Zilin
    Cai, Zhi
    [J]. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (07) : 7805 - 7817
  • [3] Retweet Prediction with Attention-based Deep Neural Network
    Zhang, Qi
    Gong, Yeyun
    Wu, Jindou
    Huang, Haoran
    Huang, Xuanjing
    [J]. CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2016 : 75 - 84
  • [4] Role of twitter user profile features in retweet prediction for big data streams
    Sharma, Saurabh
    Gupta, Vishal
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (19) : 27309 - 27338
  • [5] Role of twitter user profile features in retweet prediction for big data streams
    Saurabh Sharma
    Vishal Gupta
    [J]. Multimedia Tools and Applications, 2022, 81 : 27309 - 27338
  • [6] Fuzzy time-series prediction model based on text features and network features
    Zeguang Liu
    Yao Li
    Huilin Liu
    [J]. Neural Computing and Applications, 2023, 35 : 3639 - 3649
  • [7] Fuzzy time-series prediction model based on text features and network features
    Liu, Zeguang
    Li, Yao
    Liu, Huilin
    [J]. NEURAL COMPUTING & APPLICATIONS, 2023, 35 (05) : 3639 - 3649
  • [8] Link prediction in microblog retweet network based on maximum entropy model
    Li Yong-Jun
    Yin Chao
    Yu Hui
    Liu Zun
    [J]. ACTA PHYSICA SINICA, 2016, 65 (02)
  • [9] Text as data: Using text-based features for proteins representation and for computational prediction of their characteristics
    Shatkay, Hagit
    Brady, Scott
    Wong, Andrew
    [J]. METHODS, 2015, 74 : 54 - 64
  • [10] A Drug Combination Prediction Framework Based on Graph Convolutional Network and Heterogeneous Information
    Chen, Hegang
    Lu, Yuyin
    Yang, Yuedong
    Rao, Yanghui
    [J]. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2023, 20 (03) : 1917 - 1925