Retweet Prediction Based on Heterogeneous Data Sources: The Combination of Text and Multilayer Network Features

Cited by: 1
Authors
Mestrovic, Ana [1 ,2 ]
Petrovic, Milan [1 ,2 ]
Beliga, Slobodan [1 ,2 ]
Affiliations
[1] Univ Rijeka, Fac Informat & Digital Technol, Rijeka 51000, Croatia
[2] Univ Rijeka, Ctr Artificial Intelligence & Cybersecur, Rijeka 51000, Croatia
Source
APPLIED SCIENCES-BASEL | 2022 / Volume 12 / Issue 21
Keywords
retweet prediction; multilayer network; natural language processing; text features; Twitter data; SPREADING PROCESSES; STRUCTURAL-ANALYSIS;
DOI
10.3390/app122111216
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Retweet prediction is an important task in the context of various problems, such as information spreading analysis, automatic fake news detection, and social media monitoring. In this study, we explore retweet prediction based on heterogeneous data sources. To classify a tweet according to its number of retweets, we combine features extracted from a multilayer network and from text. More specifically, we introduce a multilayer framework for representing Twitter as a multilayer network. This formalism captures different users' actions and complex relationships, as well as other key properties of communication on Twitter. Next, we select a set of local network measures from each layer and construct a set of multilayer network features. We also adopt a BERT-based language model, namely Cro-CoV-cseBERT, to capture the high-level semantics and structure of tweets as a set of text features. We then train six machine learning (ML) algorithms for the retweet-prediction task: random forest, multilayer perceptron, light gradient boosting machine, category-embedding model, neural oblivious decision ensembles, and an attentive interpretable tabular learning model. We compare the performance of all six algorithms in three setups: with text features only, with multilayer network features only, and with both feature sets, and we evaluate every setup using standard evaluation measures. For this task, we prepared an empirical dataset of 199,431 tweets in Croatian posted between 1 January 2020 and 31 May 2021. Our results indicate that the prediction model performs better when multilayer network features are integrated with text features than when only one feature set is used.
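The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The layer names (retweet, reply, mention), the toy edge list, the choice of in-degree and out-degree as the local measures, and the helper names `layer_degree_features` and `build_feature_sets` are all assumptions made for illustration. The short text vector is a placeholder for an embedding that a BERT-based model would produce.

```python
# Hypothetical toy interactions: (layer, source_user, target_user).
# Layer names and edges are illustrative assumptions, not the paper's data.
EDGES = [
    ("retweet", "u1", "u2"),
    ("retweet", "u3", "u2"),
    ("reply", "u2", "u1"),
    ("mention", "u1", "u3"),
    ("mention", "u2", "u3"),
]
LAYERS = ("retweet", "reply", "mention")


def layer_degree_features(edges, layers=LAYERS):
    """Return {user: [in_deg, out_deg, ...]} with one pair per layer.

    In-degree and out-degree stand in for the "local network measures"
    mentioned in the abstract; the actual measures may differ.
    """
    in_deg = {}
    out_deg = {}
    users = set()
    for layer, src, dst in edges:
        out_deg[(layer, src)] = out_deg.get((layer, src), 0) + 1
        in_deg[(layer, dst)] = in_deg.get((layer, dst), 0) + 1
        users.update((src, dst))

    features = {}
    for user in sorted(users):
        vec = []
        for layer in layers:
            vec.append(in_deg.get((layer, user), 0))
            vec.append(out_deg.get((layer, user), 0))
        features[user] = vec
    return features


def build_feature_sets(text_vec, network_vec):
    """Build the three setups: text only, network only, and both."""
    return {
        "text": list(text_vec),
        "network": list(network_vec),
        "both": list(text_vec) + list(network_vec),
    }


if __name__ == "__main__":
    feats = layer_degree_features(EDGES)
    # Placeholder text vector; a real one would come from a language model.
    setups = build_feature_sets([0.1, -0.2], feats["u1"])
    for name, vec in setups.items():
        print(name, vec)
```

Each of the three vectors could then be passed to any tabular classifier, which mirrors the comparison of setups described above under these simplified assumptions.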
Pages: 21
Related papers
50 in total
  • [41] Defect Prediction Based on The Characteristics of Multilayer Structure of Software Network
    Yang, Yiwen
    Ai, Jun
    Wang, Fei
    [J]. 2018 IEEE 18TH INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY COMPANION (QRS-C), 2018, : 27 - 34
  • [42] Text as data: Using text-based features for proteins representation and for computational prediction of their characteristics (vol 74, pg 54, 2015)
    Shatkay, Hagit
    Brady, Scott
    Wong, Andrew
    [J]. METHODS, 2016, 104 : 208 - 208
  • [43] Correction: A deep learning approach to text-based personality prediction using multiple data sources mapping
    Joshua Johnson Sirasapalli
    Ramakrishna Murty Malla
    [J]. Neural Computing and Applications, 2024, 36 (19) : 11659 - 11659
  • [44] Special invited paper: The SCORE normalization, especially for heterogeneous network and text data
    Ke, Zheng Tracy
    Jin, Jiashun
    [J]. STAT, 2023, 12 (01):
  • [45] Heat-supply network state prediction based on optimum combination model of data mining
    Wang, Xiufang
    Wang, Yan
    Bi, Hongbo
    Gao, Running
    [J]. Journal of Applied Sciences, 2013, 13 (13) : 2443 - 2449
  • [46] VisCrimePredict: A System for Crime Trajectory Prediction and Visualisation from Heterogeneous data sources
    Morshed, Ahsan
    Forkan, Abdur Rahim Mohammad
    Tsai, Pei-Wei
    Jayaraman, Prem Prakash
    Sellis, Timos
    Georgakopoulos, Dimitrios
    Moser, Irene
    Ranjan, Rajiv
    [J]. SAC '19: PROCEEDINGS OF THE 34TH ACM/SIGAPP SYMPOSIUM ON APPLIED COMPUTING, 2019, : 1099 - 1106
  • [47] Web page classification based on heterogeneous features and a combination of multiple classifiers
    Deng, Li
    Du, Xin
    Shen, Ji-zhong
    [J]. FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2020, 21 (07) : 995 - 1004
  • [48] Web page classification based on heterogeneous features and a combination of multiple classifiers
    Li Deng
    Xin Du
    Ji-zhong Shen
    [J]. Frontiers of Information Technology & Electronic Engineering, 2020, 21 : 995 - 1004
  • [49] Combination of Stylo-based Features and Frequency-based Features for Identifying the Author of Short Arabic Text
    Al-Sarem, Mohammed
    Cherif, Walid
    Wahab, Ahmed Abdel
    Emara, Abdel-Hamid
    Kissi, Mohamed
    [J]. PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS: THEORIES AND APPLICATIONS (SITA'18), 2018,
  • [50] Prediction of Accident and Accident Severity Based on Heterogeneous Data
    Kandacharam, Sneha
    Rajathilagam, B.
    [J]. DISTRIBUTED COMPUTING AND INTELLIGENT TECHNOLOGY, ICDCIT 2023, 2023, 13776 : 369 - 374