Role of twitter user profile features in retweet prediction for big data streams

Cited by: 9
Authors
Sharma, Saurabh [1]
Gupta, Vishal [1]
Affiliation
[1] Panjab University, University Institute of Engineering and Technology, Chandigarh, India
Keywords
Twitter; Social media analysis; Retweet prediction; User behavior; User profiling; Big data analysis; Model; Information; Accounts; Web
DOI
10.1007/s11042-022-12815-1
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Studying the factors that influence information sharing on Twitter is a very active research area. This paper explores the impact of numerical features extracted from user profiles on retweet prediction from the real-time raw feed of tweets. The originality of this work lies in the proposed model being based on simple numerical features with minimal computational complexity, which makes it a scalable solution for big data analysis. The work proposes three new features derived from the tweet author's profile to capture the user's unique behavioral pattern: "Author total activity", "Author total activity per year", and "Author tweets per year". The feature set is tested on a dataset of 100 million random tweets collected through the Twitter API. Binary-label regression achieved an accuracy of 0.98 with user-profile features and 0.99 when they were combined with tweet content features. Regression analysis to predict the retweet count gave an R-squared value of 0.98 with the combined features. Multi-label classification achieved an accuracy of 0.90 with the combined features and 0.89 with user-profile features. The user-profile features performed better than the tweet content features, and better still when the two were combined. The model is suitable for near real-time analysis of live streaming data from the Twitter API and provides a baseline pattern of user behavior based only on numerical features available from user profiles.
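The abstract names the three proposed author features but does not define them. The sketch below is a hypothetical illustration only, not the authors' method. It assumes the Twitter API v1.1 user object fields `statuses_count`, `favourites_count`, and `created_at`; the definition of "total activity" as tweets posted plus tweets liked, the normalization by account age in years, the binary label "retweeted at least once", and all helper names (`account_age_years`, `author_profile_features`, `binary_retweet_label`) are assumptions.

```python
from datetime import datetime, timezone

# Twitter API v1.1 timestamp format, e.g. "Wed Oct 10 20:19:24 +0000 2018".
# %a and %b are parsed with the current (default C) locale.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def account_age_years(created_at: str, now: datetime) -> float:
    """Account age in years; the age is floored at one day to avoid division by zero."""
    created = datetime.strptime(created_at, TWITTER_TIME_FORMAT)
    days = max((now - created).total_seconds() / 86400.0, 1.0)
    return days / 365.25


def author_profile_features(user: dict, now: datetime) -> dict:
    """Hypothetical versions of the three proposed author features.

    ASSUMPTION: "total activity" = statuses_count + favourites_count and
    "per year" = divided by account age in years. The paper's exact
    formulas may differ.
    """
    years = account_age_years(user["created_at"], now)
    total_activity = user["statuses_count"] + user["favourites_count"]
    return {
        "author_total_activity": total_activity,
        "author_total_activity_per_year": total_activity / years,
        "author_tweets_per_year": user["statuses_count"] / years,
    }


def binary_retweet_label(tweet: dict) -> int:
    """ASSUMPTION: a tweet is labelled 1 if it was retweeted at least once, else 0."""
    return int(tweet.get("retweet_count", 0) > 0)


if __name__ == "__main__":
    # An account exactly 730.5 days (2.0 years of 365.25 days) old at `now`.
    now = datetime(2020, 10, 10, 8, 19, 24, tzinfo=timezone.utc)
    user = {
        "created_at": "Wed Oct 10 20:19:24 +0000 2018",
        "statuses_count": 1000,
        "favourites_count": 500,
    }
    print(author_profile_features(user, now))
    # {'author_total_activity': 1500, 'author_total_activity_per_year': 750.0, 'author_tweets_per_year': 500.0}
```

Because every feature is a sum or ratio of counters already present in the profile, computing them costs constant time per tweet, which is consistent with the abstract's emphasis on low computational complexity for streaming data.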
Pages: 27309-27338
Page count: 30
Related papers (46 in total)
  • [11] Real-Time Tweet Analytics Using Hybrid Hashtags on Twitter Big Data Streams
    Gupta, Vibhuti
    Hewett, Rattikorn
    Information, 2020, 11 (07)
  • [12] ExaAUAC: Arabic Twitter user age prediction corpus based on language and metadata features
    Sadeghi, R.
    Akbari, A.
    Jaziriyan, M.M.
    Discover Artificial Intelligence, 2024, 4 (01)
  • [13] Role of big-data in classification and novel class detection in data streams
    Chandak, M.B.
    Journal of Big Data, 3 (1)
  • [14] A Network Datagram and Big Data Based Research on Method of User Profile
    Li, Jiabin
    Xue, Zhi
    Proceedings of the 2nd International Conference on Computer Engineering, Information Science & Application Technology (ICCIA 2017), 2017, 74: 26-32
  • [15] Harvesting Multiple Sources for User Profile Learning: a Big Data Study
    Farseev, Aleksandr
    Nie, Liqiang
    Akbari, Mohammad
    Chua, Tat-Seng
    ICMR'15: Proceedings of the 2015 ACM International Conference on Multimedia Retrieval, 2015: 235-242
  • [16] Automatic User Profile Mapping to Marketing Segments in a Big Data Context
    Hoppe, Anett
    Roxin, Ana
    Nicolle, Christophe
    Proceedings of the 14th International Conference on Informatics in Economy (IE 2015): Education, Research & Business Technologies, 2015: 285-291
  • [17] Discovering User Behavioral Features to Enhance Information Search on Big Data
    Cassavia, Nunziato
    Masciari, Elio
    Pulice, Chiara
    Sacca, Domenico
    ACM Transactions on Interactive Intelligent Systems, 2017, 7 (02)
  • [18] Big Data User Behaviour Prediction Model Incorporating Deep Learning
    Huang, X.
    Applied Mathematics and Nonlinear Sciences, 2024, 9 (01)
  • [19] An ensemble classification approach for prediction of user's next location based on Twitter data
    Kumar, Sachin
    Nezhurina, Marina I.
    Journal of Ambient Intelligence and Humanized Computing, 2019, 10 (11): 4503-4513