Role of twitter user profile features in retweet prediction for big data streams

Cited by: 0
Authors
Saurabh Sharma
Vishal Gupta
Affiliations
[1] University Institute of Engineering and Technology
[2] Panjab University
Keywords
Twitter; Social media analysis; Retweet prediction; User behavior; User profiling; Big data analysis;
DOI
Not available
Abstract
Studying the factors that influence information sharing on Twitter is an active research area. This paper explores the impact of numerical features extracted from user profiles on retweet prediction from the real-time raw feed of tweets. The originality of the work lies in a model built on simple numerical features with minimal computational complexity, which makes it a scalable solution for big data analysis. The paper proposes three new features derived from the tweet author's profile to capture the user's distinctive behavioral pattern: "Author total activity", "Author total activity per year", and "Author tweets per year". The feature set is tested on a dataset of 100 million random tweets collected through the Twitter API. Binary-label regression achieved an accuracy of 0.98 with user-profile features and 0.99 when these were combined with tweet content features. Regression analysis to predict the retweet count gave an R-squared value of 0.98 with the combined features. Multi-label classification achieved an accuracy of 0.9 with the combined features and 0.89 with user-profile features alone. User-profile features outperformed tweet content features, and the combination of the two performed better still. The model is suitable for near real-time analysis of live streaming data from the Twitter API and provides a baseline pattern of user behavior based solely on numerical features available from user profiles.
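The abstract names the three proposed author features but does not define them, so the following Python sketch is only one plausible reading, not the authors' formulas. It assumes that "total activity" is the sum of the profile's tweet count and like count, and that the per-year features divide by the account age. The field names `statuses_count`, `favourites_count`, and `created_at` follow the Twitter API v1.1 user object; the function name `author_profile_features` is hypothetical.

```python
from datetime import datetime, timezone

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def author_profile_features(profile, now):
    """Compute three illustrative author features from a profile dict.

    ASSUMPTION: the definitions below are guesses based on the feature
    names in the abstract; the paper's exact formulas are not given there.
    `profile["created_at"]` is expected to be a timezone-aware datetime.
    """
    age_seconds = (now - profile["created_at"]).total_seconds()
    # Floor the account age at one day to avoid division by zero
    # for accounts created moments ago.
    age_years = max(age_seconds / SECONDS_PER_YEAR, 1 / 365.25)

    tweets = profile["statuses_count"]
    likes = profile["favourites_count"]
    total_activity = tweets + likes  # assumed definition of "total activity"

    return {
        "author_total_activity": total_activity,
        "author_total_activity_per_year": total_activity / age_years,
        "author_tweets_per_year": tweets / age_years,
    }


example_profile = {
    "created_at": datetime(2015, 1, 1, tzinfo=timezone.utc),
    "statuses_count": 4000,
    "favourites_count": 2000,
}
example_features = author_profile_features(
    example_profile, now=datetime(2019, 1, 1, tzinfo=timezone.utc)
)
```

Each feature here needs only a few arithmetic operations per tweet, which is in keeping with the abstract's emphasis on low computational cost for streaming data.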
Pages: 27309 / 27338
Page count: 29