Towards a standard sampling methodology on online social networks: collecting global trends on Twitter

Cited by: 12
Authors
Piña-García C.A. [1]
Gershenson C. [1, 2, 3, 4, 5]
Siqueiros-García J.M. [1]
Affiliations
[1] Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Departamento de Ciencias de la Computación, Universidad Nacional Autónoma de México, Ciudad de México
[2] Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Circuito Maestro Mario de la Cueva S/N, Ciudad Universitaria, Ciudad de México
[3] SENSEable City Lab, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge
[4] MoBS Lab, Network Science Institute, Northeastern University, 360 Huntington av 1010-177, Boston
[5] ITMO University, Birzhevaya liniya 4, St. Petersburg
Keywords
Data acquisition; Online social network; Random walks; Sampling method; Twitter
DOI
10.1007/s41109-016-0004-1
Abstract
One of the most significant current challenges in large-scale online social networks is to establish a concise and coherent method for collecting and summarizing data. Sampling the content of an Online Social Network (OSN) plays an important role as a knowledge discovery tool. Current sampling methods must cope with the lack of a full sampling frame, i.e., access to the data is limited. Another key aspect is the huge amount of data generated by users of social networking services such as Twitter, perhaps the most influential microblogging service, which produces approximately 500 million tweets per day. Because the size of Twitter is itself difficult to measure, analyzing the entire network is infeasible and sampling is unavoidable. We believe there is a clear need to develop a new methodology for collecting information on social networks (social mining). To that end, this paper introduces a set of random strategies that could be considered a reliable alternative for gathering global trends on Twitter. This research aims to present some initial ideas on how well suited random walks are to extracting information or global trends. The main purpose of this study is to propose a suitable methodology for an efficient collection process via three random strategies: Brownian, Illusion and Reservoir. These strategies are applied through a Metropolis-Hastings Random Walk (MHRW). We show that interesting insights can be obtained by sampling emerging global trends on Twitter. The study also provides descriptive statistics and graphical descriptions from the preliminary experiments. © 2016, Piña-García et al.
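As a rough illustration of the Metropolis-Hastings Random Walk named in the abstract, the sketch below runs the textbook MHRW on a small undirected graph, accepting a move from node u to neighbor v with probability min(1, deg(u)/deg(v)) so that the walk targets a uniform distribution over nodes. This is not the authors' implementation: the function name `mhrw`, the toy graph, and the step count are hypothetical, and the Brownian, Illusion and Reservoir strategies are not reproduced because the abstract does not define them.

```python
import random
from collections import Counter


def mhrw(graph, start, steps, rng):
    """Textbook Metropolis-Hastings random walk on an undirected graph.

    graph: dict mapping each node to a list of its neighbors (assumed symmetric).
    Returns the list of visited nodes, including the start node.
    """
    current = start
    visited = [current]
    for _ in range(steps):
        # Propose a neighbor chosen uniformly at random.
        candidate = rng.choice(graph[current])
        # Accept with probability min(1, deg(current) / deg(candidate));
        # on rejection the walk stays put and the current node is counted again.
        if rng.random() < len(graph[current]) / len(graph[candidate]):
            current = candidate
        visited.append(current)
    return visited


# Hypothetical toy graph standing in for a follower network.
toy_graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}

walk = mhrw(toy_graph, "a", 10_000, random.Random(0))
frequencies = {node: count / len(walk) for node, count in Counter(walk).items()}
print(frequencies)
```

A plain random walk would visit node "a" more often because it has the highest degree; the acceptance step is what corrects for that bias, so the printed visit frequencies should be roughly equal across the four nodes.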