Leveraging History for Faster Sampling of Online Social Networks

Cited by: 13
Authors
Zhou, Zhuojie [1]
Zhang, Nan [1]
Das, Gautam [2]
Affiliations
[1] George Washington Univ, Washington, DC 20052 USA
[2] Univ Texas Arlington, Arlington, TX USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2015 / Vol. 8 / No. 10
Funding
U.S. National Science Foundation
DOI
10.14778/2794367.2794373
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
With a vast amount of data available on online social networks, enabling efficient analytics over such data has become an increasingly important research problem. Given the sheer size of these networks, many existing studies resort to sampling techniques that draw random nodes from an online social network through its restrictive web/API interface. While these studies differ widely in the analytics tasks they support and in their algorithmic design, almost all of them rely on the same underlying technique: random walk, a Markov chain Monte Carlo (MCMC) method that iteratively transitions from a node to one of its random neighbors. Random walk fits this problem naturally because, for most online social networks, the only query the interface supports is retrieving the neighbors of a given node (i.e., there is no access to the full graph topology). A problem with random walks, however, is the "burn-in" period, which requires a large number of transitions/queries before the sampling distribution converges to a stationary distribution that allows samples to be drawn in a statistically valid manner. In this paper, we consider the novel problem of speeding up the fundamental design of random walks (i.e., reducing the number of queries they require) without changing the stationary distribution they achieve, thereby enabling a more efficient "drop-in" replacement for existing sampling-based analytics techniques over online social networks. Technically, our main idea is to leverage the history of random walks to construct a higher-order Markov chain. We develop two algorithms, Circulated Neighbors Random Walk and Groupby Neighbors Random Walk (CNRW and GNRW), and rigorously prove that, regardless of the social network topology, CNRW and GNRW offer better efficiency than baseline random walks while achieving the same stationary distribution. Extensive experiments on real-world social networks and synthetic graphs demonstrate the superiority of our techniques over existing ones.
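To make the abstract's setting concrete, here is a minimal Python sketch under stated assumptions. It does not reproduce the authors' implementation. A hypothetical `neighbors(node)` oracle over a toy in-memory graph stands in for the neighbor-only web/API interface. `simple_random_walk` is the baseline walk. `circulated_neighbors_walk` is our reading of the history-based idea behind CNRW: for each traversed edge it remembers which next hops were already taken and samples the rest without replacement. All function names are hypothetical, the group-by variant (GNRW) is not sketched, and the 1/deg(v) reweighting is a standard correction for degree-proportional sampling rather than something taken from the paper.

```python
import random
from collections import Counter, defaultdict


def make_neighbor_oracle(adjacency):
    """Hypothetical stand-in for the restrictive web/API interface: the only
    query it answers is "return the neighbors of this node"."""
    def neighbors(node):
        return adjacency[node]
    return neighbors


def simple_random_walk(neighbors, start, steps, rng):
    """Baseline random walk: each step moves to a uniformly random neighbor."""
    path, cur = [start], start
    for _ in range(steps):
        cur = rng.choice(neighbors(cur))
        path.append(cur)
    return path


def circulated_neighbors_walk(neighbors, start, steps, rng):
    """History-based walk in the spirit of CNRW (our reading, not the authors'
    code). For every traversed edge prev -> cur it remembers which neighbors of
    cur were already taken after arriving via that edge, picks the next node
    uniformly from the untaken ones, and resets once all have been taken.
    Assumes a simple graph (no duplicate neighbors)."""
    taken = defaultdict(set)  # (prev, cur) -> neighbors of cur already chosen
    path, prev, cur = [start], None, start
    for _ in range(steps):
        nbrs = neighbors(cur)
        key = (prev, cur)
        nxt = rng.choice([w for w in nbrs if w not in taken[key]])
        taken[key].add(nxt)
        if len(taken[key]) == len(nbrs):
            taken[key].clear()  # each neighbor used once: begin a new cycle
        path.append(nxt)
        prev, cur = cur, nxt
    return path


def visit_frequencies(path):
    """Empirical visit frequency of each node, ignoring the start node."""
    counts = Counter(path[1:])
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}


def reweighted_mean(path, f, neighbors):
    """Both walks visit v with long-run frequency proportional to deg(v), so
    weighting each visit by 1/deg(v) estimates the plain average of f."""
    num = sum(f(v) / len(neighbors(v)) for v in path)
    den = sum(1 / len(neighbors(v)) for v in path)
    return num / den


if __name__ == "__main__":
    # Two triangles sharing node 2: connected, non-bipartite, 6 edges.
    adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3, 4], 3: [2, 4], 4: [2, 3]}
    neighbors = make_neighbor_oracle(adjacency)
    two_m = sum(len(nbrs) for nbrs in adjacency.values())  # 2|E|
    print("deg(v)/2|E|:", {v: round(len(adjacency[v]) / two_m, 3) for v in adjacency})
    for walk in (simple_random_walk, circulated_neighbors_walk):
        path = walk(neighbors, 0, 200_000, random.Random(7))
        freq = visit_frequencies(path)
        mean_deg = reweighted_mean(path, lambda v: len(adjacency[v]), neighbors)
        print(walk.__name__, {v: round(freq[v], 3) for v in sorted(freq)},
              "mean degree ~", round(mean_deg, 3))
```

On this toy graph, both walks should report visit frequencies close to deg(v)/2|E| and a reweighted mean degree close to 12/5 = 2.4. This is consistent with the abstract's claim that the stationary distribution is unchanged. The sketch does not attempt to measure the burn-in or query savings the paper reports.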
Pages: 1034-1045
Number of pages: 12