Estimating PageRank deviations in crawled graphs

Authors
Holzmann, Helge [1]
Anand, Avishek [2]
Khosla, Megha [2]
Affiliations
[1] Internet Archive, 300 Funston Ave, San Francisco, CA, USA
[2] Leibniz Universität Hannover, L3S Research Center, Appelstr 9A, D-30167 Hannover, Germany
Keywords
PageRank; Crawls; Ranking deviations
DOI
10.1007/s41109-019-0201-9
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Most real-world graphs collected from the Web, such as Web graphs and social network graphs, are only partially discovered or crawled. This leads to inaccurate estimates of graph properties based on link analysis, such as PageRank. In this paper we study such deviations in the ordering (ranking) imposed by PageRank over crawled graphs. We first show that deviations in rankings induced by PageRank are indeed possible. We then measure how much a PageRank-induced ranking on an input graph can deviate from the ranking on the original, unseen graph. More importantly, we seek a measure that approximates the rank correlation between the two rankings without any knowledge of the original graph. To this end we formulate the HAK measure, which is based on computing the impact redistribution of PageRank according to the local graph structure. We further propose an algorithm that identifies connected subgraphs of the input graph for which the relative ordering is preserved. Finally, we perform extensive experiments on real-world Web and social network graphs with more than 100M vertices and 10B edges, as well as on synthetic graphs, to showcase the utility of HAK and our High-fidelity Component Selection approach.
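The claim that a crawl can change the PageRank ordering can be illustrated with a toy example. The sketch below is not the HAK measure and is not taken from the paper: it compares the two rankings directly using the full graph, which is exactly the information the abstract says is unavailable in practice. The eight-node graph, the function names pagerank and kendall_tau, the damping factor of 0.85, and the uniform handling of dangling nodes are all assumptions made for illustration.

```python
from itertools import combinations


def pagerank(nodes, edges, damping=0.85, iters=100):
    """Power-iteration PageRank; dangling mass is spread uniformly (assumed)."""
    n = len(nodes)
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not out[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new = {v: base for v in nodes}
        for src in nodes:
            for dst in out[src]:
                new[dst] += damping * rank[src] / len(out[src])
        rank = new
    return rank


def kendall_tau(xs, ys):
    """Kendall's tau-a: tied pairs count as neither concordant nor discordant."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(xs) * (len(xs) - 1) / 2
    return (concordant - discordant) / pairs


# Hypothetical "original" graph: f, g and h link to b but are never crawled.
full_nodes = ["a", "b", "c", "d", "e", "f", "g", "h"]
full_edges = [("c", "a"), ("d", "a"), ("e", "b"),
              ("f", "b"), ("g", "b"), ("h", "b")]

# Hypothetical crawl: only five nodes and the edges among them are seen.
crawl_nodes = ["a", "b", "c", "d", "e"]
crawl_edges = [(s, t) for s, t in full_edges
               if s in crawl_nodes and t in crawl_nodes]

full_rank = pagerank(full_nodes, full_edges)
crawl_rank = pagerank(crawl_nodes, crawl_edges)

# Compare the two orderings on the crawled nodes only.
tau = kendall_tau([crawl_rank[v] for v in crawl_nodes],
                  [full_rank[v] for v in crawl_nodes])
print("rank correlation on crawled nodes:", tau)
```

In this toy case node a outranks node b in the crawl, while b outranks a in the full graph because its unseen in-links are restored, so the rank correlation on the crawled nodes drops to 0.5.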
Pages: 22