Estimating PageRank deviations in crawled graphs

Times cited: 2
Authors
Holzmann, Helge [1]
Anand, Avishek [2]
Khosla, Megha [2]
Affiliations
[1] Internet Archive, 300 Funston Ave, San Francisco, CA, USA
[2] Leibniz University Hannover, L3S Research Center, Appelstr. 9A, D-30167 Hannover, Germany
Keywords
PageRank; Crawls; Ranking deviations
DOI
10.1007/s41109-019-0201-9
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Most real-world graphs collected from the Web, such as Web graphs and social network graphs, are only partially discovered or crawled. This leads to inaccurate estimates of graph properties that rely on link analysis, such as PageRank. In this paper, we study the resulting deviations in the ordering (ranking) that PageRank imposes on crawled graphs. We first show that deviations in the rankings induced by PageRank are indeed possible. We measure how much the PageRank ranking on an input graph can deviate from the ranking on the original, unseen graph. More importantly, we aim to devise a measure that approximates the rank correlation between the two without any knowledge of the original graph. To this end, we formulate the HAK measure, which is based on computing how PageRank impact is redistributed according to the local graph structure. We further propose an algorithm that identifies connected subgraphs of the input graph for which the relative ordering is preserved. Finally, we perform extensive experiments on real-world Web and social network graphs with more than 100M vertices and 10B edges, as well as on synthetic graphs, to showcase the utility of HAK and our High-fidelity Component Selection approach.
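The abstract starts from the observation that PageRank computed on a crawled subgraph can order vertices differently from PageRank on the full graph. A toy example can illustrate this. The sketch below is not the authors' HAK measure or their High-fidelity Component Selection algorithm. It makes several assumptions:

- The crawl is modeled as the subgraph induced by the crawled vertices.
- PageRank is computed by plain power iteration with damping 0.85, and mass from dangling vertices is redistributed uniformly.
- The two rankings are compared with Kendall's tau-a over the vertices they share.

The graph and the functions `pagerank` and `kendall_tau` are hypothetical. Note that this direct comparison needs the full graph, which is exactly the knowledge HAK is designed to do without.

```python
from itertools import combinations


def pagerank(edges, nodes, alpha=0.85, iters=100, tol=1e-12):
    """Power-iteration PageRank on the subgraph induced by `nodes`.

    Edges with an endpoint outside `nodes` are ignored (assumption: a crawl
    is modeled as an induced subgraph). Mass held by dangling vertices is
    spread uniformly over all vertices.
    """
    nodes = sorted(nodes)
    n = len(nodes)
    out = {v: [] for v in nodes}
    for u, v in edges:
        if u in out and v in out:
            out[u].append(v)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(pr[v] for v in nodes if not out[v])
        # Teleport term plus uniformly redistributed dangling mass.
        new = {v: (1 - alpha) / n + alpha * dangling / n for v in nodes}
        for u in nodes:
            if out[u]:
                share = alpha * pr[u] / len(out[u])
                for v in out[u]:
                    new[v] += share
        delta = sum(abs(new[v] - pr[v]) for v in nodes)
        pr = new
        if delta < tol:
            break
    return pr


def kendall_tau(a, b, keys):
    """Kendall's tau-a between two score dicts over the shared `keys`.

    Pairs tied in either ranking count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for x, y in combinations(keys, 2):
        s = (a[x] - a[y]) * (b[x] - b[y])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(keys)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical toy graph: A is linked from C and D, B only from E.
edges = [("C", "A"), ("D", "A"), ("E", "B")]
full = pagerank(edges, {"A", "B", "C", "D", "E"})
# Hypothetical crawl that never reached C and D.
crawl = pagerank(edges, {"A", "B", "E"})
tau = kendall_tau(full, crawl, ["A", "B", "E"])
print(f"tau = {tau:.2f}")  # prints tau = 0.00 (1 concordant, 1 discordant, 1 tied pair)
```

On the full graph the order is A > B > E. In the crawl, A loses both of its in-links, so it falls below B and ties with E. The correlation therefore drops from 1 (identical rankings) to 0.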
Pages: 22
Source
Applied Network Science, Vol. 4