An Efficient Similarity Search Framework for SimRank over Large Dynamic Graphs

Cited by: 44
Authors
Shao, Yingxia [1 ]
Cui, Bin [1 ]
Chen, Lei [2 ]
Liu, Mingming [1 ]
Xie, Xing [3 ]
Affiliations
[1] Peking Univ, Sch EECS, Key Lab High Confidence Software Technol MOE, Beijing, Peoples R China
[2] HKUST, Dept Comp Sci & Engn, Hong Kong, Hong Kong, Peoples R China
[3] Microsoft Res, New York, NY USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2015 / Volume 8 / Issue 08
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China;
Keywords
DOI
10.14778/2757807.2757809
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
SimRank is an important measure of vertex-pair similarity according to the structure of graphs. The similarity search based on SimRank is an important operation for identifying similar vertices in a graph and has been employed in many data analysis applications. Nowadays, graphs in the real world have become much larger and more dynamic. The existing solutions for similarity search are expensive in terms of time and space cost, and none of them can efficiently support similarity search over large dynamic graphs. In this paper, we propose a novel two-stage random-walk sampling framework (TSF) for SimRank-based similarity search (e.g., top-k search). In the preprocessing stage, TSF samples a set of one-way graphs to index raw random walks in a novel manner within $O(NR_g)$ time and space, where $N$ is the number of vertices and $R_g$ is the number of one-way graphs. The one-way graphs can be efficiently updated in accordance with graph modifications, so TSF is well suited to dynamic graphs. During the query stage, TSF can search similar vertices fast by naturally pruning unqualified vertices based on the connectivity of one-way graphs. Furthermore, with additional $R_q$ samples, TSF can estimate the SimRank score with probability $1 - 2e^{-2\epsilon^2 R_g R_q/(1-c)^2}$ if the error of approximation is bounded by. Finally, to guarantee the scalability of TSF, the one-way graphs can also be compactly stored on disk when memory is limited. Extensive experiments have demonstrated that TSF can handle dynamic billion-edge graphs with high performance.
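To make the one-way-graph idea in the abstract concrete, the following is a minimal sketch of a Monte Carlo SimRank estimate, not the TSF algorithm itself. It assumes the classic random-walk interpretation of SimRank (two reverse walks that first meet after t steps contribute c^t) and keeps one sampled in-neighbor per vertex in each one-way graph. The function names, the decay factor c = 0.6, the step cap, and the toy graph are all illustrative assumptions. Reusing a single one-way graph for every step of a walk is a simplification that can bias the estimate on graphs with cycles.

```python
import random


def sample_one_way_graph(in_nbrs, rng):
    """Keep one uniformly sampled in-neighbor per vertex (None if it has none)."""
    return {v: (rng.choice(nbrs) if nbrs else None) for v, nbrs in in_nbrs.items()}


def first_meeting_step(one_way, u, v, max_steps):
    """Follow sampled in-neighbor pointers from u and v.

    Returns the first step at which both walks reach the same vertex,
    or None if a walk dies out or max_steps is exhausted.
    """
    a, b = u, v
    for step in range(1, max_steps + 1):
        a, b = one_way.get(a), one_way.get(b)
        if a is None or b is None:
            return None
        if a == b:
            return step
    return None


def estimate_simrank(in_nbrs, u, v, c=0.6, num_graphs=1000, max_steps=10, seed=0):
    """Average c**t over sampled one-way graphs, t being the first meeting step."""
    if u == v:
        return 1.0  # a vertex is maximally similar to itself by definition
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_graphs):
        one_way = sample_one_way_graph(in_nbrs, rng)
        step = first_meeting_step(one_way, u, v, max_steps)
        if step is not None:
            total += c ** step
    return total / num_graphs


# Hypothetical toy graph: in_nbrs maps each vertex to its list of in-neighbors.
toy_in_nbrs = {"u": ["a", "b"], "v": ["a", "b"], "a": [], "b": []}
print(estimate_simrank(toy_in_nbrs, "u", "v", num_graphs=4000))
```

On this toy graph the two walks meet after one step with probability 1/2, so the estimate should land near c/2 = 0.3. Because each one-way graph stores only a single pointer per vertex, a set of R_g such graphs needs space proportional to N times R_g, which matches the preprocessing cost quoted in the abstract.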
Pages: 838 - 849
Page count: 12
Related papers
50 in total
  • [1] Efficient SimRank-based Similarity Join Over Large Graphs
    Zheng, Weiguo
    Zou, Lei
    Feng, Yansong
    Chen, Lei
    Zhao, Dongyan
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, 6 (07): 493 - 504
  • [2] Efficient index-free SimRank similarity search in large graphs by discounting path lengths
    Zhang, Mingxi
    Yang, Liuqian
    Hu, Hangfei
    Liu, Tianxing
    Wang, Jinhua
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 206
  • [3] Efficient SimRank Tracking in Dynamic Graphs
    Wang, Yue
    Lian, Xiang
    Chen, Lei
    2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2018: 545 - 556
  • [4] Efficient Similarity Search for Sets over Graphs
    Wang, Yue
    Feng, Zonghao
    Chen, Lei
    Li, Zijian
    Jian, Xun
    Luo, Qiong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2021, 33 (02): 444 - 458
  • [5] Efficient and Effective Similarity Search over Bipartite Graphs
    Yang, Renchi
    PROCEEDINGS OF THE ACM WEB CONFERENCE 2022 (WWW'22), 2022: 308 - 318
  • [6] Sig-SR: SimRank Search over Singular Graphs
    Yu, Weiren
    McCann, Julie A.
    SIGIR'14: PROCEEDINGS OF THE 37TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2014: 859 - 862
  • [7] Scalable Similarity Search for SimRank
    Kusumoto, Mitsuru
    Maehara, Takanori
    Kawarabayashi, Ken-ichi
    SIGMOD'14: PROCEEDINGS OF THE 2014 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2014: 325 - 336
  • [8] Efficient Closest Community Search over Large Graphs
    Cai, Mingshen
    Chang, Lijun
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2020), PT II, 2020, 12113: 569 - 587
  • [9] Efficient Subgraph Search over Large Uncertain Graphs
    Yuan, Ye
    Wang, Guoren
    Wang, Haixun
    Chen, Lei
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2011, 4 (11): 876 - 886
  • [10] Accelerating pairwise SimRank estimation over static and dynamic graphs
    Wang, Yue
    Chen, Lei
    Che, Yulin
    Luo, Qiong
    THE VLDB JOURNAL, 2019, 28: 99 - 122