Towards distributed node similarity search on graphs

Authors
Tianming Zhang
Yunjun Gao
Baihua Zheng
Lu Chen
Shiting Wen
Wei Guo
Affiliations
[1] Zhejiang University of Technology, College of Computer Science and Software Engineering
[2] Singapore Management University, School of Information Systems
[3] Aalborg University, Department of Computer Science
[4] Zhejiang University, The Ningbo Institute of Technology
Source
World Wide Web | 2020 / Vol. 23
Keywords
Graph; Node similarity search; Distributed processing; Algorithm
DOI
Not available
Abstract
Node similarity search on graphs has wide applications in recommendation and link prediction, to name just a few. However, existing studies are insufficient for two reasons: (i) the scale of real-world graphs is growing rapidly, and (ii) vertices are always associated with complex attributes. In this paper, we propose an efficient distributed framework to support node similarity search on massive graphs, which considers both graph structure correlation and node attribute similarity in metric spaces. The framework consists of a preprocessing stage and a query stage. In the preprocessing stage, a parallel KD-tree construction (KDC) algorithm is developed to form a newly defined graph, called the hybrid graph, which integrates node attribute similarity into the original graph. To divide the graph vertices equally into subsets, KDC adopts KD-tree partitioning after pivot mapping. In addition, two metric pruning rules and an optimized allocation strategy are presented to reduce communication and computation costs. In the query stage, based on the hybrid graph, we develop similarity search methods that use random walk with restart (RWR) to measure node similarity. To boost efficiency, we derive tight bounds to rapidly shrink the search region. Extensive experiments on three real massive graphs are conducted to verify the effectiveness, efficiency, and scalability of the proposed techniques.
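The abstract names random walk with restart (RWR) as the similarity measure but gives no formula. As a rough illustration only, the following is a minimal single-machine sketch of textbook RWR by power iteration. It is not the paper's distributed algorithm and does not include its bounds or pruning rules. The function name `rwr_scores`, the dictionary-based graph layout, the restart probability of 0.15, and the choice to return the mass of dangling nodes to the source are all assumptions made for this example.

```python
def rwr_scores(adj, source, restart=0.15, tol=1e-10, max_iter=1000):
    """Random walk with restart by power iteration (illustrative sketch).

    adj maps each node to a dict {neighbor: positive edge weight}.
    Every neighbor is assumed to appear as a key of adj.
    Returns a dict of RWR scores with respect to `source`.
    """
    nodes = list(adj)
    scores = {v: 0.0 for v in nodes}
    scores[source] = 1.0  # the walk starts at the source node

    for _ in range(max_iter):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            total = sum(adj[u].values())
            if total == 0:
                # Assumption: a node without out-edges sends its mass
                # back to the source so the scores keep summing to 1.
                nxt[source] += (1.0 - restart) * scores[u]
                continue
            for v, w in adj[u].items():
                # Follow an edge with probability proportional to its weight.
                nxt[v] += (1.0 - restart) * scores[u] * w / total
        # With probability `restart`, the walker jumps back to the source.
        nxt[source] += restart

        diff = sum(abs(nxt[v] - scores[v]) for v in nodes)
        scores = nxt
        if diff < tol:
            break
    return scores


# Hypothetical toy graph: an undirected path a - b - c with unit weights.
path = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0},
}
result = rwr_scores(path, "a")
ranked = sorted(result, key=result.get, reverse=True)
```

In a setting like the one the abstract describes, the edge weights passed in would presumably come from the hybrid graph, so that attribute similarity also influences the walk; how those weights are actually defined is not stated in the abstract.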
Pages: 3025-3053
Number of pages: 28