Towards distributed node similarity search on graphs

Cited by: 5
Authors
Zhang, Tianming [1]
Gao, Yunjun [1]
Zheng, Baihua [2]
Chen, Lu [3]
Wen, Shiting [4]
Guo, Wei [1]
Affiliations
[1] Zhejiang Univ Technol, Coll Comp Sci & Software Engn, Hangzhou, Peoples R China
[2] Singapore Management Univ, Sch Informat Syst, Singapore, Singapore
[3] Aalborg Univ, Dept Comp Sci, Aalborg, Denmark
[4] Zhejiang Univ, Ningbo Inst Technol, Ningbo, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Graph; Node similarity search; Distributed processing; Algorithm;
DOI
10.1007/s11280-020-00819-6
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Node similarity search on graphs has wide applications, such as recommendation and link prediction, to name just a few. However, existing studies are insufficient for two reasons: (i) the scale of real-world graphs is growing rapidly, and (ii) vertices are often associated with complex attributes. In this paper, we propose an efficient distributed framework to support node similarity search on massive graphs, which considers both graph structure correlation and node attribute similarity in metric spaces. The framework consists of a preprocessing stage and a query stage. In the preprocessing stage, a parallel KD-tree construction (KDC) algorithm is developed to form a newly defined graph, the so-called hybrid graph, in order to integrate node attribute similarity into the original graph. To divide graph vertices equally into subsets, KDC adopts KD-tree partitioning after pivot mapping. In addition, two metric pruning rules and an optimized allocation strategy are presented to reduce communication and computation costs. In the query stage, based on the formed hybrid graph, we develop similarity search methods that use random walk with restart (RWR) to measure node similarity. To boost efficiency, we derive tight bounds to rapidly shrink the search region. Extensive experiments with three real massive graphs are conducted to verify the effectiveness, efficiency, and scalability of our proposed techniques.
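As an illustration of the RWR measure named in the abstract, the following is a minimal single-machine sketch of random walk with restart computed by power iteration. It is not the paper's distributed method: it does not build the hybrid graph, and it does not use the pruning rules or bounds the abstract mentions. The function name `rwr_scores`, the adjacency-dict input `adj`, the restart probability of 0.15, and the handling of nodes without out-edges are all assumptions made for this example.

```python
def rwr_scores(adj, source, restart=0.15, iters=200, tol=1e-12):
    """Approximate random-walk-with-restart scores relative to `source`.

    adj     -- dict mapping each node to a list of its out-neighbors
               (hypothetical input format chosen for this sketch)
    restart -- probability of jumping back to `source` at each step
    """
    nodes = list(adj)
    scores = {v: 0.0 for v in nodes}
    scores[source] = 1.0  # the walk starts at the query node

    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            nbrs = adj[u]
            if not nbrs:
                # Assumption: a node with no out-edges sends its mass
                # back to the source so that total probability stays 1.
                nxt[source] += (1.0 - restart) * scores[u]
                continue
            share = (1.0 - restart) * scores[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        nxt[source] += restart  # restart mass always returns to the source

        delta = sum(abs(nxt[v] - scores[v]) for v in nodes)
        scores = nxt
        if delta < tol:
            break
    return scores


# Tiny undirected example graph: "a" is linked to "b" and "c".
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
scores = rwr_scores(graph, "a")
ranking = sorted(scores, key=scores.get, reverse=True)
```

A higher score means a node is more similar to the query node under this measure, so a top-k query would take the first k entries of `ranking` after dropping the query node itself.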
Pages: 3025-3053
Page count: 29
Published in: World Wide Web, 2020, 23: 3025-3053