Efficient structural node similarity computation on billion-scale graphs

Authors
Xiaoshuang Chen
Longbin Lai
Lu Qin
Xuemin Lin
Affiliations
[1] University of New South Wales, Centre for AI
[2] Alibaba Group
[3] University of Technology Sydney
[4] East China Normal University
Source
The VLDB Journal | 2021 / Vol. 30
Keywords
Node similarity; Role similarity; Efficiency; Link analysis;
DOI
Not available
Abstract
Structural node similarity is widely used in analyzing complex networks. Among structural node similarity metrics, role similarity has the merit of indicating automorphism (isomorphism). Existing algorithms for computing role similarity (e.g., RoleSim and NED) suffer from severe performance bottlenecks and thus cannot handle large real-world graphs. In this paper, we propose a new framework, StructSim, to compute nodes’ role similarity. Under this framework, we first prove that StructSim is an admissible role similarity metric based on the maximum matching. Because the maximum matching is still too costly to scale, we then devise the BinCount matching, which is efficient to compute and still guarantees the admissibility of StructSim. BinCount-based StructSim admits a precomputed index that answers a query on a single pair of nodes in $O(k \log D)$ time, where $k$ is a small user-defined parameter and $D$ is the maximum node degree. To build the index, we further devise an FM-sketch-based technique that can handle graphs with billions of edges. Extensive empirical studies show that StructSim outperforms existing approaches in both effectiveness and efficiency when computing structural node similarities on real-world graphs.
Pages: 471-493
Page count: 22