Efficient structural node similarity computation on billion-scale graphs


Authors:

Xiaoshuang Chen

Longbin Lai

Lu Qin

Xuemin Lin

Affiliations:

[1] University of New South Wales, Centre for AI

[2] Alibaba Group

[3] University of Technology Sydney

[4] East China Normal University

Source:

The VLDB Journal | 2021 / Vol. 30, No. 3

Keywords:

Node similarity; Role similarity; Efficiency; Link analysis

Abstract:

Structural node similarity is widely used in analyzing complex networks. Role similarity is one structural node similarity metric, and it has the desirable property of indicating automorphism (isomorphism). Existing algorithms for computing role similarity (e.g., RoleSim and NED) suffer from severe performance bottlenecks, so they cannot handle large real-world graphs. In this paper, we propose a new framework, StructSim, to compute nodes' role similarity. Under this framework, we first prove that StructSim is an admissible role similarity metric when it is based on maximum matching. Because maximum matching is still too costly to scale, we then devise BinCount matching, which is efficient to compute and also guarantees the admissibility of StructSim. BinCount-based StructSim admits a precomputed index that answers a query for a single pair of nodes in $O(k\log D)$ time, where $k$ is a small user-defined parameter and $D$ is the maximum node degree. To build the index, we further devise an FM-sketch-based technique that can handle graphs with billions of edges. Extensive empirical studies on real-world graphs show that StructSim substantially outperforms existing works in both effectiveness and efficiency when computing structural node similarities.
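The abstract does not define BinCount matching. The sketch below is therefore only a hedged illustration of why a binned matching can be cheap to query, and it is not the paper's method. It rests on a hypothetical rule. Each node is represented by the degrees of its neighbors, and neighbors are placed into logarithmic degree bins by the invented helper `degree_bin`. Two neighbors may be matched only when they fall in the same bin. Under that rule, the bipartite graph between the two neighborhoods splits into disjoint complete bipartite blocks, one per bin, so its maximum matching equals the sum of the per-bin minimum counts. The Jaccard-style normalization in the hypothetical `bin_similarity` is also an assumption, and the parameter $k$ from the abstract is not modeled.

```python
from collections import Counter


def degree_bin(degree: int) -> int:
    """Hypothetical logarithmic bin: 0 -> 0, d >= 1 -> floor(log2 d) + 1."""
    return 0 if degree == 0 else degree.bit_length()


def bin_counts(neighbor_degrees: list[int]) -> Counter:
    """Precomputable per-node index: number of neighbors in each degree bin."""
    return Counter(degree_bin(d) for d in neighbor_degrees)


def bin_matching_size(counts_u: Counter, counts_v: Counter) -> int:
    """Maximum matching when neighbors may only pair up inside the same bin.

    Each bin forms a complete bipartite block and the blocks are disjoint,
    so the maximum matching is the sum of per-bin minima.
    Counter returns 0 for bins that v does not have.
    """
    return sum(min(c, counts_v[b]) for b, c in counts_u.items())


def bin_similarity(neighbor_degrees_u: list[int], neighbor_degrees_v: list[int]) -> float:
    """Assumed Jaccard-style score m / (d_u + d_v - m), which lies in [0, 1]."""
    du, dv = len(neighbor_degrees_u), len(neighbor_degrees_v)
    if du == 0 and dv == 0:
        return 1.0  # two isolated nodes are treated as structurally identical
    m = bin_matching_size(bin_counts(neighbor_degrees_u), bin_counts(neighbor_degrees_v))
    # m <= min(du, dv), so the denominator is at least max(du, dv) > 0.
    return m / (du + dv - m)


if __name__ == "__main__":
    # Neighbor degrees of two hypothetical nodes u and v.
    print(bin_similarity([1, 2, 3, 8], [1, 3, 9]))  # 0.75
```

In this sketch, the per-node bin counts can be computed once and stored. A query then touches only the non-empty bins, and for maximum degree $D$ there are at most $\lfloor \log_2 D \rfloor + 2$ of them. This mirrors, but does not reproduce, the $\log D$ factor in the query bound stated above.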

Pages: 471-493

Number of pages: 22
