Cluster-preserving sampling algorithm for large-scale graphs

Cited by: 4
Authors
Zhang, Jianpeng [1]
Chen, Hongchang [1]
Yu, Dingjiu [1, 2]
Pei, Yulong [3]
Deng, Yingjun [4]
Affiliations
[1] Informat Engn Univ, Natl Digital Switching Syst E&T Res Ctr, Zhengzhou 450001, Peoples R China
[2] Network Syst Dept Strateg Support Force, Beijing 100091, Peoples R China
[3] Eindhoven Univ Technol, Sch Comp Sci & Technol, NL-5612 AE Eindhoven, Netherlands
[4] Tianjin Univ, Ctr Appl Math, Tianjin 300072, Peoples R China
Funding
China Postdoctoral Science Foundation
Keywords
graph sampling; clustering structure; top-leader nodes; expansion strategies; large-scale graphs; NETWORKS;
DOI
10.1007/s11432-021-3370-4
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Graph sampling is a very effective method to deal with scalability issues when analyzing large-scale graphs. Lots of sampling algorithms have been proposed, and sampling qualities have been quantified using explicit properties (e.g., degree distribution) of the sample. However, the existing sampling techniques are inadequate for the current sampling task: sampling the clustering structure, which is a crucial property of the current networks. In this paper, using different expansion strategies, two novel top-leader sampling methods (i.e., TLS-e and TLS-i) are proposed to obtain representative samples, and they are capable of effectively preserving the clustering structure. The rationale behind them is to select top-leader nodes of most clusters into the sample and then heuristically incorporate peripheral nodes into the sample using specific expansion strategies. Extensive experiments are conducted to investigate how well sampling techniques preserve the clustering structure of graphs. Our empirical results show that the proposed sampling algorithms can preserve the population's clustering structure well and provide feasible solutions to sample the clustering structure from large-scale graphs.
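The two-phase idea in the abstract (pick leader nodes first, then grow the sample around them) can be illustrated with a minimal Python sketch. This is not the authors' TLS-e or TLS-i: the abstract does not define how top-leader nodes are chosen or how the expansion strategies work, so the degree-based leader rule, the greedy expansion rule, and the names build_graph, pick_leaders, and expand_sample are all assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an undirected adjacency map from an edge list."""
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def pick_leaders(adj: Graph, num_leaders: int) -> List[Hashable]:
    """ASSUMED leader rule: take nodes in descending degree order, skipping
    any node adjacent to an earlier pick so leaders spread over the graph."""
    leaders: List[Hashable] = []
    blocked: Set[Hashable] = set()
    for node in sorted(adj, key=lambda n: (-len(adj[n]), str(n))):
        if len(leaders) == num_leaders:
            break
        if node in blocked:
            continue
        leaders.append(node)
        blocked.add(node)
        blocked |= adj[node]
    return leaders


def expand_sample(adj: Graph, leaders: List[Hashable], sample_size: int) -> Set[Hashable]:
    """ASSUMED expansion rule: repeatedly add the outside node that has the
    most edges into the current sample (ties broken by node name)."""
    sample: Set[Hashable] = set(leaders)
    while len(sample) < sample_size:
        scores: Dict[Hashable, int] = {}
        for u in sample:
            for v in adj[u]:
                if v not in sample:
                    scores[v] = scores.get(v, 0) + 1
        if not scores:  # no reachable nodes left to add
            break
        best = min(scores, key=lambda n: (-scores[n], str(n)))
        sample.add(best)
    return sample


if __name__ == "__main__":
    # Toy graph: two small clusters joined by the bridge a3-b3.
    toy = build_graph([
        ("a0", "a1"), ("a0", "a2"), ("a0", "a3"), ("a1", "a2"),
        ("b0", "b1"), ("b0", "b2"), ("b0", "b3"), ("b1", "b2"),
        ("a3", "b3"),
    ])
    chosen = pick_leaders(toy, num_leaders=2)
    print(chosen, sorted(expand_sample(toy, chosen, sample_size=4)))
```

Skipping neighbors of earlier picks is one simple way to keep the leaders from all landing in the same dense region; the paper may use a different criterion.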
Pages: 17
Related papers
50 in total
  • [1] Cluster-preserving sampling algorithm for large-scale graphs
    Jianpeng ZHANG
    Hongchang CHEN
    Dingjiu YU
    Yulong PEI
    Yingjun DENG
    Science China(Information Sciences), 2023, 66 (01) : 60 - 76
  • [2] Cluster-preserving sampling algorithm for large-scale graphs
    Jianpeng Zhang
    Hongchang Chen
    Dingjiu Yu
    Yulong Pei
    Yingjun Deng
    Science China Information Sciences, 2023, 66
  • [3] Cluster-preserving sampling from fully-dynamic streaming graphs
    Zhang, Jianpeng
    Zhu, Kaijie
    Pei, Yulong
    Fletcher, George
    Pechenizkiy, Mykola
    INFORMATION SCIENCES, 2019, 482 : 279 - 300
  • [4] Heavy Hitters via Cluster-Preserving Clustering
    Larsen, Kasper Green
    Nelson, Jelani
    Huy L Nguyen
    Thorup, Mikkel
    COMMUNICATIONS OF THE ACM, 2019, 62 (08) : 95 - 100
  • [5] Heavy hitters via cluster-preserving clustering
    Larsen, Kasper Green
    Nelson, Jelani
    Nguyen, Huy L.
    Thorup, Mikkel
    2016 IEEE 57TH ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE (FOCS), 2016, : 61 - 70
  • [6] Cluster-preserving dimension reduction methods for document classification
    Howland, Peg
    Park, Haesun
    SURVEY OF TEXT MINING II: CLUSTERING, CLASSIFICATION, AND RETRIEVAL, 2008, : 3 - +
  • [7] InfoNCE Loss Provably Learns Cluster-Preserving Representations
    Parulekar, Advait
    Collins, Liam
    Shanmugam, Karthikeyan
    Mokhtari, Aryan
    Shakkottai, Sanjay
    THIRTY SIXTH ANNUAL CONFERENCE ON LEARNING THEORY, VOL 195, 2023, 195
  • [8] A stratified sampling based clustering algorithm for large-scale data
    Zhao, Xingwang
    Liang, Jiye
    Dang, Chuangyin
    KNOWLEDGE-BASED SYSTEMS, 2019, 163 : 416 - 428
  • [9] Geometry Preserving Sampling Method Based on Spectral Decomposition for Large-Scale Environments
    Labussiere, Mathieu
    Laconte, Johann
    Pomerleau, Francois
    FRONTIERS IN ROBOTICS AND AI, 2020, 7
  • [10] EFFICIENT HEURISTIC CLUSTER ALGORITHM FOR TEARING LARGE-SCALE NETWORKS
    SANGIOVANNIVINCENTELLI, A
    CHEN, LK
    CHUA, LO
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, 1977, 24 (12): : 709 - 717