Cluster-preserving sampling algorithm for large-scale graphs

Cited by: 4
Authors
Zhang, Jianpeng [1]
Chen, Hongchang [1]
Yu, Dingjiu [1, 2]
Pei, Yulong [3]
Deng, Yingjun [4]
Affiliations
[1] Informat Engn Univ, Natl Digital Switching Syst E&T Res Ctr, Zhengzhou 450001, Peoples R China
[2] Network Syst Dept Strateg Support Force, Beijing 100091, Peoples R China
[3] Eindhoven Univ Technol, Sch Comp Sci & Technol, NL-5612 AE Eindhoven, Netherlands
[4] Tianjin Univ, Ctr Appl Math, Tianjin 300072, Peoples R China
Funding
China Postdoctoral Science Foundation;
Keywords
graph sampling; clustering structure; top-leader nodes; expansion strategies; large-scale graphs; NETWORKS;
DOI
10.1007/s11432-021-3370-4
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Graph sampling is an effective way to deal with scalability issues when analyzing large-scale graphs. Many sampling algorithms have been proposed, and sampling quality has been quantified using explicit properties of the sample (e.g., degree distribution). However, existing sampling techniques are inadequate for one task: sampling the clustering structure, which is a crucial property of real-world networks. In this paper, two novel top-leader sampling methods (TLS-e and TLS-i), which differ in their expansion strategies, are proposed to obtain representative samples that effectively preserve the clustering structure. The rationale behind them is to first select the top-leader nodes of most clusters into the sample and then heuristically incorporate peripheral nodes using specific expansion strategies. Extensive experiments are conducted to investigate how well sampling techniques preserve the clustering structure of graphs. The empirical results show that the proposed sampling algorithms preserve the population's clustering structure well and provide feasible solutions for sampling the clustering structure from large-scale graphs.
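The two-phase idea in the abstract (pick leader nodes first, then grow the sample outward) can be illustrated with a minimal sketch. This is not the paper's TLS-e or TLS-i: the abstract does not specify how leaders are scored or how expansion works, so the sketch assumes node degree as a stand-in for "leadership" and a greedy rule that adds the outside node with the most links into the current sample. The function names `build_adjacency` and `leader_expansion_sample` are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from an iterable of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def leader_expansion_sample(edges, num_leaders, sample_size):
    """Two-phase sampling sketch: choose leaders, then expand around them.

    Assumptions (not from the paper): degree is the leadership score, and
    expansion greedily adds the frontier node with the most neighbours
    already inside the sample.
    """
    adj = build_adjacency(edges)

    # Phase 1: take high-degree nodes as leaders, skipping any candidate
    # adjacent to an already chosen leader so that leaders spread out.
    by_degree = sorted(adj, key=lambda n: (-len(adj[n]), n))
    leaders = []
    for node in by_degree:
        if len(leaders) == num_leaders:
            break
        if all(node not in adj[leader] for leader in leaders):
            leaders.append(node)

    # Phase 2: grow the sample from the leaders until the budget is met
    # or no unsampled neighbour remains.
    sample = set(leaders)
    while len(sample) < sample_size:
        frontier = {n for s in sample for n in adj[s]} - sample
        if not frontier:
            break
        # Ties are broken by the smallest node label (first maximum wins).
        best = max(sorted(frontier), key=lambda n: len(adj[n] & sample))
        sample.add(best)
    return leaders, sample


# Toy graph: two 4-node cliques joined by the bridge edge (3, 4).
clique_a = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
clique_b = [(4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7)]
toy_edges = clique_a + clique_b + [(3, 4)]

leaders, sample = leader_expansion_sample(toy_edges, num_leaders=2, sample_size=4)
print("leaders:", leaders)
print("sample:", sorted(sample))
```

On the toy graph, the non-adjacency rule in phase 1 places one leader in each clique, which is the property a cluster-preserving sampler is after. The greedy expansion rule here is deliberately simple and may favour one cluster; the expansion strategies described in the paper are likely more careful about this.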
Pages: 17