MapReduce-Based Graph Structural Clustering Algorithm

Authors
Zhang W.-P. [1]
Li Z.-J. [1]
Li R.-H. [1]
Liu Y.-H. [1]
Mao R. [1]
Qiao S.-J. [2]
Affiliations
[1] College of Computer Science & Software Engineering, Shenzhen University, Shenzhen
[2] School of Cybersecurity, Chengdu University of Information Technology, Chengdu
Source
Ruan Jian Xue Bao/Journal of Software | 2018, Vol. 29, No. 3
Funding
National Natural Science Foundation of China
Keywords
Graph data; MapReduce; Parallel computing model; Structural graph clustering
DOI
10.13328/j.cnki.jos.005456
Abstract
Graph clustering is a fundamental graph mining task that is widely used in social network analysis applications. Graph structural clustering (SCAN) is a well-known density-based graph clustering algorithm: it not only finds the clusters in a graph but also identifies hub nodes and outliers. However, the traditional SCAN algorithm struggles to handle massive graphs, because its time complexity is O(m^1.5), where m is the number of edges in the graph. To overcome this scalability problem, this paper proposes a MapReduce-based graph structural clustering algorithm called MRSCAN. Specifically, the paper develops MapReduce-based algorithms for similarity computation and core node computation, as well as two cluster-merging algorithms. Extensive experiments on several real-world graph datasets demonstrate the accuracy, effectiveness, and scalability of the proposed algorithm.
© Copyright 2018, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
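To make the pipeline named in the abstract concrete (similarity computation, core detection, cluster merging, hub/outlier labeling), here is a minimal single-process sketch of the standard SCAN definitions (Xu et al., 2007). It is an illustration under stated assumptions, not the MRSCAN implementation from the paper. The map/shuffle/reduce stages are simulated in memory, not run on Hadoop. The edge-keyed join, the union-find merge, and every function name (`map_phase`, `shuffle`, `reduce_phase`, `scan`) are hypothetical choices made here. The example graph and the thresholds `eps=0.7`, `mu=3` are made up. The structural similarity used is sigma(u, v) = |N[u] ∩ N[v]| / sqrt(|N[u]| · |N[v]|), where N[u] is the closed neighborhood of u.

```python
from collections import defaultdict
from math import sqrt


def map_phase(adj):
    """Map: for each vertex u and neighbor v, emit ((min, max) edge key, N[u]).

    N[u] is u's closed neighborhood (u plus its neighbors), as in SCAN.
    Assumes adj is symmetric (undirected graph).
    """
    for u, nbrs in adj.items():
        closed = frozenset(nbrs) | {u}
        for v in nbrs:
            yield (min(u, v), max(u, v)), closed


def shuffle(pairs):
    """Group mapped values by key, standing in for the MapReduce shuffle."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: each edge key holds N[u] and N[v]; emit sigma(u, v)."""
    return {
        edge: len(nu & nv) / sqrt(len(nu) * len(nv))
        for edge, (nu, nv) in groups.items()
    }


def scan(adj, eps, mu):
    """Return (cluster label per member vertex, hubs, outliers)."""
    sim = reduce_phase(shuffle(map_phase(adj)))

    def sigma(u, v):
        return 1.0 if u == v else sim[(min(u, v), max(u, v))]

    # eps-neighborhood over the closed neighborhood (sigma(v, v) = 1).
    eps_nbrs = {v: [w for w in [v, *adj[v]] if sigma(v, w) >= eps] for v in adj}
    cores = {v for v in adj if len(eps_nbrs[v]) >= mu}

    # Merge cores joined by eps-similar edges with union-find.
    parent = {c: c for c in cores}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for c in cores:
        for w in eps_nbrs[c]:
            if w in cores:
                parent[find(c)] = find(w)

    label = {c: find(c) for c in cores}
    # Non-core vertices eps-similar to a core join that core's cluster
    # (first core in sorted order wins, a simplification).
    for c in sorted(cores):
        for w in eps_nbrs[c]:
            label.setdefault(w, find(c))

    # Unclustered vertices: hubs touch >= 2 clusters, the rest are outliers.
    hubs, outliers = set(), set()
    for v in adj:
        if v not in label:
            touched = {label[w] for w in adj[v] if w in label}
            (hubs if len(touched) >= 2 else outliers).add(v)
    return label, hubs, outliers


# Two 4-cliques bridged by vertex 8; vertex 9 hangs off vertex 3.
adj = {
    0: {1, 2, 3, 8}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 9},
    4: {5, 6, 7, 8}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6},
    8: {0, 4}, 9: {3},
}
sim = reduce_phase(shuffle(map_phase(adj)))
labels, hubs, outliers = scan(adj, eps=0.7, mu=3)
print(hubs, outliers)  # {8} {9}
```

Keying each edge by its sorted endpoints means every reducer receives exactly the two closed neighborhoods it needs to compute one similarity. On a real cluster, shipping those neighborhood copies is the dominant shuffle cost, so this sketch only illustrates the data flow and makes no claim about how MRSCAN partitions the work.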
Pages: 627-641
Number of pages: 14