A robust, efficient, and balanced parallel algorithm for finding connected components
Times cited: 0
Author:
Asokan, M. [1]
Affiliation:
[1] Syncsort Inc, Pearl River, NY 10965 USA
Keywords:
Graph Mining;
Connected Components;
Hadoop;
MapReduce;
DOI:
Not available
Chinese Library Classification (CLC) number:
TP18 [Artificial intelligence theory];
Subject classification codes:
081104;
0812;
0835;
1405;
Abstract:
Finding connected components in an undirected graph has many practical applications. For example, in a graph representing a social network, a connected component represents a group of related individuals with common interests. Finding connected components also forms the basis for other clustering algorithms. In this paper, we present a parallel algorithm for finding connected components in an undirected graph that uses the well-known sequential algorithm as its basis. The algorithm can be adapted to run either on a single computer with multiple cores or on MapReduce. It is robust in the sense that it honors memory limits, which is important in today's containerized environments, and it balances the workload even in the presence of data skew. For the best known algorithm running on MapReduce, the number of iterations grows as the square of the logarithm of the number of vertices in the graph. For our algorithm, we prove that the number of iterations is bounded above by a logarithmic function of the size of the largest connected component. In each iteration, the amount of data read from or written to the file system is at most four times the number of edges in the graph.
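The abstract says the parallel algorithm builds on "the well-known sequential algorithm" but does not name it. As a point of reference only, here is a minimal sequential sketch that assumes the baseline is union-find (disjoint-set union). It is not the paper's parallel, memory-bounded MapReduce algorithm. The function name `connected_components` and its `vertices` parameter are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def connected_components(edges, vertices=()):
    """Group the vertices of an undirected graph into connected components.

    Sequential union-find sketch (hypothetical helper, not the paper's code).
    `edges` is an iterable of (u, v) pairs; `vertices` optionally lists
    isolated vertices that appear in no edge.
    """
    parent = {}
    size = {}

    def find(x):
        # Register an unseen vertex as a singleton set.
        if x not in parent:
            parent[x] = x
            size[x] = 1
        # Path halving: repoint nodes at their grandparent while walking up.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return  # already in the same component
        # Union by size: attach the smaller tree under the larger root.
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]

    for v in vertices:
        find(v)
    for u, v in edges:
        union(u, v)

    # Collect every vertex under its root representative.
    groups = defaultdict(set)
    for x in parent:
        groups[find(x)].add(x)
    return list(groups.values())
```

For example, `connected_components([(1, 2), (2, 3), (4, 5)], vertices=[6])` yields the components `{1, 2, 3}`, `{4, 5}` and `{6}` (in no guaranteed order). Union by size keeps the trees shallow and path halving flattens them during lookups, so each operation runs in near-constant amortized time on a single machine. The paper's contribution lies in parallelizing this kind of computation under memory limits, which this sketch does not attempt.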
Pages: 2113-2118
Number of pages: 6