p-PIC: Parallel power iteration clustering for big data

Cited by: 26
Authors
Yan, Weizhong [1]
Brahmakshatriya, Umang [1]
Xue, Ya [1]
Gilder, Mark [2]
Wise, Bowden [3]
Affiliations
[1] GE Global Res Ctr, Machine Learning Lab, Niskayuna, NY 12039 USA
[2] GE Global Res Ctr, Comp & Cyber Secur Lab, Niskayuna, NY 12039 USA
[3] GE Global Res Ctr, Knowledge Discovery Lab, Niskayuna, NY 12039 USA
Keywords
Big data; Clustering; Cloud computing; Data mining; Distributed computing; Machine learning; Parallel computing; Spectral clustering; Algorithm
DOI
10.1016/j.jpdc.2012.06.009
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Power iteration clustering (PIC) is a newly developed clustering algorithm. It performs clustering by embedding data points in a low-dimensional subspace derived from the similarity matrix. Compared to traditional clustering algorithms, PIC is simple, fast and relatively scalable. However, it requires that the data and its associated similarity matrix fit into memory, which makes the algorithm infeasible for big data applications. This paper attempts to expand PIC's data scalability by implementing a parallel power iteration clustering (p-PIC) algorithm. While this paper focuses on exploring different parallelization strategies and implementation details for minimizing computation and communication costs, we have also paid great attention to ensuring that the algorithm works well on low-end commodity computers (COTS-based clusters and general-purpose servers found at most commercial cloud providers). The experimental results demonstrate that the proposed p-PIC algorithm scales well with both data size and compute resources. © 2012 Elsevier Inc. All rights reserved.
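The embedding step described in the abstract can be illustrated with a minimal serial sketch. This is not the paper's p-PIC implementation: it is the basic single-machine form of power iteration clustering, and it assumes 1-D input points, a Gaussian similarity with a hypothetical `sigma` parameter, a degree-based starting vector, and a small deterministic 1-D k-means for the final grouping. All function names are illustrative.

```python
import math


def gaussian_affinity(points, sigma=1.0):
    """Dense similarity matrix with a zero diagonal (assumed Gaussian kernel)."""
    n = len(points)
    return [
        [
            0.0 if i == j
            else math.exp(-((points[i] - points[j]) ** 2) / (2 * sigma ** 2))
            for j in range(n)
        ]
        for i in range(n)
    ]


def power_iteration_embedding(affinity, max_iter=100, tol=1e-5):
    """Truncated power iteration on the row-normalised matrix W = D^-1 A."""
    n = len(affinity)
    degrees = [sum(row) for row in affinity]
    w = [[a / degrees[i] for a in affinity[i]] for i in range(n)]
    volume = sum(degrees)
    v = [d / volume for d in degrees]  # degree-based starting vector
    prev_delta = float("inf")
    for _ in range(max_iter):
        new_v = [sum(w[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) for x in new_v)  # L1 normalisation
        new_v = [x / norm for x in new_v]
        delta = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        # Stop early, once the change per step has itself stabilised,
        # rather than iterating to full convergence.
        if abs(prev_delta - delta) < tol:
            break
        prev_delta = delta
    return v


def kmeans_1d(values, k, iters=50):
    """Tiny deterministic 1-D k-means (k >= 2) with evenly spaced initial centers."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * c / (k - 1) for c in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in values]
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = sum(members) / len(members)
    return labels


def pic_cluster(points, k, sigma=1.0):
    """Cluster 1-D points by running k-means on the power-iteration embedding."""
    embedding = power_iteration_embedding(gaussian_affinity(points, sigma))
    return kmeans_1d(embedding, k)


if __name__ == "__main__":
    print(pic_cluster([0.0, 0.1, 0.2, 5.0, 6.0, 7.0], k=2))
```

The sketch holds the full n x n similarity matrix in memory, which is the limitation the abstract identifies as the motivation for a parallel version.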
Pages: 352-359
Number of pages: 8