Hierarchical parallel processing of large scale data clustering on a PC cluster with GPU co-processing

Times cited: 34
Authors
Takizawa, Hiroyuki [1]
Kobayashi, Hiroaki
Affiliations
[1] Tohoku Univ, Grad Sch Informat Sci, Aoba Ku, Sendai, Miyagi 9808578, Japan
[2] Tohoku Univ, Informat Synergy Ctr, Aoba Ku, Sendai, Miyagi 9808578, Japan
Source
JOURNAL OF SUPERCOMPUTING | 2006 / Vol. 36 / No. 3
Keywords
programmable graphics processing unit (GPU); general-purpose computation on GPU (GPGPU); k-means data clustering; PC cluster; the divide-and-conquer approach
DOI
10.1007/s11227-006-8294-1
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
This paper presents an effective scheme for clustering a huge data set using a PC cluster system in which each PC is equipped with a commodity programmable graphics processing unit (GPU). The proposed scheme is devised to achieve three-level hierarchical parallel processing of massive data clustering. A divide-and-conquer approach to parallel data clustering is employed to perform coarse-grain parallel processing across multiple PCs using a message-passing mechanism. Moreover, by taking advantage of the GPU's parallel processing capability, the proposed scheme can exploit two types of fine-grain data parallelism at different levels in the nearest neighbor search, which is the most computationally intensive part of the data clustering process. The performance of our scheme is discussed in comparison with that of an implementation running entirely on the CPU. Experimental results clearly show that the proposed hierarchical parallel processing can remarkably accelerate the data clustering task. In particular, GPU co-processing is quite effective at improving the computational efficiency of parallel data clustering on a PC cluster. Although data transfer from the GPU to the CPU is generally costly, GPU co-processing significantly reduces the total execution time of data clustering.
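The abstract does not spell out the algorithm, so the following is only a hedged illustration of one common divide-and-conquer formulation of k-means, assuming the following: each node runs k-means on its own partition of the data, only the k local centroids and their member counts are gathered, and a weighted k-means over those gathered centroids yields the final k centroids. The round-robin partitioning, the farthest-first seeding (chosen here for determinism), and every function name (`divide_and_conquer_kmeans`, `weighted_kmeans`, `nearest_centroid`, `farthest_first_init`) are hypothetical and are not the authors' method. The sequential Python loops stand in for message passing between PCs and for the GPU's fine-grain parallel nearest neighbor search.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(point: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Nearest neighbor search: index of the closest centroid.

    This is the costly step. On a GPU, the loop over points and the loop
    over centroids are candidate levels of fine-grain data parallelism.
    """
    return min(range(len(centroids)), key=lambda j: squared_distance(point, centroids[j]))


def farthest_first_init(points, weights, k) -> List[Vector]:
    """Hypothetical deterministic seeding: repeatedly pick the point farthest from all seeds."""
    candidates = [p for p, w in zip(points, weights) if w > 0]
    seeds = [list(candidates[0])]
    while len(seeds) < k:
        far = max(candidates, key=lambda p: min(squared_distance(p, s) for s in seeds))
        seeds.append(list(far))
    return seeds


def weighted_kmeans(points, weights, k, iters=20) -> Tuple[List[Vector], List[float]]:
    """Lloyd's k-means on weighted points; returns centroids and their total weights."""
    dim = len(points[0])
    centroids = farthest_first_init(points, weights, k)
    counts = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0.0] * k
        for p, w in zip(points, weights):
            j = nearest_centroid(p, centroids)
            counts[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        for j in range(k):
            if counts[j] > 0:  # an empty cluster keeps its previous centroid
                centroids[j] = [s / counts[j] for s in sums[j]]
    return centroids, counts


def divide_and_conquer_kmeans(points, k, num_nodes, iters=20) -> List[Vector]:
    """Hypothetical coarse-grain scheme: local k-means per node, then merge."""
    # Divide: one partition per (simulated) PC. On a real cluster each
    # partition lives on its own node and these local runs are concurrent.
    partitions = [points[i::num_nodes] for i in range(num_nodes)]
    gathered: List[Vector] = []
    gathered_weights: List[float] = []
    for part in partitions:
        centroids, counts = weighted_kmeans(part, [1.0] * len(part), k, iters)
        # Only k centroids and k counts per node would be sent to the root.
        gathered.extend(centroids)
        gathered_weights.extend(counts)
    # Conquer: cluster the gathered centroids, weighted by how many points each represents.
    final, _ = weighted_kmeans(gathered, gathered_weights, k, iters)
    return final


if __name__ == "__main__":
    rng = random.Random(0)
    centers = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
    data = [[cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)]
            for cx, cy in centers for _ in range(200)]
    rng.shuffle(data)
    for c in sorted(divide_and_conquer_kmeans(data, k=3, num_nodes=4)):
        print([round(x, 2) for x in c])
```

In this sketch, the appeal of the divide-and-conquer split is that inter-node traffic is only k centroids and k counts per node rather than the raw data. The merged result approximates, but is not guaranteed to equal, a single global k-means run.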
Pages: 219-234
Page count: 16