Dynamic group communication for large-scale parallel data mining

Citations: 0
Authors
Katti, Amogh [1]
Di Fatta, Giuseppe [1]
Affiliations
[1] Univ Reading, Sch Syst Engn, Reading RG6 6AY, Berks, England
Source
Keywords
Extreme-scale computing; dynamic group communication; parallel data mining; clustering; k-means; clustering algorithm
DOI
10.1177/1063293X13495551
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Exascale systems are the next frontier in high-performance computing and are expected to deliver a performance of the order of 10^18 operations per second using massive multicore processors. Very large- and extreme-scale parallel systems pose critical algorithmic challenges, especially related to concurrency, locality and the need to avoid global communication patterns. This work investigates a novel protocol for dynamic group communication that can be used to remove the global communication requirement and to reduce the communication cost in parallel formulations of iterative data mining algorithms. The protocol is used to provide a communication-efficient parallel formulation of the k-means algorithm for cluster analysis. The approach is based on a collective communication operation for dynamic groups of processes and exploits non-uniform data distributions. Non-uniform data distributions can be either found in real-world distributed applications or induced by means of multidimensional binary search trees. The analysis of the proposed dynamic group communication protocol has shown that it does not introduce significant communication overhead. The parallel clustering algorithm has also been extended to accommodate an approximation error, which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
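For orientation, the sketch below illustrates the conventional parallel k-means iteration that the abstract contrasts against: every process computes per-cluster partial sums on its own data, and a global reduction combines them. It is not the authors' protocol. Processes are simulated as plain Python lists, and the names `local_partial_sums`, `global_reduce`, `kmeans_step` and `contributors` are hypothetical, chosen only for this illustration. The `contributors` helper merely shows that, when data are distributed non-uniformly, only some processes hold points of a given cluster, which is the kind of situation in which communication within smaller groups could replace a global exchange.

```python
def local_partial_sums(points, centroids):
    """One simulated process: assign each local point to its nearest centroid
    and return per-cluster coordinate sums and point counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        nearest = min(
            range(k),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    return sums, counts


def global_reduce(partials):
    """Stand-in for a global all-reduce: add up the partial results of ALL
    processes (assumption: a real system would use a collective operation)."""
    k, dim = len(partials[0][0]), len(partials[0][0][0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for part_sums, part_counts in partials:
        for j in range(k):
            counts[j] += part_counts[j]
            for d in range(dim):
                sums[j][d] += part_sums[j][d]
    return sums, counts


def kmeans_step(partitions, centroids):
    """One iteration: local computation, global reduction, centroid update.
    An empty cluster keeps its previous centroid."""
    partials = [local_partial_sums(part, centroids) for part in partitions]
    sums, counts = global_reduce(partials)
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(len(centroids))
    ]


def contributors(partitions, centroids):
    """For each cluster, the set of process ranks that hold at least one of
    its points (illustration only, not the paper's group-formation method)."""
    groups = [set() for _ in centroids]
    for rank, part in enumerate(partitions):
        _, counts = local_partial_sums(part, centroids)
        for j, c in enumerate(counts):
            if c:
                groups[j].add(rank)
    return groups


# Two simulated processes, each holding points from a different region.
partitions = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
centroids = [(0.0, 0.0), (10.0, 10.0)]
new_centroids = kmeans_step(partitions, centroids)
groups = contributors(partitions, centroids)
```

In this toy input each cluster receives points from a single process, so a global reduction exchanges data that most processes do not need.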
Pages: 227-234
Page count: 8
Related papers
50 in total
  • [1] Efficient Group Communication for Large-Scale Parallel Clustering
    Pettinger, David
    Di Fatta, Giuseppe
    [J]. INTELLIGENT DISTRIBUTED COMPUTING VI, 2013, 446 : 155 - 164
  • [2] Parallel Bifold: Large-scale parallel pattern mining with constraints
    El-Hajj, Mohammad
    Zaiane, Osmar R.
    [J]. DISTRIBUTED AND PARALLEL DATABASES, 2006, 20 (03) : 225 - 243
  • [3] Parallel Bifold: Large-scale parallel pattern mining with constraints
    Mohammad El-Hajj
    Osmar R. Zaïane
    [J]. Distributed and Parallel Databases, 2006, 20 : 225 - 243
  • [4] Large-scale parallel data clustering
    Judd, D
    McKinley, PK
    Jain, AK
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1998, 20 (08) : 871 - 876
  • [5] Hierarchical visual data mining for large-scale data
    Matthew Ward
    Wei Peng
    Xiaoning Wang
    [J]. Computational Statistics, 2004, 19 : 147 - 158
  • [6] Hierarchical visual data mining for large-scale data
    Ward, M
    Peng, W
    Wang, XN
    [J]. COMPUTATIONAL STATISTICS, 2004, 19 (01) : 147 - 158
  • [7] Intelligent approach for large-scale data mining
    Fouad, Khaled M.
    El-Bably, Doaa L.
    [J]. INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN TECHNOLOGY, 2020, 63 (1-2) : 93 - 113
  • [8] Sparse computation for large-scale data mining
    Hochbaum, Dorit S.
    Baumann, Philipp
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2014 : 354 - 363
  • [9] Entity Relation Mining in Large-Scale Data
    Li, Jingnan
    Cai, Yi
    Wang, Qixuan
    Hu, Shuyue
    Wang, Tao
    Min, Huaqing
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2015, 2015, 9052 : 109 - 121
  • [10] Parallel data mining on large scale PC cluster
    Kitsuregawa, M
    Shintani, T
    Tamura, M
    Pramudiono, I
    [J]. WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2000, 1846 : 15 - 26