Iterative big data clustering algorithms: a review

Cited by: 43
Authors
Mohebi, Amin [1]
Aghabozorgi, Saeed [1]
Teh Ying Wah [1]
Herawan, Tutut [1]
Yahyapour, Ramin [2]
Affiliations
[1] Univ Malaya, Fac Comp Sci & Informat Technol, Kuala Lumpur, Malaysia
[2] Gesell Wissensch Datenverarbeitung mbH Gottingen, Gottingen, Germany
Source
SOFTWARE-PRACTICE & EXPERIENCE | 2016 / Volume 46 / Issue 01
Keywords
big data; large-scale; MapReduce; clustering; Hadoop; PARALLEL; MAPREDUCE; FRAMEWORK; MR
DOI
10.1002/spe.2341
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Enterprises today must handle massive volumes of data that are growing explosively. The key requirements for addressing this challenge are to extract, analyze, and process data in a timely manner. Clustering is an essential data mining tool that plays an important role in analyzing big data. However, large-scale data clustering has become a challenging task because of the large amount of information that emerges from technological progress in many areas, including finance and business informatics. Accordingly, researchers have turned to parallel clustering algorithms built on parallel programming models to address this issue. MapReduce is one of the most famous frameworks, and it has attracted great attention because of its flexibility, ease of programming, and fault tolerance. However, the framework has evident performance limitations, especially for iterative programs. This study first reviews the proposed iterative frameworks that extend MapReduce to support iterative algorithms. We summarize these techniques, discuss their uniqueness and limitations, and explain how they address the challenging issues of iterative programs. We also perform an in-depth review to understand the problems of parallel clustering algorithms and the techniques used to solve them. We believe that no well-rounded review yet provides a significant comparison among parallel clustering algorithms using MapReduce. This work aims to serve as a stepping stone for researchers who are studying big data clustering algorithms. Copyright (c) 2015 John Wiley & Sons, Ltd.
Pages: 107 - 129
Page count: 23
Related papers
50 in total
  • [1] A Review of Clustering Algorithms for Big Data
    Djouzi, Kheyreddine
    Beghdad-Bey, Kadda
    [J]. 2019 4TH INTERNATIONAL CONFERENCE ON NETWORKING AND ADVANCED SYSTEMS (ICNAS 2019), 2019 : 117 - 122
  • [2] Scalable Clustering Algorithms for Big Data: A Review
    Mahdi, Mahmoud A.
    Hosny, Khalid M.
    Elhenawy, Ibrahim
    [J]. IEEE ACCESS, 2021, 9 : 80015 - 80027
  • [3] Different Clustering Algorithms for Big Data Analytics: A Review
    Dave, Meenu
    Gianey, Hemant
    [J]. PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON SYSTEM MODELING & ADVANCEMENT IN RESEARCH TRENDS (SMART-2016), 2016 : 328 - 333
  • [4] Big Data and Clustering Algorithms
    Ajin, V. W.
    Kumar, Lekshmy D.
    [J]. 2016 INTERNATIONAL CONFERENCE ON RESEARCH ADVANCES IN INTEGRATED NAVIGATION SYSTEMS (RAINS), 2016
  • [5] Iterative Unified Clustering in Big Data
    Misal, Vasundhara
    Janeja, Vandana P.
    Pallaprolu, Sai C.
    Yesha, Yelena
    Chintalapati, Raghu
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016 : 3412 - 3421
  • [6] Clustering Algorithms for Spatial Big Data
    Schoier, Gabriella
    Gregorio, Caterina
    [J]. COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2017, PT IV, 2017, 10407 : 571 - 583
  • [7] A Review on Density-Based Clustering Algorithms for Big Data Analysis
    Reddy, K. Shyam Sunder
    Bindu, C. Shoba
    [J]. 2017 INTERNATIONAL CONFERENCE ON I-SMAC (IOT IN SOCIAL, MOBILE, ANALYTICS AND CLOUD) (I-SMAC), 2017 : 123 - 130
  • [8] Big Data Clustering: A Review
    Shirkhorshidi, Ali Seyed
    Aghabozorgi, Saeed
    Teh, Ying Wah
    Herawan, Tutut
    [J]. COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2014, PT V, 2014, 8583 : 707 - 720
  • [9] A survey on parallel clustering algorithms for Big Data
    Zineb Dafir
    Yasmine Lamari
    Said Chah Slaoui
    [J]. Artificial Intelligence Review, 2021, 54 : 2411 - 2443
  • [10] Analysis of Mahout Big Data Clustering Algorithms
    Sharma, Ishan
    Tiwari, Rajeev
    Rana, Hukam Singh
    Anand, Abhineet
    [J]. INTELLIGENT COMMUNICATION, CONTROL AND DEVICES, ICICCD 2017, 2018, 624 : 999 - 1008