Iterative big data clustering algorithms: a review

Cited by: 43
Authors
Mohebi, Amin [1 ]
Aghabozorgi, Saeed [1 ]
Teh Ying Wah [1 ]
Herawan, Tutut [1 ]
Yahyapour, Ramin [2 ]
Affiliations
[1] Univ Malaya, Fac Comp Sci & Informat Technol, Kuala Lumpur, Malaysia
[2] Gesell Wissensch Datenverarbeitung mbH Gottingen, Gottingen, Germany
Source
SOFTWARE-PRACTICE & EXPERIENCE | 2016 / Vol. 46 / No. 01
Keywords
big data; large-scale; MapReduce; clustering; Hadoop; PARALLEL; MAPREDUCE; FRAMEWORK; MR;
DOI
10.1002/spe.2341
Chinese Library Classification
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Enterprises today are dealing with massive volumes of data, which have been increasing explosively. The key requirements to address this challenge are to extract, analyze, and process data in a timely manner. Clustering is an essential data mining tool that plays an important role in analyzing big data. However, large-scale data clustering has become a challenging task because of the large amount of information that emerges from technological progress in many areas, including finance and business informatics. Accordingly, researchers have turned to parallel clustering algorithms built on parallel programming models to address this issue. MapReduce is one of the best-known frameworks, and it has attracted great attention because of its flexibility, ease of programming, and fault tolerance. However, the framework has evident performance limitations, especially for iterative programs. This study first reviews the proposed iterative frameworks that extend MapReduce to support iterative algorithms. We summarize these techniques, discuss their uniqueness and limitations, and explain how they address the challenging issues of iterative programs. We also perform an in-depth review to understand the problems and the solving techniques for parallel clustering algorithms. We believe that no well-rounded review provides a significant comparison among parallel clustering algorithms using MapReduce. This work aims to serve as a stepping stone for researchers who are studying big data clustering algorithms. Copyright (c) 2015 John Wiley & Sons, Ltd.
Pages: 107 - 129
Page count: 23
Related papers
50 in total
  • [31] Big data clustering techniques based on Spark: a literature review
    Saeed M.M.
    Aghbari Z.A.
    Alsharidah M.
    [J]. Saeed, Mozamel M. (mozamel8888@gmail.com), 2020, PeerJ Inc. (06) : 1 - 28
  • [32] Big data clustering techniques based on Spark: a literature review
    Saeed, Mozamel M.
    Al Aghbari, Zaher
    Alsharidah, Mohammed
    [J]. PEERJ COMPUTER SCIENCE, 2020,
  • [33] Fuzzy Based Clustering Algorithms to Handle Big Data with Implementation on Apache Spark
    Bharill, Neha
    Tiwari, Aruna
    Malviya, Aayushi
    [J]. PROCEEDINGS 2016 IEEE SECOND INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (BIGDATASERVICE 2016), 2016, : 95 - 104
  • [34] A Quantitative Analysis of Big Data Clustering Algorithms for Market Segmentation in Hospitality Industry
    Bose, Avishek
    Munir, Arslan
    Shabani, Neda
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS (ICCE), 2020, : 554 - 559
  • [35] Research on complex attribute big data classification based on iterative fuzzy clustering algorithm
    Qian, Li
    [J]. WEB INTELLIGENCE, 2021, 19 (1-2) : 147 - 158
  • [36] Algorithms for Big Data
    Meyer, Ulrich
    Abedjan, Ziawasch
    [J]. IT-INFORMATION TECHNOLOGY, 2020, 62 (3-4) : 117 - 118
  • [37] Creating streaming iterative soft clustering algorithms
    Hore, Prodip
    Hall, Lawrence O.
    Goldgof, Dmitry B.
    [J]. NAFIPS 2007 - 2007 ANNUAL MEETING OF THE NORTH AMERICAN FUZZY INFORMATION PROCESSING SOCIETY, 2007, : 484 - +
  • [38] Scalable big earth observation data mining algorithms: a review
    Sisodiya, Neha
    Dube, Nitant
    Prakash, Om
    Thakkar, Priyank
    [J]. EARTH SCIENCE INFORMATICS, 2023, 16 (3) : 1993 - 2016
  • [39] Scalable big earth observation data mining algorithms: a review
    Neha Sisodiya
    Nitant Dube
    Om Prakash
    Priyank Thakkar
    [J]. Earth Science Informatics, 2023, 16 : 1993 - 2016
  • [40] A Review at Machine Learning Algorithms Targeting Big Data Challenges
    Rathor, Abhinav
    Gyanchandani, Manasi
    [J]. 2017 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, COMMUNICATION, COMPUTER, AND OPTIMIZATION TECHNIQUES (ICEECCOT), 2017, : 753 - 758