Big data clustering with varied density based on MapReduce

Cited by: 31
Authors
Heidari, Safanaz [1 ]
Alborzi, Mahmood [1 ]
Radfar, Reza [1 ]
Afsharkazemi, Mohammad Ali [2 ]
Ghatari, Ali Rajabzadeh [3 ]
Affiliations
[1] Islamic Azad Univ, Dept Informat Technol Management, Sci & Res Branch, Tehran, Iran
[2] Islamic Azad Univ, Dept Ind Management, Cent Tehran Branch, Tehran, Iran
[3] Tarbiat Modares Univ, Dept Management, Tehran, Iran
Keywords
Map-Reduce; Density-based clustering; Big data; ALGORITHM;
DOI
10.1186/s40537-019-0236-x
Chinese Library Classification number
TP301 [Theory and methods];
Subject classification code
081202 ;
Abstract
DBSCAN is a widely used density-based clustering algorithm whose most important feature is its ability to detect clusters of arbitrary and varied shapes, as well as noise data. Nevertheless, the algorithm faces a number of challenges, including its failure to find clusters of varied densities. Meanwhile, with the rapid development of the information age, so much data is produced every day that a single machine cannot process it; new technologies are therefore required to store this data and extract information from it. A volume of data that exceeds the capabilities of existing software is called big data. In this paper, we introduce a new algorithm for clustering big data with varied density using MapReduce on a Hadoop platform. The main idea of this research is to use local density to determine each point's density. This strategy can avoid connecting clusters of differing densities. The proposed algorithm is implemented and compared with other algorithms that use the MapReduce paradigm, and it shows the best varied-density clustering capability and scalability among them.
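The abstract does not say how local density is computed, so the following is only a minimal single-machine sketch of the general idea, not the paper's method. It assumes a hypothetical definition in which a point's local density is the inverse of the mean distance to its k nearest neighbours; the function name `knn_local_density` and the parameter `k` are illustrative, and no MapReduce partitioning is shown.

```python
import math

def knn_local_density(points, k=3):
    """Return one density score per point.

    Assumption (not from the paper): local density is the inverse of the
    mean Euclidean distance to the k nearest neighbours.
    """
    densities = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_knn = sum(dists[:k]) / k
        densities.append(1.0 / mean_knn if mean_knn > 0 else math.inf)
    return densities

# Two toy groups: a tight one near the origin and a sparse one far away.
dense = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
sparse = [(10.0, 10.0), (12.0, 10.0), (10.0, 12.0), (12.0, 12.0)]
scores = knn_local_density(dense + sparse, k=3)
```

In this toy data every point in the tight group scores higher than every point in the sparse group, even though both groups have the same shape. A per-point score of this kind is what lets a method treat the two groups with different thresholds, whereas a single global radius as in plain DBSCAN would have to suit both.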
Pages: 16