Big data clustering with varied density based on MapReduce

Cited by: 31

Authors:

Heidari, Safanaz [1]

Alborzi, Mahmood [1]

Radfar, Reza [1]

Afsharkazemi, Mohammad Ali [2]

Ghatari, Ali Rajabzadeh [3]

Affiliations:

[1] Islamic Azad Univ, Dept Informat Technol Management, Sci & Res Branch, Tehran, Iran

[2] Islamic Azad Univ, Dept Ind Management, Cent Tehran Branch, Tehran, Iran

[3] Tarbiat Modares Univ, Dept Management, Tehran, Iran

Source:

JOURNAL OF BIG DATA | 2019 / Vol. 6 / Issue 01

Keywords:

Map-Reduce; Density-based clustering; Big data; ALGORITHM;

DOI:

10.1186/s40537-019-0236-x

Chinese Library Classification (CLC) number:

TP301 [Theory, methods];

Discipline classification code:

081202 ;

Abstract:

The DBSCAN algorithm is a prevalent method of density-based clustering algorithms, the most important feature of which is the ability to detect arbitrary shapes and varied clusters and noise data. Nevertheless, this algorithm faces a number of challenges, including failure to find clusters of varied densities. On the other hand, with the rapid development of the information age, plenty of data are produced every day, such that a single machine alone cannot process this volume of data; hence, new technologies are required to store and extract information from this volume of data. A large volume of data that is beyond the capabilities of existing software is called Big data. In this paper, we have attempted to introduce a new algorithm for clustering big data with varied density using a Hadoop platform running MapReduce. The main idea of this research is the use of local density to find each point's density. This strategy can avoid the situation of connecting clusters with varying densities. The proposed algorithm is implemented and compared with other algorithms using the MapReduce paradigm and shows the best varying density clustering capability and scalability.
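
The abstract does not say how local density is computed, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a k-nearest-neighbour estimate (density taken as the inverse of the distance to the k-th nearest neighbour); the function name `knn_local_density`, the parameter `k`, and the toy points are all hypothetical. It runs on a single machine and leaves out the MapReduce partitioning entirely.

```python
import math


def knn_local_density(points, k):
    """Estimate a local density for every point.

    Assumption for illustration: density = 1 / (distance to the k-th
    nearest neighbour). This is a common heuristic, not necessarily the
    definition used in the paper.
    """
    densities = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        kth = dists[k - 1]
        densities.append(1.0 / kth if kth > 0 else math.inf)
    return densities


# Two toy clusters: one tightly packed, one loosely packed.
dense_cluster = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0)]
sparse_cluster = [(10.0, 0.0), (12.0, 0.0), (14.0, 0.0), (16.0, 0.0)]

densities = knn_local_density(dense_cluster + sparse_cluster, k=2)
dense_part = densities[: len(dense_cluster)]
sparse_part = densities[len(dense_cluster):]

print("dense cluster densities: ", dense_part)
print("sparse cluster densities:", sparse_part)
```

Because each point gets its own estimate, the two clusters can be told apart by density rather than sharing one global radius, which is the situation where a single DBSCAN `eps` tends to either merge clusters or discard the sparser one as noise.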

Pages: 16
