Big data clustering with varied density based on MapReduce

Cited by: 31

Authors:

Heidari, Safanaz ^{[1]}

Alborzi, Mahmood ^{[1]}

Radfar, Reza ^{[1]}

Afsharkazemi, Mohammad Ali ^{[2]}

Ghatari, Ali Rajabzadeh ^{[3]}

Affiliations:

[1] Islamic Azad Univ, Dept Informat Technol Management, Sci & Res Branch, Tehran, Iran

[2] Islamic Azad Univ, Dept Ind Management, Cent Tehran Branch, Tehran, Iran

[3] Tarbiat Modares Univ, Dept Management, Tehran, Iran

Source:

JOURNAL OF BIG DATA | 2019 / Vol. 6 / Issue 01

Keywords:

Map-Reduce; Density-based clustering; Big data; ALGORITHM;

DOI:

10.1186/s40537-019-0236-x

Chinese Library Classification number:

TP301 [Theory, Methods];

Discipline classification code:

081202 ;

Abstract:

DBSCAN is a widely used density-based clustering algorithm whose most important feature is its ability to detect clusters of arbitrary and varied shapes and to identify noise. Nevertheless, it faces a number of challenges, including failure to find clusters of varied densities. At the same time, with the rapid development of the information age, so much data is produced every day that a single machine cannot process it; hence, new technologies are required to store this data and extract information from it. A volume of data that is beyond the capabilities of existing software is called big data. In this paper, we introduce a new algorithm for clustering big data with varied density using a Hadoop platform running MapReduce. The main idea is to use local density to determine each point's density, which avoids connecting clusters of differing densities. The proposed algorithm is implemented and compared with other algorithms using the MapReduce paradigm; among the compared algorithms, it shows the best varying-density clustering capability and scalability.
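
The abstract does not define how local density is computed, so the following is only an illustrative sketch, not the authors' method. It assumes one common definition (k divided by the summed distance to a point's k nearest neighbours) and runs on a single machine without MapReduce; the function name `local_density` and the parameter `k` are hypothetical. It shows why a per-point density, unlike one global threshold, separates a tight cluster from a loose one.

```python
import math


def local_density(points, k):
    """Return one score per point: k / (sum of distances to its k nearest neighbours).

    Assumed definition for illustration only; the abstract does not specify one.
    """
    densities = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        total = sum(dists[:k])
        densities.append(k / total if total > 0 else math.inf)
    return densities


# Two square clusters with very different spacing (0.1 versus 2.0).
dense = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
sparse = [(10.0, 10.0), (12.0, 10.0), (10.0, 12.0), (12.0, 12.0)]
scores = local_density(dense + sparse, k=2)
```

Under this assumed definition, every point in the tight cluster scores higher than every point in the loose one, so a clustering step that compares a point's score with those of its neighbours can keep the two groups apart where a single global density threshold might merge them or discard the loose one as noise.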


Pages: 16
