Fuzzy Based Clustering Algorithms to Handle Big Data with Implementation on Apache Spark

Cited by: 13

Authors:

Bharill, Neha [1]

Tiwari, Aruna [1]

Malviya, Aayushi [1]

Affiliations:

[1] Indian Inst Technol, Dept Comp Sci & Engn, Indore, India

Source:

PROCEEDINGS 2016 IEEE SECOND INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (BIGDATASERVICE 2016) | 2016

Keywords:

Fuzzy Clustering; Partitional clustering; Apache Spark; Big Data; Iterative Algorithms; COMPLEXITY;

DOI:

10.1109/BigDataService.2016.34

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

With advances in technology, a huge amount of data containing useful information, called Big Data, is generated on a daily basis. Processing such a tremendous volume of data requires Big Data frameworks such as Hadoop MapReduce and Apache Spark. Among these, Apache Spark performs up to 100 times faster than conventional frameworks like Hadoop MapReduce. For the effective analysis and interpretation of this data, scalable machine learning methods are required to overcome the space and time bottlenecks. Partitional clustering algorithms are widely adopted by researchers for clustering large datasets due to their low computational requirements. Thus, we focus on the design of a partitional clustering algorithm and its implementation on Apache Spark. In this paper, we propose a partitional clustering algorithm called Scalable Random Sampling with Iterative Optimization Fuzzy c-Means (SRSIO-FCM), which is implemented on Apache Spark to handle the challenges associated with Big Data clustering. Experiments are performed on several big datasets to show the effectiveness of SRSIO-FCM in comparison with a proposed scalable version of the Literal Fuzzy c-Means (LFCM) algorithm, called SLFCM, also implemented on Apache Spark. The comparative results are reported in terms of F-measure, ARI, objective function value, run-time, and scalability. The reported results show the great potential of SRSIO-FCM for Big Data clustering.
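
For orientation, the following is a minimal single-machine sketch of the standard (literal) fuzzy c-means iteration that the abstract refers to. It is not the authors' SRSIO-FCM or SLFCM, and it does not use Apache Spark. The function name `fuzzy_c_means` and its parameters (`m`, `max_iter`, `tol`, `seed`) are illustrative assumptions rather than anything taken from the paper.

```python
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Textbook fuzzy c-means on a small in-memory dataset (illustrative only).

    points: list of equal-length numeric sequences
    c:      number of clusters
    m:      fuzzifier (> 1); larger values give softer memberships
    Returns (centers, memberships), where memberships[i][j] is the degree
    to which point i belongs to cluster j.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial membership matrix; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )

        # Membership update from squared distances to the new centers.
        shift = 0.0
        for i in range(n):
            d2 = [sum((points[i][d] - centers[j][d]) ** 2 for d in range(dim))
                  for j in range(c)]
            zero = [j for j in range(c) if d2[j] == 0.0]
            if zero:
                # Point coincides with one or more centers: split membership.
                new = [1.0 / len(zero) if j in zero else 0.0 for j in range(c)]
            else:
                inv = [v ** (-1.0 / (m - 1.0)) for v in d2]
                inv_sum = sum(inv)
                new = [v / inv_sum for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new

        # Stop once no membership changed by more than the tolerance.
        if shift < tol:
            break

    return centers, u
```

Calling `fuzzy_c_means(data, c=2)` on a list of 2-D points returns the cluster centers and a soft membership row per point; a hard label can be taken as the index of the largest membership in each row.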


Pages: 95-104

Page count: 10
