Fuzzy Based Clustering Algorithms to Handle Big Data with Implementation on Apache Spark

Cited by: 13
Authors
Bharill, Neha [1]
Tiwari, Aruna [1]
Malviya, Aayushi [1]
Affiliations
[1] Indian Inst Technol, Dept Comp Sci & Engn, Indore, India
Keywords
Fuzzy Clustering; Partitional clustering; Apache Spark; Big Data; Iterative Algorithms; COMPLEXITY;
DOI
10.1109/BigDataService.2016.34
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Advances in technology generate a huge amount of data containing useful information, called Big Data, on a daily basis. Processing such a tremendous volume of data requires Big Data frameworks such as Hadoop MapReduce and Apache Spark. Among these, Apache Spark performs up to 100 times faster than conventional frameworks like Hadoop MapReduce. Effective analysis and interpretation of this data requires scalable machine learning methods that overcome the space and time bottlenecks. Partitional clustering algorithms are widely adopted by researchers for clustering large datasets due to their low computational requirements. We therefore focus on the design of a partitional clustering algorithm and its implementation on Apache Spark. In this paper, we propose a partitional clustering algorithm called Scalable Random Sampling with Iterative Optimization Fuzzy c-Means (SRSIO-FCM), implemented on Apache Spark to handle the challenges associated with Big Data clustering. Experiments are performed on several big datasets to show the effectiveness of SRSIO-FCM in comparison with a proposed scalable version of Literal Fuzzy c-Means (LFCM), called SLFCM, also implemented on Apache Spark. The comparative results are reported in terms of F-measure, ARI, objective function value, run-time, and scalability. The reported results show the great potential of SRSIO-FCM for Big Data clustering.
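The abstract names Literal Fuzzy c-Means (LFCM) as the baseline but does not spell out its iteration, nor the SRSIO-FCM or SLFCM procedures. As a rough illustration only, the following is a minimal single-machine sketch of the textbook fuzzy c-means updates (membership update followed by center update). It is not the authors' method and does not use Apache Spark; the function name `fuzzy_c_means`, its parameters, and the toy data are assumptions made for this example.

```python
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means on a list of equal-length float tuples.

    Hypothetical helper for illustration; m is the fuzzifier (m > 1).
    Returns (centers, memberships), where memberships[j][i] is the degree
    to which point j belongs to cluster i.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Assumption: initialise the centers with c randomly chosen data points.
    centers = [list(p) for p in rng.sample(points, c)]
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(max_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1)).
        memberships = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, v)) ** 0.5
                     for v in centers]
            if any(d == 0.0 for d in dists):
                # The point sits exactly on a center: give it full membership
                # there (shared equally if several centers coincide).
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                row = [h / sum(hits) for h in hits]
            else:
                row = [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                       for d_i in dists]
            memberships.append(row)
        # Center update: v_i = sum_j u_ij^m x_j / sum_j u_ij^m.
        new_centers = []
        for i in range(c):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            new_centers.append(
                [sum(w * p[k] for w, p in zip(weights, points)) / total
                 for k in range(dim)])
        # Stop once no center moves by more than tol.
        shift = max(sum((a - b) ** 2 for a, b in zip(old, new)) ** 0.5
                    for old, new in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


# Toy usage: two well-separated groups of 2-D points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
toy_centers, toy_memberships = fuzzy_c_means(toy_points, c=2)
```

Every point contributes to every center through its membership weight, so each iteration touches the full dataset. That per-iteration cost is what a distributed or sampling-based variant would aim to reduce.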
Pages: 95-104
Number of pages: 10