Clustering Big Data Based on Distributed Fuzzy K-Medoids: An Application to Geospatial Informatics

Cited by: 5
Authors
Madbouly, Magda M. [1]
Darwish, Saad M. [1]
Bagi, Noha A. [2]
Osman, Mohamed A. [3]
Affiliations
[1] Alexandria Univ, Inst Grad Studies & Res, Dept Informat Technol, Alexandria 21526, Egypt
[2] Alexandria Water Co, Alexandria 21581, Egypt
[3] Higher Inst Management Informat Technol, Management Informat Syst Dept, Kafr Al Sheikh 33511, Egypt
Keywords
Clustering algorithms; Big Data; Geospatial analysis; Heuristic algorithms; Partitioning algorithms; Distributed databases; Scalability; Geospatial informatics; big data clustering; dynamic clustering; Apache Spark; fuzzy K-medoids
DOI
10.1109/ACCESS.2022.3149548
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The advent of big data tied to spatial position, called geospatial big data, creates opportunities to understand the urban environment. Existing database processing methods cannot deliver reliable results quickly in a geospatial big data context, because they require approximation "measures" to be defined and because query execution time keeps growing. Clustering methods can deliver the required functionality. However, scaling and accelerating clustering algorithms while maintaining high clustering efficiency remains a significant challenge. The paper's primary contribution is a modified hierarchical distributed k-medoids clustering method tailored to spatial query analysis for big data. To improve the efficiency of the k-medoids algorithm and obtain more precise clusters, the proposed model uses the fuzzy k-medoids method to handle outliers in the spatial dataset and to cope with data uncertainty. The method is complex in nature, since it does not presuppose the correct number of clusters. The proposed model has two phases. The first phase creates local clusters, each from a portion of the entire dataset, and makes extensive use of the parallelism provided by the Apache Spark framework. The second phase aggregates the local clusters to produce compact and reliable final clusters. The proposed model greatly reduces the amount of information exchanged during aggregation and automatically produces an appropriate number of clusters based on the characteristics of the dataset. The results show that the proposed model outperforms traditional k-medoids in the accuracy of the obtained cluster centers in big data applications.
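The two-phase scheme in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: plain Python lists stand in for Spark partitions, the membership and medoid-update rules follow the common fuzzy c-medoids formulation with fuzzifier `m`, and the aggregation step (a greedy merge of local medoids within a hypothetical `radius`) is an assumption made only for illustration, since the abstract does not specify how local clusters are merged. All function and parameter names are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def memberships(points, medoids, m=2.0):
    """Fuzzy membership of each point in each medoid's cluster (rows sum to 1)."""
    rows = []
    for p in points:
        d = [dist(p, v) for v in medoids]
        if 0.0 in d:
            # A point that coincides with a medoid belongs fully to it.
            hit = d.index(0.0)
            rows.append([1.0 if i == hit else 0.0 for i in range(len(d))])
        else:
            w = [(1.0 / di) ** (1.0 / (m - 1.0)) for di in d]
            total = sum(w)
            rows.append([wi / total for wi in w])
    return rows


def fuzzy_k_medoids(points, k, m=2.0, iters=20, seed=0):
    """Phase 1 (local): alternate membership and medoid updates on one partition."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)
    for _ in range(iters):
        u = memberships(points, medoids, m)
        new = []
        for i in range(k):
            # New medoid: the data point with the lowest membership-weighted cost.
            best = min(points, key=lambda c: sum(
                (u[j][i] ** m) * dist(c, points[j]) for j in range(len(points))))
            new.append(best)
        if new == medoids:
            break
        medoids = new
    return medoids


def merge_medoids(local_medoids, radius):
    """Phase 2 (assumed rule): greedily group medoids lying within `radius`."""
    groups = []
    for v in local_medoids:
        for g in groups:
            if dist(v, g[0]) <= radius:
                g.append(v)
                break
        else:
            groups.append([v])
    # Represent each group by the member with the smallest total distance.
    return [min(g, key=lambda c: sum(dist(c, o) for o in g)) for g in groups]


def two_phase(points, n_parts, k_local, radius):
    """Split the data, cluster each part locally, then aggregate the medoids."""
    parts = [points[i::n_parts] for i in range(n_parts)]  # stand-in for Spark partitions
    local = [v for part in parts for v in fuzzy_k_medoids(part, k_local)]
    return merge_medoids(local, radius)


if __name__ == "__main__":
    rng = random.Random(1)
    blob_a = [(rng.random(), rng.random()) for _ in range(20)]
    blob_b = [(10 + rng.random(), 10 + rng.random()) for _ in range(20)]
    final = two_phase(blob_a + blob_b, n_parts=2, k_local=2, radius=3.0)
    print(len(final), final)
```

Only the local medoids, not the raw points, reach the aggregation step, which is one way to read the abstract's claim about reduced data exchange. The number of final clusters falls out of the merge rather than being fixed up front. In a real Spark job the per-partition step would run inside the cluster rather than in a list comprehension.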
Pages: 20926-20936
Page count: 11
Related papers
50 in total
  • [1] Convex fuzzy k-medoids clustering
    Pinheiro, Daniel N.
    Aloise, Daniel
    Blanchard, Simon J.
    [J]. FUZZY SETS AND SYSTEMS, 2020, 389 : 66 - 92
  • [2] A K-medoids Based Clustering Scheme with an Application to Document Clustering
    Onan, Aytug
    [J]. 2017 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK), 2017, : 354 - 359
  • [3] Kernel Based K-Medoids for Clustering Data with Uncertainty
    Yang, Baoguo
    Zhang, Yang
    [J]. ADVANCED DATA MINING AND APPLICATIONS, ADMA 2010, PT I, 2010, 6440 : 246 - 253
  • [4] Fuzzy kernel K-medoids clustering algorithm for uncertain data objects
    Behnam Tavakkol
    Youngdoo Son
    [J]. Pattern Analysis and Applications, 2021, 24 : 1287 - 1302
  • [5] Fuzzy kernel K-medoids clustering algorithm for uncertain data objects
    Tavakkol, Behnam
    Son, Youngdoo
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2021, 24 (03) : 1287 - 1302
  • [6] Clustering Uncertain Data Via K-Medoids
    Gullo, Francesco
    Ponti, Giovanni
    Tagarelli, Andrea
    [J]. SCALABLE UNCERTAINTY MANAGEMENT, SUM 2008, 2008, 5291 : 229 - 242
  • [7] The application of K-medoids and PAM to the clustering of rules
    Reynolds, AP
    Richards, G
    Rayward-Smith, VJ
    [J]. INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING IDEAL 2004, PROCEEDINGS, 2004, 3177 : 173 - 178
  • [8] K-medoids Method based on Divergence for Uncertain Data Clustering
    Zhou, Jin
    Pan, Yuqi
    Chen, C. L. Philip
    Wang, Dong
    Han, Shiyuan
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2016, : 2671 - 2674
  • [9] Application of the k-medoids Partitioning Algorithm for Clustering of Time Series Data
    Radovanovic, Ana
    Ye, Xinlin
    Milanovic, Jovica V.
    Milosavljevic, Nina
    Storchi, Riccardo
    [J]. 2020 IEEE PES INNOVATIVE SMART GRID TECHNOLOGIES EUROPE (ISGT-EUROPE 2020): SMART GRIDS: KEY ENABLERS OF A GREEN POWER SYSTEM, 2020, : 645 - 649
  • [10] K-Medoids Clustering and Fuzzy Sets for Isolation Forest
    Karczmarek, Pawel
    Kiersztyn, Adam
    Pedrycz, Witold
    Badurowicz, Marcin
    Czerwinski, Dariusz
    Montusiewicz, Jerzy
    [J]. IEEE CIS INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS 2021 (FUZZ-IEEE), 2021,