A Parallel K-Medoids Algorithm for Clustering based on MapReduce

Cited by: 0
Authors
Shafiq, M. Omair [1]
Torunski, Eric [1]
Affiliation
[1] Carleton Univ, Sch Informat Technol, Ottawa, ON, Canada
Source
2016 15TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2016) | 2016
Keywords
Clustering; K-Medoids; Big Data; MapReduce
DOI
10.1109/ICMLA.2016.196
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering data into groups or categories is one of the most important machine learning techniques. Several good algorithms and techniques exist for clustering small- to medium-scale data. In the era of Big Data, with applications that are large-scale and data-intensive in nature, the volume, variety and velocity of the data such applications produce in the form of log events have increased significantly. This makes clustering huge amounts of data harder and more constrained. In this paper, we present a parallel K-Medoids clustering algorithm, based on the MapReduce paradigm, that can cluster large-scale data. We have kept our solution simple and feasible for handling data of huge volume, variety and velocity. A key distinguishing feature of our proposed algorithm is that, unlike other related approaches, it achieves parallelism independent of the number of clusters k to be formed. We have tested our algorithm on large amounts of data and on a real-life case study.
Pages: 502-507
Number of pages: 6
Related papers
50 in total
  • [21] A K-medoids Based Clustering Scheme with an Application to Document Clustering
    Onan, Aytug
    2017 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK), 2017, : 354 - 359
  • [22] Convex fuzzy k-medoids clustering
    Pinheiro, Daniel N.
    Aloise, Daniel
    Blanchard, Simon J.
    FUZZY SETS AND SYSTEMS, 2020, 389 : 66 - 92
  • [23] k-MM: A Hybrid Clustering Algorithm Based on k-Means and k-Medoids
    Drias, Habiba
    Cherif, Nadjib Fodil
    Kechid, Amine
    ADVANCES IN NATURE AND BIOLOGICALLY INSPIRED COMPUTING, 2016, 419 : 37 - 48
  • [24] Clustering Time Series with k-Medoids Based Algorithms
    Holder, Christopher
    Guijo-Rubio, David
    Bagnall, Anthony
    ADVANCED ANALYTICS AND LEARNING ON TEMPORAL DATA, AALTD 2023, 2023, 14343 : 39 - 55
  • [25] Quantum k-medoids algorithm using parallel amplitude estimation
    Li, Yong-Mei
    Liu, Hai-Ling
    Pan, Shi-Jie
    Qin, Su-Juan
    Gao, Fei
    Sun, Dong-Xu
    Wen, Qiao-Yan
    PHYSICAL REVIEW A, 2023, 107 (02)
  • [26] Clustering of Uncertain Load Model Parameters with K-medoids Algorithm
    Zhang, Xinran
    Hill, David J.
    2018 IEEE POWER & ENERGY SOCIETY GENERAL MEETING (PESGM), 2018,
  • [27] Privacy preserving k-medoids clustering
    Zhan, Justin
    2007 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-8, 2007, : 3570 - 3573
  • [28] An improved K-medoids algorithm based on step increasing and optimizing medoids
    Yu, Donghua
    Liu, Guojun
    Guo, Maozu
    Liu, Xiaoyan
    EXPERT SYSTEMS WITH APPLICATIONS, 2018, 92 : 464 - 473
  • [29] A parallel heuristic for a k-medoids clustering problem with unfixed number of clusters
    Ushakov, Anton V.
    Vasilyev, Igor
    2019 42ND INTERNATIONAL CONVENTION ON INFORMATION AND COMMUNICATION TECHNOLOGY, ELECTRONICS AND MICROELECTRONICS (MIPRO), 2019, : 1116 - 1120
  • [30] Kernel Based K-Medoids for Clustering Data with Uncertainty
    Yang, Baoguo
    Zhang, Yang
    ADVANCED DATA MINING AND APPLICATIONS, ADMA 2010, PT I, 2010, 6440 : 246 - 253