A Parallel K-Medoids Algorithm for Clustering based on MapReduce

Authors
Shafiq, M. Omair [1]
Torunski, Eric [1]
Affiliations
[1] Carleton Univ, Sch Informat Technol, Ottawa, ON, Canada
Source
2016 15TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2016) | 2016
Keywords
Clustering; K-Medoids; Big Data; MapReduce;
DOI
10.1109/ICMLA.2016.196
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering data into groups or categories is one of the most important machine learning techniques. Several good algorithms and techniques exist for clustering small- to medium-scale data. In the era of Big Data, with applications that are large-scale and data-intensive in nature, the volume, variety and velocity of data produced by such applications in the form of log events have increased significantly. This makes clustering huge amounts of data more challenging and limits the applicability of such algorithms. In this paper, we present a parallel K-Medoids clustering algorithm based on the MapReduce paradigm that can cluster large-scale data. We have kept our solution simple and feasible so that it can handle a huge volume, variety and velocity of data. Another distinguishing feature of our algorithm is that, unlike other related approaches, it achieves parallelism independent of the number of clusters k to be formed. We have tested our algorithm on large amounts of data and on a real-life case study.
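
As a rough illustration of how a K-Medoids iteration can be phrased in map/reduce terms, the sketch below runs one pattern in plain Python: a map step assigns each point to its nearest medoid, and a reduce step picks, within each cluster, the member with the smallest total distance to the others. This is not the authors' algorithm; the function names (`map_assign`, `reduce_medoid`, `kmedoids`), the Euclidean distance, and the in-memory grouping are all assumptions made for illustration. In this naive form the reduce step has one group per cluster, so its parallelism is bounded by k; the abstract states that the proposed algorithm avoids that dependence but does not say how.

```python
from collections import defaultdict


def dist(a, b):
    """Euclidean distance between two equal-length tuples (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def map_assign(point, medoids):
    """Map step: emit (index of nearest medoid, point)."""
    idx = min(range(len(medoids)), key=lambda i: dist(point, medoids[i]))
    return idx, point


def reduce_medoid(points):
    """Reduce step: choose the member minimising total distance to its cluster."""
    return min(points, key=lambda c: sum(dist(c, p) for p in points))


def kmedoids_step(points, medoids):
    """One iteration: map every point, group by key, reduce each group."""
    groups = defaultdict(list)
    for p in points:
        idx, pt = map_assign(p, medoids)
        groups[idx].append(pt)
    # Keep the old medoid if a cluster received no points.
    return [
        reduce_medoid(groups[i]) if groups[i] else medoids[i]
        for i in range(len(medoids))
    ]


def kmedoids(points, medoids, max_iter=20):
    """Repeat the map/reduce step until the medoids stop changing."""
    for _ in range(max_iter):
        new_medoids = kmedoids_step(points, medoids)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmedoids(data, [(0, 1), (10, 11)]))
```

On a real cluster the grouping done here with a dictionary would be handled by the framework's shuffle phase, and the map calls would run independently on separate data partitions.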
Pages: 502-507
Number of pages: 6