K-medoids Clustering Based on MapReduce and Optimal Search of Medoids

Cited by: 0
Authors
Zhu, Ying-ting [1 ]
Wang, Fu-zhang [2 ]
Shan, Xing-hua [2 ]
Lv, Xiao-yan [2 ]
Affiliations
[1] China Acad Railway Sci, Railway Technol Res Coll, Beijing, Peoples R China
[2] China Acad Railway Sci, Inst Comp Technol, Beijing, Peoples R China
Source
2014 PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION (ICCSE 2014) | 2014
Keywords
MapReduce; k-medoids; parallel algorithm; cluster analysis; data mining;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
The traditional k-medoids algorithm is robust to noise and outliers in the data, but its high computational cost makes it suitable only for small and medium-sized data sets. MapReduce is a programming model for processing massive data and is well suited to parallel computation on big data. This paper therefore proposes an improved algorithm, based on MapReduce and an optimal search of medoids, for clustering big data. First, using basic properties of triangle geometry, the paper reduces the number of distance calculations among data elements, which speeds up the search for medoids and lowers the computational complexity of k-medoids. Second, following the working principle of MapReduce, the Map function calculates the distance between each data element and the medoids and assigns data elements to their clusters; the Reduce function checks the results from the Map function, searches for new medoids using the optimal medoid search strategy, and returns the new medoids to the Map function for the next MapReduce round. The experimental results show that the proposed algorithm is both efficient and effective.
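The Map/Reduce split described in the abstract can be sketched in a few lines of single-process Python. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact triangle-geometry rule or the optimal search strategy, so the sketch substitutes a well-known triangle-inequality bound (if d(m_i, m_j) >= 2 d(x, m_i), then m_j cannot be closer to x than m_i) and an exhaustive search within each cluster. The names `map_assign`, `reduce_update`, and `k_medoids` are hypothetical, and no actual MapReduce framework is used.

```python
import math
from collections import defaultdict


def map_assign(points, medoids):
    """Map step (sketch): pair each point with the index of its nearest medoid.

    Assumed pruning rule, not taken from the paper: if
    d(m_best, m_j) >= 2 * d(x, m_best), the triangle inequality gives
    d(x, m_j) >= d(x, m_best), so d(x, m_j) is never computed.
    """
    k = len(medoids)
    # Medoid-to-medoid distances are computed once per round.
    mm = [[math.dist(medoids[i], medoids[j]) for j in range(k)] for i in range(k)]
    pairs, skipped = [], 0
    for p in points:
        best, best_d = 0, math.dist(p, medoids[0])
        for j in range(1, k):
            if mm[best][j] >= 2 * best_d:
                skipped += 1  # distance ruled out without computing it
                continue
            d = math.dist(p, medoids[j])
            if d < best_d:
                best, best_d = j, d
        pairs.append((best, p))
    return pairs, skipped


def reduce_update(cluster_points):
    """Reduce step (sketch): choose the member with the smallest total distance
    to the other members. This exhaustive search is a stand-in for the paper's
    optimal search strategy, which the abstract does not specify."""
    return min(cluster_points,
               key=lambda c: sum(math.dist(c, q) for q in cluster_points))


def k_medoids(points, medoids, max_iter=20):
    """Alternate Map and Reduce rounds until the medoids stop changing."""
    groups = defaultdict(list)
    for _ in range(max_iter):
        groups = defaultdict(list)
        pairs, _ = map_assign(points, medoids)
        for idx, p in pairs:
            groups[idx].append(p)
        # An empty cluster keeps its previous medoid.
        new_medoids = [reduce_update(groups[i]) if groups[i] else medoids[i]
                       for i in range(len(medoids))]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, dict(groups)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(k_medoids(pts, [(0, 1), (10, 11)])[0])
```

In a real MapReduce job the `(index, point)` pairs would be shuffled by key so that each reducer receives one cluster; here a dictionary plays that role.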
Pages: 573-577
Number of pages: 5
Related papers
50 in total
  • [21] Rough K-Medoids Clustering using GAs
    Lingras, Pawan
    PROCEEDINGS OF THE 8TH IEEE INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS, 2009, : 315 - 319
  • [22] The application of K-medoids and PAM to the clustering of rules
    Reynolds, AP
    Richards, G
    Rayward-Smith, VJ
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING IDEAL 2004, PROCEEDINGS, 2004, 3177 : 173 - 178
  • [23] An Efficient Density based Improved K-Medoids Clustering algorithm
    Pratap, Raghuvira A.
    Vani, K. Suvarna
    Devi, J. Rama
    Rao, K. Nageswara
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2011, 2 (06) : 49 - 54
  • [24] A Bisecting K-Medoids clustering Algorithm Based on Cloud Model
    Sun, D.
    Fei, H.
    Li, Q.
    IFAC PAPERSONLINE, 2018, 51 (11) : 308 - 315
  • [25] Novel Clustering Method Based on K-Medoids and Mobility Metric
    Hamzaoui, Y.
    Amnai, M.
    Choukri, A.
    Fakhri, Y.
    INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE, 2018, 5 (01) : 29 - 33
  • [26] K-medoids Method based on Divergence for Uncertain Data Clustering
    Zhou, Jin
    Pan, Yuqi
    Chen, C. L. Philip
    Wang, Dong
    Han, Shiyuan
    2016 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2016, : 2671 - 2674
  • [27] A K-medoids based Clustering Algorithm for Wireless Sensor Networks
    Wang, Jin
    Wang, Kai
    Niu, Junming
    Liu, Wei
    2018 INTERNATIONAL WORKSHOP ON ADVANCED IMAGE TECHNOLOGY (IWAIT), 2018,
  • [28] Jensen Shannon Divergence-Based k-Medoids Clustering
    Kingetsu, Yuto
    Hamasuna, Yukihiro
    JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2021, 25 (02) : 226 - 233
  • [29] An improved K-medoids algorithm based on step increasing and optimizing medoids
    Yu, Donghua
    Liu, Guojun
    Guo, Maozu
    Liu, Xiaoyan
    EXPERT SYSTEMS WITH APPLICATIONS, 2018, 92 : 464 - 473
  • [30] Near-optimal large-scale k-medoids clustering
    Ushakov, Anton V.
    Vasilyev, Igor
    INFORMATION SCIENCES, 2021, 545 : 344 - 362