K-medoids Clustering Based on MapReduce and Optimal Search of Medoids

Cited by: 0
Authors
Zhu, Ying-ting [1]
Wang, Fu-zhang [2]
Shan, Xing-hua [2]
Lv, Xiao-yan [2]
Affiliations
[1] China Acad Railway Sci, Railway Technol Res Coll, Beijing, Peoples R China
[2] China Acad Railway Sci, Inst Comp Technol, Beijing, Peoples R China
Keywords
MapReduce; k-medoids; parallel algorithm; cluster analysis; data mining
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The traditional k-medoids algorithm is robust to noise and outliers in the data, but its computational cost makes it suitable only for small and medium-sized data sets. MapReduce is a programming model for processing massive data and is well suited to parallel computation on big data. This paper therefore proposes an improved k-medoids algorithm for clustering big data, based on MapReduce and an optimal search of medoids. First, using basic properties of triangle geometry, the algorithm reduces the number of distance calculations among data elements, which speeds up the search for medoids and lowers the computational complexity of k-medoids. Second, following the working principle of MapReduce, the Map function calculates the distance between each data element and the medoids and assigns each element to its cluster; the Reduce function checks the results from the Map function, searches for new medoids using the optimal medoid search strategy, and returns the new medoids to the Map function for the next MapReduce round. Experimental results show that the proposed algorithm is efficient and effective.
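The Map/Reduce division of work described in the abstract can be illustrated with a minimal, single-process Python sketch. It is not the authors' implementation: the abstract does not give the exact pruning rule or the optimal search strategy, so the sketch assumes the standard triangle-inequality bound (if d(m_a, m_b) >= 2 d(x, m_a), then m_b cannot be closer to x than m_a) for the Map step and a brute-force medoid search for the Reduce step. The names `map_assign`, `reduce_update`, and `k_medoids` are hypothetical, and no Hadoop API is used.

```python
import math
from collections import defaultdict


def map_assign(points, medoids):
    """Map step (sketch): yield (cluster index, point) for every point."""
    k = len(medoids)
    # Medoid-to-medoid distances, computed once per round.
    mm = [[math.dist(medoids[i], medoids[j]) for j in range(k)] for i in range(k)]
    for p in points:
        best, best_d = 0, math.dist(p, medoids[0])
        for j in range(1, k):
            # Assumed pruning rule: by the triangle inequality,
            # d(p, m_j) >= d(m_best, m_j) - d(p, m_best), so m_j cannot
            # beat the current best when d(m_best, m_j) >= 2 * d(p, m_best).
            if mm[best][j] >= 2 * best_d:
                continue
            d = math.dist(p, medoids[j])
            if d < best_d:
                best, best_d = j, d
        yield best, p


def reduce_update(members):
    """Reduce step (sketch): brute-force search for the member with the
    smallest total distance to the rest of its cluster. The paper's
    optimal search strategy would replace this function."""
    return min(members, key=lambda c: sum(math.dist(c, q) for q in members))


def k_medoids(points, medoids, max_rounds=20):
    """Repeat Map and Reduce rounds until the medoids stop changing."""
    clusters = {}
    for _ in range(max_rounds):
        grouped = defaultdict(list)
        for idx, p in map_assign(points, medoids):
            grouped[idx].append(p)
        clusters = dict(grouped)
        # Keep the old medoid if a cluster received no points.
        new_medoids = [
            reduce_update(clusters[i]) if i in clusters else medoids[i]
            for i in range(len(medoids))
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, clusters


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    final_medoids, final_clusters = k_medoids(data, [(0, 1), (10, 11)])
    print(final_medoids)
```

In a real MapReduce job the medoid list would be broadcast to every mapper and the cluster index would serve as the shuffle key; here a dictionary stands in for the shuffle phase.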
Pages: 573-577
Page count: 5