Two-Stage Clustering with k-Means Algorithm

被引:0
|
作者
Salman, Raied [1 ]
Kecman, Vojislav [1 ]
Li, Qi [1 ]
Strack, Robert [1 ]
Test, Erick [1 ]
机构
[1] Virginia Commonwealth Univ, Dept Comp Sci, Richmond, VA 23284 USA
关键词
Data Mining; Clustering; k-means algorithm; Distance Calculation;
D O I
暂无
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
k-means has recently been recognized as one of the best algorithms for clustering unsupervised data. Since the k-means depends mainly on distance calculation between all data points and the centers then the cost will be high when the size of the dataset is big (for example more than 500MG points). We suggested a two stage algorithm to reduce the cost of calculation for huge datasets. The first stage is fast calculation depending on small portion of the data to produce the best location of the centers. The second stage is the slow calculation in which the initial centers are taken from the first stage. The fast and slow stages are representing the movement of the centers. In the slow stage the whole dataset can be used to get the exact location of the centers. The cost of the calculation of the fast stage is very low due to the small size of the data chosen. The cost of the calculation of the slow stage is also small due to the low number of iterations.
引用
收藏
页码:110 / 122
页数:13
相关论文
共 50 条
  • [21] An Enhancement of K-means Clustering Algorithm
    Gu, Jirong
    Zhou, Jieming
    Chen, Xianwei
    2009 INTERNATIONAL CONFERENCE ON BUSINESS INTELLIGENCE AND FINANCIAL ENGINEERING, PROCEEDINGS, 2009, : 237 - 240
  • [22] Improved Algorithm for the k-means Clustering
    Zhang, Sheng
    Wang, Shouqiang
    PROCEEDINGS OF THE 10TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA 2012), 2012, : 4717 - 4720
  • [23] Adaptive K-Means clustering algorithm
    Chen, Hailin
    Wu, Xiuqing
    Hu, Junhua
    MIPPR 2007: PATTERN RECOGNITION AND COMPUTER VISION, 2007, 6788
  • [24] k*-means:: A new generalized k-means clustering algorithm
    Cheung, YM
    PATTERN RECOGNITION LETTERS, 2003, 24 (15) : 2883 - 2893
  • [25] Data-driven modeling of general damping systems by k-means clustering and two-stage regression
    Guo, Jia
    Wang, Li
    Fukuda, Iori
    Ikago, Kohju
    MECHANICAL SYSTEMS AND SIGNAL PROCESSING, 2022, 167
  • [26] K*-Means: An Effective and Efficient K-means Clustering Algorithm
    Qi, Jianpeng
    Yu, Yanwei
    Wang, Lihong
    Liu, Jinglei
    PROCEEDINGS OF 2016 IEEE INTERNATIONAL CONFERENCES ON BIG DATA AND CLOUD COMPUTING (BDCLOUD 2016) SOCIAL COMPUTING AND NETWORKING (SOCIALCOM 2016) SUSTAINABLE COMPUTING AND COMMUNICATIONS (SUSTAINCOM 2016) (BDCLOUD-SOCIALCOM-SUSTAINCOM 2016), 2016, : 242 - 249
  • [27] Soil data clustering by using K-means and fuzzy K-means algorithm
    Hot, Elma
    Popovic-Bugarin, Vesna
    2015 23RD TELECOMMUNICATIONS FORUM TELFOR (TELFOR), 2015, : 890 - 893
  • [28] A Modified K-means Algorithm - Two-Layer K-means Algorithm
    Liu, Chen-Chung
    Chu, Shao-Wei
    Chan, Yung-Kuan
    Yu, Shyr-Shen
    2014 TENTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2014), 2014, : 447 - 450
  • [29] IMPROVEMENT IN K-MEANS CLUSTERING ALGORITHM FOR DATA CLUSTERING
    Rajeswari, K.
    Acharya, Omkar
    Sharma, Mayur
    Kopnar, Mahesh
    Karandikar, Kiran
    1ST INTERNATIONAL CONFERENCE ON COMPUTING COMMUNICATION CONTROL AND AUTOMATION ICCUBEA 2015, 2015, : 367 - 369
  • [30] On K-means Data Clustering Algorithm with Genetic Algorithm
    Kapil, Shruti
    Chawla, Meenu
    Ansari, Mohd Dilshad
    2016 FOURTH INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND GRID COMPUTING (PDGC), 2016, : 202 - 206