Speeding-up the prototype based kernel k-means clustering method for large data sets

Authors
Sarma, T. Hitendra [1]
Viswanath, P. [2]
Negi, Atul [3]
Affiliations
[1] Srinivasa Ramanujan Inst Technol, Anantapur, AP, India
[2] Indian Inst Informat Technol Chittoor, Sri City, AP, India
[3] Univ Hyderabad, Hyderabad, Telangana, India
Keywords
Data mining; kernel k-means clustering method;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Kernel k-means is seen as a non-linear extension of the k-means clustering method, with good performance in identifying non-isotropic and linearly inseparable clusters. However, the space and time requirements of kernel k-means are expensive, with O(n^2) complexity. Present applications, with their large in-memory computations, make this method unsuitable for large data sets. Recently, a simple prototype-based hybrid approach was proposed to speed up the kernel k-means method for large data sets [1]. The time complexity of this method is O(n + p^2), where p is the number of prototypes. Each prototype is a representative pattern of a group-let of size (threshold) T. The time complexity therefore depends on p, which in turn depends on the clustering threshold. Increasing the threshold value can decrease the number of prototypes p, but the quality of the clustering result might suffer. Hence, fixing an appropriate value of the threshold is the major challenge in this approach. This paper presents a solution to this problem by allowing T to vary depending on the location of the group-let in the space. Intuitively, if the group-let is close to a cluster center (and away from others), then its size can be large, but if it lies somewhere between two cluster centers, then its size should be small. It is experimentally shown that this reduces the clustering time and also increases the clustering accuracy. The presented method is suitable for large data sets, such as those in data mining.
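The prototype idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single-pass, leader-style grouping with a fixed threshold T (the paper's contribution, a threshold that varies with location, is not shown), a Gaussian (RBF) kernel, and a weighted kernel k-means run on the prototypes only. The function names, the sigma parameter, and the optional init argument are hypothetical. Note that this naive selection loop costs O(np) distance evaluations, so the sketch does not reproduce the stated O(n + p^2) bound.

```python
import numpy as np

def select_prototypes(X, T):
    """Single pass: a point joins the first prototype within distance T,
    otherwise it becomes a new prototype (assumed fixed threshold T)."""
    protos, counts = [], []
    assign = np.empty(len(X), dtype=int)
    for i, x in enumerate(X):
        for j, p in enumerate(protos):
            if np.linalg.norm(x - X[p]) <= T:
                counts[j] += 1
                assign[i] = j
                break
        else:  # no prototype was close enough
            protos.append(i)
            counts.append(1)
            assign[i] = len(protos) - 1
    return np.array(protos), np.array(counts, dtype=float), assign

def rbf_kernel(A, sigma):
    """Gaussian kernel matrix of the rows of A (p x p, hence O(p^2) space)."""
    sq = np.sum(A ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * A @ A.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def weighted_kernel_kmeans(K, w, k, n_iter=50, seed=0, init=None):
    """Kernel k-means on prototypes; w holds the group-let sizes."""
    p = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=p) if init is None else np.array(init)
    for _ in range(n_iter):
        dist = np.full((p, k), np.inf)
        for c in range(k):
            m = labels == c
            if not m.any():
                continue  # empty cluster stays at infinite distance
            wc, s = w[m], w[m].sum()
            # ||phi(x_i) - mu_c||^2 expanded with kernel entries only
            dist[:, c] = (np.diag(K) - 2.0 * (K[:, m] @ wc) / s
                          + wc @ K[np.ix_(m, m)] @ wc / s ** 2)
        new = dist.argmin(axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels
```

Each original pattern inherits the label of its prototype, for example labels[assign]. A larger T yields fewer prototypes and a smaller kernel matrix, at the possible cost of clustering quality, which is the trade-off the abstract describes.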
Pages: 1903-1910
Number of pages: 8
Related papers
50 in total
  • [1] Speeding-up the kernel k-means clustering method: A prototype based hybrid approach
    Sarma, T. Hitendra
    Viswanath, P.
    Reddy, B. Eswara
    [J]. PATTERN RECOGNITION LETTERS, 2013, 34 (05) : 564 - 573
  • [2] Speeding-Up the K-Means Clustering Method: A Prototype Based Approach
    Sarma, T. Hitendra
    Viswanath, P.
    [J]. PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PROCEEDINGS, 2009, 5909 : 56 - 61
  • [3] Genetic Sampling k-means for Clustering Large Data Sets
    Luchi, Diego
    Santos, Willian
    Rodrigues, Alexandre
    Varejao, Flavio Miguel
    [J]. PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2015, 2015, 9423 : 691 - 698
  • [4] A Kernel K-means Clustering Method for Symbolic Interval Data
    Costa, Anderson F. B. F.
    Pimentel, Bruno A.
    de Souza, Renata M. C. R.
    [J]. 2010 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS IJCNN 2010, 2010,
  • [5] A large scale clustering scheme for kernel K-Means
    Zhang, R
    Rudnicky, AI
    [J]. 16TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITON, VOL IV, PROCEEDINGS, 2002, : 289 - 292
  • [6] Optimized Data Fusion for Kernel k-Means Clustering
    Yu, Shi
    Tranchevent, Leon-Charles
    Liu, Xinhai
    Glanzel, Wolfgang
    Suykens, Johan A. K.
    De Moor, Bart
    Moreau, Yves
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (05) : 1031 - 1039
  • [7] Weighted kernel K-means for clustering spatial data
    Faculty of Computer Science and Information Systems, University Technology Malaysia, Skudai 81310 Johor, Malaysia
    [J]. WSEAS Trans. Syst., 2006, 6 (1301-1308):
  • [8] Single pass kernel k-means clustering method
    T HITENDRA SARMA
    P VISWANATH
    B ESWARA REDDY
    [J]. Sadhana, 2013, 38 : 407 - 419
  • [9] Single pass kernel k-means clustering method
    Sarma, T. Hitendra
    Viswanath, P.
    Reddy, B. Eswara
    [J]. SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2013, 38 (03): : 407 - 419
  • [10] Extensions to the k-means algorithm for clustering large data sets with categorical values
    Huang, ZX
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 1998, 2 (03) : 283 - 304