Speeding-up the prototype based kernel k-means clustering method for large data sets

Cited by: 0

Authors:

Sarma, T. Hitendra [1]

Viswanath, P. [2]

Negi, Atul [3]

Affiliations:

[1] Srinivasa Ramanujan Inst Technol, Anantapur, AP, India

[2] Indian Inst Informat Technol Chittoor, Sri City, AP, India

[3] Univ Hyderabad, Hyderabad, Telangana, India

Source:

2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2016

Keywords:

Data mining; kernel k-means clustering method;

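DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of Artificial Intelligence];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract: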

Kernel k-means is seen as a non-linear extension of the k-means clustering method, with good performance in identifying non-isotropic and linearly inseparable clusters. However, the space and time requirements of kernel k-means are expensive, with O(n^2) complexity for n patterns. Its large in-memory computations make this method unsuitable for the large data sets of present-day applications. Recently, a simple prototype-based hybrid approach was proposed to speed up the kernel k-means method for large data sets [1]. The time complexity of this method is O(n + p^2), where p is the number of prototypes. Each prototype is a representative pattern of a group-let whose size is given by a threshold T. The time complexity of this method depends on p, which in turn depends on the clustering threshold. Increasing the threshold decreases the number of prototypes p, but the quality of the clustering result may suffer. Hence, choosing an appropriate threshold value is the major challenge in this approach. This paper presents a solution to this problem by allowing T to vary depending on the location of the group-let in the space. Intuitively, if a group-let is close to a cluster center (and away from the others), its size can be large, but if it lies between two cluster centers, its size should be small. Experiments show that this reduces the clustering time and also increases the clustering accuracy. The presented method is suitable for large data sets such as those encountered in data mining.
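
The abstract describes the fixed-threshold hybrid approach of [1] only in outline. The Python sketch below illustrates that pipeline under stated assumptions: group-lets are formed by a leaders-style single pass with distance threshold `T` (an assumption about how group-lets are built), kernel k-means is run on the p prototypes with each prototype weighted by its group-let size, and every pattern inherits its prototype's cluster. The Gaussian kernel, `gamma`, the farthest-first seeding and all function names (`leaders`, `weighted_kernel_kmeans`, `prototype_kernel_kmeans`) are hypothetical choices for illustration. Any refinement steps of [1] and the location-dependent threshold proposed in this paper are not reproduced here.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def rbf(a: Point, b: Point, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel; the kernel and gamma are illustrative assumptions."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def leaders(data: Sequence[Point], T: float) -> Tuple[List[Point], List[int]]:
    """Assumed leaders-style single pass: a point joins the first leader within
    distance T, otherwise it becomes the prototype of a new group-let."""
    protos: List[Point] = []
    owner: List[int] = []  # owner[i] = index of the prototype of data[i]
    for x in data:
        for j, l in enumerate(protos):
            if math.dist(x, l) <= T:
                owner.append(j)
                break
        else:
            protos.append(x)
            owner.append(len(protos) - 1)
    return protos, owner


def weighted_kernel_kmeans(K, w, k, iters=100):
    """Kernel k-means on p prototypes, each weighted by its group-let size w[j].
    Only the p x p kernel matrix K is needed."""
    p = len(K)

    def kd(i, j):  # squared feature-space distance between prototypes i and j
        return K[i][i] + K[j][j] - 2 * K[i][j]

    # Deterministic farthest-first seeding (a hypothetical initialisation choice).
    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(range(p), key=lambda i: min(kd(i, s) for s in seeds)))
    labels = [min(range(k), key=lambda c: kd(i, seeds[c])) for i in range(p)]

    for _ in range(iters):
        members = [[j for j in range(p) if labels[j] == c] for c in range(k)]
        W = [sum(w[j] for j in m) for m in members]
        # Squared norm of each weighted cluster mean, computed once per cluster.
        within = [
            sum(w[j] * w[l] * K[j][l] for j in m for l in m) / W[c] ** 2 if W[c] else 0.0
            for c, m in enumerate(members)
        ]
        new = []
        for i in range(p):
            # ||phi(x_i) - m_c||^2 expanded with kernel values only.
            dists = [
                K[i][i] - 2 * sum(w[j] * K[i][j] for j in m) / W[c] + within[c]
                if W[c] else math.inf
                for c, m in enumerate(members)
            ]
            new.append(min(range(k), key=dists.__getitem__))
        if new == labels:
            break
        labels = new
    return labels


def prototype_kernel_kmeans(data, k, T, gamma=1.0):
    protos, owner = leaders(data, T)
    counts = Counter(owner)
    w = [counts[j] for j in range(len(protos))]
    K = [[rbf(a, b, gamma) for b in protos] for a in protos]  # p x p, not n x n
    proto_labels = weighted_kernel_kmeans(K, w, k)
    # Each original pattern inherits the cluster label of its prototype.
    return [proto_labels[j] for j in owner]


data = [(0, 0), (0.5, 0), (0, 0.5), (10, 10), (10.5, 10), (10, 10.5)]
print(prototype_kernel_kmeans(data, k=2, T=1.0))  # → [0, 0, 0, 1, 1, 1]
```

Only a p x p kernel matrix is built, so memory falls from O(n^2) to O(p^2). A larger `T` yields fewer prototypes but coarser group-lets, which is the speed/quality trade-off the abstract describes.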


Pages: 1903 - 1910

Number of pages: 8

References (50 in total):

[1] Speeding-up the kernel k-means clustering method: A prototype based hybrid approach
Sarma, T. Hitendra
Viswanath, P.
Reddy, B. Eswara
[J]. PATTERN RECOGNITION LETTERS, 2013, 34 (05) : 564 - 573
[2] Speeding-Up the K-Means Clustering Method: A Prototype Based Approach
Sarma, T. Hitendra
Viswanath, P.
[J]. PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PROCEEDINGS, 2009, 5909 : 56 - 61
[3] Genetic Sampling k-means for Clustering Large Data Sets
Luchi, Diego
Santos, Willian
Rodrigues, Alexandre
Varejao, Flavio Miguel
[J]. PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2015, 2015, 9423 : 691 - 698
[4] A Kernel K-means Clustering Method for Symbolic Interval Data
Costa, Anderson F. B. F.
Pimentel, Bruno A.
de Souza, Renata M. C. R.
[J]. 2010 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS IJCNN 2010, 2010
[5] A large scale clustering scheme for kernel K-Means
Zhang, R
Rudnicky, AI
[J]. 16TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL IV, PROCEEDINGS, 2002 : 289 - 292
[6] Optimized Data Fusion for Kernel k-Means Clustering
Yu, Shi
Tranchevent, Leon-Charles
Liu, Xinhai
Glanzel, Wolfgang
Suykens, Johan A. K.
De Moor, Bart
Moreau, Yves
[J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (05) : 1031 - 1039
[7] Weighted kernel K-means for clustering spatial data
Faculty of Computer Science and Information Systems, University Technology Malaysia, Skudai 81310 Johor, Malaysia
[J]. WSEAS Trans. Syst., 2006, 6 : 1301 - 1308
[8] Single pass kernel k-means clustering method
Sarma, T. Hitendra
Viswanath, P.
Reddy, B. Eswara
[J]. Sadhana, 2013, 38 : 407 - 419
[9] Single pass kernel k-means clustering method
Sarma, T. Hitendra
Viswanath, P.
Reddy, B. Eswara
[J]. SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2013, 38 (03) : 407 - 419
[10] Extensions to the k-means algorithm for clustering large data sets with categorical values
Huang, ZX
[J]. DATA MINING AND KNOWLEDGE DISCOVERY, 1998, 2 (03) : 283 - 304
