K-means algorithms for functional data

Cited by: 33

Authors:

Lopez Garcia, Maria Luz [1]

Garcia-Rodenas, Ricardo [1]

Gonzalez Gomez, Antonia [2]

Affiliations:

[1] Univ Castilla La Mancha, Escuela Super Informat, Dept Matemat, Ciudad Real 28012, Spain

[2] Univ Politecn Madrid, ET Super Ingn Montes, Dept Matemat Aplicada Recursos Nat, E-28040 Madrid, Spain

Source:

NEUROCOMPUTING | 2015 / Vol. 151

Keywords:

Functional data; K-means; Reproducing Kernel Hilbert Space; Tikhonov regularization theory; Dimensionality reduction; STOCHASTIC-PROCESSES; KERNEL;

DOI:

10.1016/j.neucom.2014.09.048

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Cluster analysis of functional data assumes that the objects to be classified are functions $f : X \subset \mathbb{R}^p \to \mathbb{R}$ and that the available information about each object is a sample at a finite set of points, $f^n = \{(x_i, y_i) \in X \times \mathbb{R}\}_{i=1}^{n}$. The aim is to infer the meaningful groups by working explicitly with the infinite-dimensional nature of the data. This paper analyses the use of K-means algorithms to solve this problem. A comparative study of three K-means algorithms is conducted: the K-means algorithm for raw data, a kernel K-means algorithm for raw data, and a K-means algorithm using two distances for functional data. These distances, called $d_{V_n}$ and $d_\phi$, are based on projections onto Reproducing Kernel Hilbert Spaces (RKHS) and Tikhonov regularization theory. Although both distances are shown to be equivalent, they lead to two different strategies for reducing the dimensionality of the data. For the $d_{V_n}$ distance the most suitable strategy is Johnson-Lindenstrauss random projections, whereas the dimensionality reduction for $d_\phi$ is based on spectral methods. A key aspect analysed is the effect of the sampling $\{x_i\}_{i=1}^{n}$ on the performance of the K-means algorithm. In the numerical study a purpose-built example shows that, if the sampling is not uniform in $x$, a K-means algorithm that ignores the functional nature of the data can perform worse. It is shown numerically that the original K-means algorithm and the one suggested here perform similarly in the examples when $X$ is uniformly sampled, but that working with the original set of observations has a higher computational cost than the K-means algorithms based on $d_\phi$ or $d_{V_n}$, because the latter use strategies to reduce the dimensionality of the data. The numerical tests are completed with a case study that analyses what kind of problem the K-means algorithm for functional data must face. (C) 2014 Elsevier B.V. All rights reserved.
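
As a rough illustration of the dimensionality-reduction idea mentioned in the abstract, the sketch below applies a Gaussian Johnson-Lindenstrauss random projection to uniformly sampled curves and then runs plain Lloyd-style K-means on the projected data. It is not the paper's method: it does not implement the RKHS-based distances, and the function names `random_projection` and `kmeans`, the synthetic sine/cosine curves, the noise level and the target dimension `k=10` are all assumptions made for this example.

```python
import numpy as np

def random_projection(Y, k, rng):
    """Project rows of Y (curves x sample points) to k dimensions
    with a Gaussian Johnson-Lindenstrauss matrix (illustrative helper)."""
    n = Y.shape[1]
    R = rng.normal(size=(n, k)) / np.sqrt(k)  # scaling keeps norms comparable on average
    return Y @ R

def kmeans(Z, n_clusters, rng, n_iter=50):
    """Plain Lloyd iterations; returns labels and centroids (illustrative helper)."""
    centers = Z[rng.choice(len(Z), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # squared Euclidean distance from every point to every centroid
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # recompute centroids; keep the old one if a cluster is empty
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic data (assumed): two groups of noisy curves sampled uniformly on [0, 1].
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
group_a = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=(30, x.size))
group_b = np.cos(2 * np.pi * x) + 0.1 * rng.normal(size=(30, x.size))
Y = np.vstack([group_a, group_b])           # 60 curves, 200 sample points each

Z = random_projection(Y, k=10, rng=rng)     # 60 curves, 10 projected coordinates
labels, centers = kmeans(Z, n_clusters=2, rng=rng)
```

Each K-means iteration here works with 10 coordinates per curve instead of 200, which is the kind of cost saving the abstract attributes to dimensionality reduction. Whether clustering quality is preserved depends on the data and on the chosen target dimension.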

Pages: 231-245

Number of pages: 15

Related records (50 in total):

[1] Crisp and fuzzy k-means clustering algorithms for multivariate functional data
Tokushige, Shuichi
Yadohisa, Hiroshi
Inada, Koichi
[J]. COMPUTATIONAL STATISTICS, 2007, 22 (01) : 1 - 16
[2] Crisp and fuzzy k-means clustering algorithms for multivariate functional data
Shuichi Tokushige
Hiroshi Yadohisa
Koichi Inada
[J]. Computational Statistics, 2007, 22 : 1 - 16
[3] K-means - a fast and efficient K-means algorithms
[J]. Nguyen, Cuong Duc (nguyenduccuong@tdt.edu.vn), 2018, Inderscience Publishers, 29, route de Pre-Bois, Case Postale 856, CH-1215 Geneva 15, CH-1215, Switzerland (11)
[4] Impartial trimmed k-means for functional data
Cuesta-Albertos, Juan Antonio
Fraiman, Ricardo
[J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2007, 51 (10) : 4864 - 4877
[5] Immune K-means and negative selection algorithms for data analysis
Bereta, Michal
Burczynski, Tadeusz
[J]. INFORMATION SCIENCES, 2009, 179 (10) : 1407 - 1425
[6] A Modified K-means Algorithms - Bi-Level K-Means Algorithm
Yu, Shyr-Shen
Chu, Shao-Wei
Wang, Ching-Lin
Chan, Yung-Kuan
Chuang, Chia-Yi
[J]. PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON SOFT COMPUTING IN INFORMATION COMMUNICATION TECHNOLOGY, 2014, : 10 - 13
[7] Empirical Evaluation of K-Means, Bisecting K-Means, Fuzzy C-Means and Genetic K-Means Clustering Algorithms
Banerjee, Shreya
Choudhary, Ankit
Pal, Somnath
[J]. 2015 IEEE INTERNATIONAL WIE CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (WIECON-ECE), 2015, : 172 - 176
[8] Operational analysis of k-medoids and k-means algorithms on noisy data
Manjoro, Wellington Simbarashe
Dhakar, Mradul
Chaurasia, Brijesh Kumar
[J]. 2016 INTERNATIONAL CONFERENCE ON COMMUNICATION AND SIGNAL PROCESSING (ICCSP), VOL. 1, 2016, : 1500 - 1505
[9] Robust K-Median and K-Means Clustering Algorithms for Incomplete Data
Li, Jinhua
Song, Shiji
Zhang, Yuli
Zhou, Zhen
[J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2016, 2016
[10] A note on constrained k-means algorithms
Ng, MK
[J]. PATTERN RECOGNITION, 2000, 33 (03) : 515 - 519