Fast nearest neighbor condensation for large data sets classification

Cited by: 125
Authors
Angiulli, Fabrizio [1 ]
Institutions
[1] Univ Calabria, Dipartimento Elettron Informat & Sistemat, I-87036 Cosenza, Italy
Keywords
classification; large and high-dimensional data; nearest neighbor rule; prototype selection algorithms; training-set-consistent subset;
DOI
10.1109/TKDE.2007.190645
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This work has two main objectives: to introduce a novel algorithm, called the Fast Condensed Nearest Neighbor (FCNN) rule, for computing a training-set-consistent subset for the nearest neighbor decision rule, and to show that condensation algorithms for the nearest neighbor rule can be applied to huge collections of data. The FCNN rule has some interesting properties: it is order independent, its worst-case time complexity is quadratic but often with a small constant prefactor, and it is likely to select points very close to the decision boundary. Furthermore, its structure allows the triangle inequality to be effectively exploited to reduce the computational effort. The FCNN rule outperformed even the variants of existing competence preservation methods enhanced here, both in learning speed and learning scaling behavior and, often, in the size of the model, while guaranteeing the same prediction accuracy. Furthermore, it was three orders of magnitude faster than hybrid instance-based learning algorithms on the MNIST and Massachusetts Institute of Technology (MIT) Face databases and computed a model of accuracy comparable to that of methods incorporating a noise-filtering pass.
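To illustrate the notion of a training-set-consistent subset that the abstract refers to, the following is a minimal sketch of classic Hart-style nearest neighbor condensation, not the FCNN rule itself: it greedily grows a subset S until classifying every training point by 1-NN against S reproduces its true label. The function name `condense` and the simple brute-force distance computation are illustrative choices, not from the paper.

```python
import numpy as np

def condense(X, y, max_passes=10):
    """Hart-style condensation (simplified sketch, not the FCNN rule):
    build a subset S of training indices such that 1-NN classification
    of X against X[S] reproduces y, i.e. S is training-set consistent."""
    S = [0]  # seed the subset with an arbitrary training point
    for _ in range(max_passes):
        changed = False
        for i in range(len(X)):
            # 1-NN prediction of point i against the current subset S
            d = np.linalg.norm(X[S] - X[i], axis=1)
            if y[S[int(np.argmin(d))]] != y[i]:
                S.append(i)       # absorb each misclassified point
                changed = True
        if not changed:           # no misclassifications: S is consistent
            break
    return np.array(S)

# Tiny two-cluster example: condensation keeps far fewer points
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
S = condense(X, y)
```

FCNN differs from this baseline chiefly in how candidates are chosen (it seeds with class centroids and adds, per selected prototype, a nearest misclassified point), which makes it order independent, whereas the sketch above depends on the scan order.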
Pages: 1450 - 1464
Page count: 15
Related Papers
50 items total
  • [21] Data compression and local metrics for nearest neighbor classification
    Ricci, F
    Avesani, P
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1999, 21 (04) : 380 - 384
  • [22] Probabilistic Nearest Neighbor Search for Robust Classification of Face Image Sets
    Wang, Wen
    Wang, Ruiping
    Shan, Shiguang
    Chen, Xilin
    [J]. 2015 11TH IEEE INTERNATIONAL CONFERENCE AND WORKSHOPS ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG), VOL. 1, 2015,
  • [23] Combining k-Nearest Neighbor and Centroid Neighbor Classifier for Fast and Robust Classification
    Chmielnicki, Wieslaw
    [J]. HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, 2016, 9648 : 536 - 548
  • [24] Fast Support Vector Machine Classification for Large Data Sets
    Li, Xiaoou
    Yu, Wen
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2014, 7 (02) : 197 - 212
  • [26] Rates of Convergence for Large-scale Nearest Neighbor Classification
    Qiao, Xingye
    Duan, Jiexin
    Cheng, Guang
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [27] Two dimensional Large Margin Nearest Neighbor for Matrix Classification
    Song, Kun
    Nie, Feiping
    Han, Junwei
    [J]. PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2751 - 2757
  • [28] Distance Metric Learning for Large Margin Nearest Neighbor Classification
    Weinberger, Kilian Q.
    Saul, Lawrence K.
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2009, 10 : 207 - 244
  • [29] QSAR/QSPR analysis of large data sets using a fast variable selection approach based on K-nearest neighbor principle.
    Xiao, YD
    Shen, M
    Tropsha, A
    [J]. ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2002, 223 : U490 - U490
  • [30] Distributed nearest neighbor classification for large-scale multi-label data on spark
    Gonzalez-Lopez, Jorge
    Ventura, Sebastian
    Cano, Alberto
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 87 : 66 - 82