Fast nearest neighbor condensation for large data sets classification

Cited by: 125
Author
Angiulli, Fabrizio [1 ]
Affiliation
[1] Univ Calabria, Dipartimento Elettron Informat & Sistemat, I-87036 Cosenza, Italy
Keywords
classification; large and high-dimensional data; nearest neighbor rule; prototype selection algorithms; training-set-consistent subset;
DOI
10.1109/TKDE.2007.190645
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This work has two main objectives, namely, to introduce a novel algorithm, called the Fast Condensed Nearest Neighbor (FCNN) rule, for computing a training-set-consistent subset for the nearest neighbor decision rule and to show that condensation algorithms for the nearest neighbor rule can be applied to huge collections of data. The FCNN rule has some interesting properties: it is order independent, its worst-case time complexity is quadratic but often with a small constant prefactor, and it is likely to select points very close to the decision boundary. Furthermore, its structure allows the triangle inequality to be effectively exploited to reduce the computational effort. The FCNN rule outperformed even the variants of existing competence preservation methods enhanced here, both in terms of learning speed and learning scaling behavior and, often, in terms of the size of the model, while guaranteeing the same prediction accuracy. Furthermore, it was three orders of magnitude faster than hybrid instance-based learning algorithms on the MNIST and Massachusetts Institute of Technology (MIT) Face databases and computed a model of accuracy comparable to that of methods incorporating a noise-filtering pass.
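To make the condensation process described in the abstract concrete, the following Python sketch illustrates an FCNN1-style iteration: start from one seed prototype per class, then repeatedly add, for each current prototype, the nearest misclassified point falling in its Voronoi cell, until the selected subset classifies the whole training set correctly. This is a minimal illustrative reconstruction, not the paper's exact pseudocode: the function name fcnn_condense, the class-centroid seeding, and the tie-breaking are assumptions, and the paper's triangle-inequality pruning (which avoids recomputing full distance matrices) is omitted.

```python
import numpy as np


def fcnn_condense(X, y):
    """Illustrative FCNN1-style condensation pass (sketch, not the paper's exact pseudocode).

    X : (n, d) array of training points
    y : (n,)   array of integer class labels
    Returns indices of a training-set-consistent subset S, i.e. every training
    point is correctly classified by its nearest neighbor in X[S].
    """
    S = []  # indices of prototypes selected so far

    # Seed: for each class, the training point closest to the class centroid
    # (an assumption; one of several reasonable seeding rules).
    delta_S = []
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        centroid = X[idx].mean(axis=0)
        delta_S.append(idx[np.argmin(np.linalg.norm(X[idx] - centroid, axis=1))])

    while delta_S:
        S.extend(delta_S)
        S_arr = np.asarray(S)

        # Nearest prototype in S for every training point (brute force here;
        # the paper prunes these distance computations via the triangle inequality).
        dists = np.linalg.norm(X[:, None, :] - X[S_arr][None, :, :], axis=2)
        nn_in_S = S_arr[np.argmin(dists, axis=1)]
        misclassified = np.where(y != y[nn_in_S])[0]

        # For each prototype p, take the misclassified point lying in p's
        # Voronoi cell (w.r.t. S) that is closest to p; these representatives
        # form the next batch of additions.
        best = {}  # prototype index -> (distance, candidate index)
        for q in misclassified:
            p = nn_in_S[q]
            d = np.linalg.norm(X[q] - X[p])
            if p not in best or d < best[p][0]:
                best[p] = (d, q)
        delta_S = [q for (_, q) in best.values()]

    return np.asarray(S)


if __name__ == "__main__":
    # Tiny usage example on synthetic two-class data.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
    y = np.array([0] * 200 + [1] * 200)
    S = fcnn_condense(X, y)
    print(f"kept {len(S)} of {len(X)} points")
```

Because each pass only stops adding prototypes when no training point is misclassified by its nearest selected neighbor, termination implies the returned subset is training-set consistent; points added tend to be misclassified points near existing prototypes, which is why the selected prototypes concentrate close to the decision boundary.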
Pages: 1450-1464
Number of pages: 15
Related Papers
50 records in total
  • [1] Distributed nearest neighbor-based condensation of very large data sets
    Angiulli, Fabrizio
    Folino, Gianluigi
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2007, 19 (12) : 1593 - 1606
  • [2] Efficient distributed data condensation for nearest neighbor classification
    Angiulli, Fabrizio
    Folino, Gianluigi
    [J]. EURO-PAR 2007 PARALLEL PROCESSING, PROCEEDINGS, 2007, 4641 : 338 - +
  • [3] Evolution of reference sets in nearest neighbor classification
    Ishibuchi, H
    Nakashima, T
    [J]. SIMULATED EVOLUTION AND LEARNING, 1999, 1585 : 82 - 89
  • [4] Efficient nearest neighbor classification with data reduction and fast search algorithms
    Sánchez, JS
    Sotoca, JM
    Pla, F
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN & CYBERNETICS, VOLS 1-7, 2004, : 4757 - +
  • [5] Learning of fuzzy reference sets in nearest neighbor classification
    Nakashima, T
    Ishibuchi, H
    [J]. 18TH INTERNATIONAL CONFERENCE OF THE NORTH AMERICAN FUZZY INFORMATION PROCESSING SOCIETY - NAFIPS, 1999, : 357 - 360
  • [6] Nearest Neighbor Condensation Based on Fuzzy Rough Set for Classification
    Pan, Wei
    She, Kun
    Wei, Pengyuan
    Zeng, Kai
    [J]. ROUGH SETS AND KNOWLEDGE TECHNOLOGY, RSKT 2014, 2014, 8818 : 432 - 443
  • [7] Fast Bayesian inference of block Nearest Neighbor Gaussian models for large data
    Quiroz, Zaida C.
    Prates, Marcos O.
    Dey, Dipak K.
Rue, Håvard
    [J]. STATISTICS AND COMPUTING, 2023, 33 (02)
  • [8] Fast Bayesian inference of block Nearest Neighbor Gaussian models for large data
    Zaida C. Quiroz
    Marcos O. Prates
    Dipak K. Dey
Håvard Rue
    [J]. Statistics and Computing, 2023, 33
  • [9] Fast Nearest-Neighbor Classification Using RNN in Domains with Large Number of Classes
    Singh, Gautam
    Dasgupta, Gargi
    Deng, Yu
    [J]. SERVICE-ORIENTED COMPUTING, ICSOC 2018, 2019, 11434 : 309 - 321
  • [10] Exploiting computer resources for fast nearest neighbor classification
    Herrero, Jose R.
    Navarro, Juan J.
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2007, 10 (04) : 265 - 275