Fast nearest neighbor condensation for large data sets classification

Cited by: 125
Author
Angiulli, Fabrizio [1 ]
Affiliation
[1] Univ Calabria, Dipartimento Elettron Informat & Sistemat, I-87036 Cosenza, Italy
Keywords
classification; large and high-dimensional data; nearest neighbor rule; prototype selection algorithms; training-set-consistent subset;
DOI
10.1109/TKDE.2007.190645
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This work has two main objectives, namely, to introduce a novel algorithm, called the Fast Condensed Nearest Neighbor (FCNN) rule, for computing a training-set-consistent subset for the nearest neighbor decision rule and to show that condensation algorithms for the nearest neighbor rule can be applied to huge collections of data. The FCNN rule has some interesting properties: it is order independent, its worst-case time complexity is quadratic but often with a small constant prefactor, and it is likely to select points very close to the decision boundary. Furthermore, its structure allows the triangle inequality to be effectively exploited to reduce the computational effort. The FCNN rule outperformed even the variants of existing competence preservation methods enhanced here, both in terms of learning speed and learning scaling behavior and, often, in terms of the size of the model, while guaranteeing the same prediction accuracy. Furthermore, it was three orders of magnitude faster than hybrid instance-based learning algorithms on the MNIST and Massachusetts Institute of Technology (MIT) Face databases and computed a model of accuracy comparable to that of methods incorporating a noise-filtering pass.
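To make the condensation process described in the abstract concrete, the following Python sketch illustrates an FCNN1-style iteration: start from one seed prototype per class, then repeatedly add, for each current prototype, the nearest misclassified point falling in its Voronoi cell, until the selected subset classifies the whole training set correctly. This is a minimal illustrative reconstruction, not the paper's exact pseudocode: the function name fcnn_condense, the class-centroid seeding, and the tie-breaking are assumptions, and the paper's triangle-inequality pruning (which avoids recomputing full distance matrices) is omitted.

```python
import numpy as np


def fcnn_condense(X, y):
    """Illustrative FCNN1-style condensation pass (sketch, not the paper's exact pseudocode).

    X : (n, d) array of training points
    y : (n,)   array of integer class labels
    Returns indices of a training-set-consistent subset S, i.e. every training
    point is correctly classified by its nearest neighbor in X[S].
    """
    S = []  # indices of prototypes selected so far

    # Seed: for each class, the training point closest to the class centroid
    # (an assumption; one of several reasonable seeding rules).
    delta_S = []
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        centroid = X[idx].mean(axis=0)
        delta_S.append(idx[np.argmin(np.linalg.norm(X[idx] - centroid, axis=1))])

    while delta_S:
        S.extend(delta_S)
        S_arr = np.asarray(S)

        # Nearest prototype in S for every training point (brute force here;
        # the paper prunes these distance computations via the triangle inequality).
        dists = np.linalg.norm(X[:, None, :] - X[S_arr][None, :, :], axis=2)
        nn_in_S = S_arr[np.argmin(dists, axis=1)]
        misclassified = np.where(y != y[nn_in_S])[0]

        # For each prototype p, take the misclassified point lying in p's
        # Voronoi cell (w.r.t. S) that is closest to p; these representatives
        # form the next batch of additions.
        best = {}  # prototype index -> (distance, candidate index)
        for q in misclassified:
            p = nn_in_S[q]
            d = np.linalg.norm(X[q] - X[p])
            if p not in best or d < best[p][0]:
                best[p] = (d, q)
        delta_S = [q for (_, q) in best.values()]

    return np.asarray(S)


if __name__ == "__main__":
    # Tiny usage example on synthetic two-class data.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
    y = np.array([0] * 200 + [1] * 200)
    S = fcnn_condense(X, y)
    print(f"kept {len(S)} of {len(X)} points")
```

Because each pass only stops adding prototypes when no training point is misclassified by its nearest selected neighbor, termination implies the returned subset is training-set consistent; points added tend to be misclassified points near existing prototypes, which is why the selected prototypes concentrate close to the decision boundary.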
Pages: 1450-1464
Number of pages: 15
Related Papers
50 records in total
  • [1] Distributed nearest neighbor-based condensation of very large data sets
    Angiulli, Fabrizio
    Folino, Gianluigi
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2007, 19 (12) : 1593 - 1606
  • [2] Efficient distributed data condensation for nearest neighbor classification
    Angiulli, Fabrizio
    Folino, Gianluigi
    [J]. EURO-PAR 2007 PARALLEL PROCESSING, PROCEEDINGS, 2007, 4641 : 338 - +
  • [3] Evolution of reference sets in nearest neighbor classification
    Ishibuchi, H
    Nakashima, T
    [J]. SIMULATED EVOLUTION AND LEARNING, 1999, 1585 : 82 - 89
  • [4] Efficient nearest neighbor classification with data reduction and fast search algorithms
    Sánchez, JS
    Sotoca, JM
    Pla, F
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN & CYBERNETICS, VOLS 1-7, 2004, : 4757 - +
  • [5] Learning of fuzzy reference sets in nearest neighbor classification
    Nakashima, T
    Ishibuchi, H
    [J]. 18TH INTERNATIONAL CONFERENCE OF THE NORTH AMERICAN FUZZY INFORMATION PROCESSING SOCIETY - NAFIPS, 1999, : 357 - 360
  • [6] Nearest Neighbor Condensation Based on Fuzzy Rough Set for Classification
    Pan, Wei
    She, Kun
    Wei, Pengyuan
    Zeng, Kai
    [J]. ROUGH SETS AND KNOWLEDGE TECHNOLOGY, RSKT 2014, 2014, 8818 : 432 - 443
  • [7] Fast Bayesian inference of block Nearest Neighbor Gaussian models for large data
    Quiroz, Zaida C.
    Prates, Marcos O.
    Dey, Dipak K.
Rue, Håvard
    [J]. STATISTICS AND COMPUTING, 2023, 33 (02)
  • [8] Fast Bayesian inference of block Nearest Neighbor Gaussian models for large data
    Zaida C. Quiroz
    Marcos O. Prates
    Dipak K. Dey
Håvard Rue
    [J]. Statistics and Computing, 2023, 33
  • [9] Fast Nearest-Neighbor Classification Using RNN in Domains with Large Number of Classes
    Singh, Gautam
    Dasgupta, Gargi
    Deng, Yu
    [J]. SERVICE-ORIENTED COMPUTING, ICSOC 2018, 2019, 11434 : 309 - 321
  • [10] Exploiting computer resources for fast nearest neighbor classification
    Herrero, Jose R.
    Navarro, Juan J.
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2007, 10 (04) : 265 - 275