Fast nearest neighbor condensation for large data sets classification

Cited by: 125
Authors
Angiulli, Fabrizio [1 ]
Institutions
[1] Univ Calabria, Dipartimento Elettron Informat & Sistemat, I-87036 Cosenza, Italy
Keywords
classification; large and high-dimensional data; nearest neighbor rule; prototype selection algorithms; training-set-consistent subset;
DOI
10.1109/TKDE.2007.190645
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This work has two main objectives: to introduce a novel algorithm, called the Fast Condensed Nearest Neighbor (FCNN) rule, for computing a training-set-consistent subset for the nearest neighbor decision rule, and to show that condensation algorithms for the nearest neighbor rule can be applied to huge collections of data. The FCNN rule has some interesting properties: it is order independent, its worst-case time complexity is quadratic but often with a small constant prefactor, and it is likely to select points very close to the decision boundary. Furthermore, its structure allows the triangle inequality to be effectively exploited to reduce the computational effort. The FCNN rule outperformed even the variants of existing competence preservation methods enhanced here, both in learning speed and learning scaling behavior and, often, in the size of the model, while guaranteeing the same prediction accuracy. Furthermore, it was three orders of magnitude faster than hybrid instance-based learning algorithms on the MNIST and Massachusetts Institute of Technology (MIT) Face databases and computed a model of accuracy comparable to that of methods incorporating a noise-filtering pass.
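To illustrate the notion of a training-set-consistent subset that the abstract refers to, the following is a minimal sketch of classic Hart-style nearest neighbor condensation, not the FCNN rule itself: it greedily grows a subset S until classifying every training point by 1-NN against S reproduces its true label. The function name `condense` and the simple brute-force distance computation are illustrative choices, not from the paper.

```python
import numpy as np

def condense(X, y, max_passes=10):
    """Hart-style condensation (simplified sketch, not the FCNN rule):
    build a subset S of training indices such that 1-NN classification
    of X against X[S] reproduces y, i.e. S is training-set consistent."""
    S = [0]  # seed the subset with an arbitrary training point
    for _ in range(max_passes):
        changed = False
        for i in range(len(X)):
            # 1-NN prediction of point i against the current subset S
            d = np.linalg.norm(X[S] - X[i], axis=1)
            if y[S[int(np.argmin(d))]] != y[i]:
                S.append(i)       # absorb each misclassified point
                changed = True
        if not changed:           # no misclassifications: S is consistent
            break
    return np.array(S)

# Tiny two-cluster example: condensation keeps far fewer points
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
S = condense(X, y)
```

FCNN differs from this baseline chiefly in how candidates are chosen (it seeds with class centroids and adds, per selected prototype, a nearest misclassified point), which makes it order independent, whereas the sketch above depends on the scan order.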
Pages: 1450 - 1464
Page count: 15
Related Papers
50 items total
  • [21] Data compression and local metrics for nearest neighbor classification
    Ricci, F
    Avesani, P
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1999, 21 (04) : 380 - 384
  • [22] Probabilistic Nearest Neighbor Search for Robust Classification of Face Image Sets
    Wang, Wen
    Wang, Ruiping
    Shan, Shiguang
    Chen, Xilin
    [J]. 2015 11TH IEEE INTERNATIONAL CONFERENCE AND WORKSHOPS ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG), VOL. 1, 2015,
  • [23] Combining k-Nearest Neighbor and Centroid Neighbor Classifier for Fast and Robust Classification
    Chmielnicki, Wieslaw
    [J]. HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, 2016, 9648 : 536 - 548
  • [24] Fast Support Vector Machine Classification for Large Data Sets
    Li, Xiaoou
    Yu, Wen
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2014, 7 (02) : 197 - 212
  • [26] Rates of Convergence for Large-scale Nearest Neighbor Classification
    Qiao, Xingye
    Duan, Jiexin
    Cheng, Guang
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [27] Two dimensional Large Margin Nearest Neighbor for Matrix Classification
    Song, Kun
    Nie, Feiping
    Han, Junwei
    [J]. PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2751 - 2757
  • [28] Distance Metric Learning for Large Margin Nearest Neighbor Classification
    Weinberger, Kilian Q.
    Saul, Lawrence K.
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2009, 10 : 207 - 244
  • [29] QSAR/QSPR analysis of large data sets using a fast variable selection approach based on K-nearest neighbor principle.
    Xiao, YD
    Shen, M
    Tropsha, A
    [J]. ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2002, 223 : U490 - U490
  • [30] Distributed nearest neighbor classification for large-scale multi-label data on spark
    Gonzalez-Lopez, Jorge
    Ventura, Sebastian
    Cano, Alberto
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 87 : 66 - 82