On the selection of the globally optimal prototype subset for nearest-neighbor classification

Cited by: 8
Authors
Carrizosa, Emilio [1]
Martin-Barragan, Belen
Plastria, Frank
Morales, Dolores Romero
Affiliations
[1] Univ Seville, Fac Matemat, Seville 41012, Spain
[2] Univ Carlos III Madrid, Dept Estadist, Madrid 28903, Spain
[3] Vrije Univ Brussels, Dept Math Operat Res Stat & Informat Syst Managem, MOSI, B-1050 Brussels, Belgium
[4] Univ Oxford, Said Sch Business, Oxford OX1 1HP, England
Keywords
classification; optimal prototype subset; nearest neighbor; dissimilarities; integer programming; variable neighborhood search; missing values
DOI
10.1287/ijoc.1060.0183
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
The nearest-neighbor classifier has been shown to be a powerful tool for multiclass classification. We explore both theoretical properties and empirical behavior of a variant method, in which the nearest-neighbor rule is applied to a reduced set of prototypes. This set is selected a priori by fixing its cardinality and minimizing the empirical misclassification cost. In this way we alleviate the two serious drawbacks of the nearest-neighbor method: high storage requirements and time-consuming queries. Finding this reduced set is shown to be NP-hard. We provide mixed integer programming (MIP) formulations, which are theoretically compared and solved by a standard MIP solver for small problem instances. We show that the classifiers derived from these formulations are comparable to benchmark procedures. We solve large problem instances by a metaheuristic that yields good classification rules in reasonable time. Additional experiments indicate that prototype-based nearest-neighbor classifiers remain quite stable in the presence of missing values.
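To make the selection problem in the abstract concrete, here is a minimal sketch that fixes the prototype set's cardinality and picks the subset of training points with the lowest empirical misclassification count. It is an illustration only, not the paper's MIP formulations or its metaheuristic: it enumerates every subset, which is feasible only for tiny instances given the NP-hardness noted above. The function names, the squared Euclidean dissimilarity, the unit misclassification cost, and the toy data are all assumptions introduced for this example.

```python
from itertools import combinations


def nearest_prototype_label(x, prototypes):
    """Return the label of the prototype closest to x.

    Assumption: squared Euclidean distance stands in for the dissimilarity.
    """
    best = min(prototypes, key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p[0])))
    return best[1]


def misclassification_count(data, prototypes):
    """Count training points whose nearest prototype has a different label.

    Assumption: every error has unit cost.
    """
    return sum(1 for x, y in data if nearest_prototype_label(x, prototypes) != y)


def best_prototype_subset(data, k):
    """Exhaustively search all size-k subsets of the training data."""
    best_subset, best_cost = None, None
    for subset in combinations(data, k):
        cost = misclassification_count(data, subset)
        if best_cost is None or cost < best_cost:
            best_subset, best_cost = subset, cost
    return best_subset, best_cost


# Toy data (made up for illustration): two well-separated classes in the plane.
data = [
    ((0.0, 0.0), "a"), ((0.2, 0.1), "a"), ((0.1, 0.3), "a"),
    ((2.0, 2.0), "b"), ((2.1, 1.8), "b"), ((1.9, 2.2), "b"),
]
subset, cost = best_prototype_subset(data, k=2)
```

On this toy data, two prototypes (one per class) are enough to classify every training point correctly, while a single prototype necessarily misclassifies one whole class. Queries against the reduced set compare against 2 stored points instead of 6, which is the storage and query-time saving the abstract refers to.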
Pages: 470-479
Number of pages: 10