A Bayesian Reassessment of Nearest-Neighbor Classification

Cited by: 26
Authors
Cucala, Lionel [1 ]
Marin, Jean-Michel [1 ]
Robert, Christian P. [2 ]
Titterington, D. M. [3 ]
Affiliations
[1] Univ Paris 11, Math Lab, INRIA Saclay, F-91400 Orsay, France
[2] Univ Paris 09, CEREMADE, F-75775 Paris, France
[3] Univ Glasgow, Dept Stat, Glasgow G12 8QW, Lanark, Scotland
Keywords
Boltzmann model; Compatible conditionals; Normalizing constant; Path sampling; Markov chain Monte Carlo algorithm; Perfect sampling; Pseudo-likelihood; Monte Carlo method; Perfect simulation; Likelihood; Inference
DOI
10.1198/jasa.2009.0125
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
The k-nearest-neighbor (knn) procedure is a well-known deterministic method used in supervised classification. This article proposes a reassessment of this approach as a statistical technique derived from a proper probabilistic model; in particular, we modify the assessment found in Holmes and Adams, and evaluated by Manocha and Girolami, where the underlying probabilistic model is not completely well defined. Once provided with a clear probabilistic basis for the knn procedure, we derive computational tools for Bayesian inference on the parameters of the corresponding model. In particular, we assess the difficulties inherent in both the pseudo-likelihood and the path sampling approximations of an intractable normalizing constant. We implement a correct MCMC sampler based on perfect sampling. When perfect sampling is not available, we use instead a Gibbs sampling approximation. Illustrations of the performance of the corresponding Bayesian classifier are provided for benchmark datasets, demonstrating in particular the limitations of the pseudo-likelihood approximation in this setup.
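The abstract mentions two computational ingredients: a Gibbs sampling approximation used when perfect sampling is unavailable, and a pseudo-likelihood approximation of the intractable normalizing constant. The sketch below illustrates both for a knn model with an assumed symmetrized Boltzmann-type full conditional P(y_i = c | y_-i, β) ∝ exp(β n_ic / k), where n_ic counts how many of point i's k nearest neighbors carry label c. This functional form, the function names, and all numerical details are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def knn_graph(X, k):
    """Indices of the k nearest neighbors of each point (self excluded)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def gibbs_sweep(y, nbrs, beta, n_classes, rng):
    """One Gibbs sweep: resample each label from the assumed full
    conditional P(y_i = c | y_-i, beta) proportional to exp(beta * n_ic / k)."""
    k = nbrs.shape[1]
    for i in range(len(y)):
        n_ic = np.array([(y[nbrs[i]] == c).sum() for c in range(n_classes)])
        logits = beta * n_ic / k
        p = np.exp(logits - logits.max())  # numerically stabilized softmax
        y[i] = rng.choice(n_classes, p=p / p.sum())
    return y

def log_pseudo_likelihood(y, nbrs, beta, n_classes):
    """Besag-style pseudo-likelihood: the sum of log full conditionals,
    which sidesteps the intractable normalizing constant."""
    k = nbrs.shape[1]
    ll = 0.0
    for i in range(len(y)):
        n_ic = np.array([(y[nbrs[i]] == c).sum() for c in range(n_classes)])
        logits = beta * n_ic / k
        m = logits.max()
        ll += logits[y[i]] - (m + np.log(np.exp(logits - m).sum()))
    return ll

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))               # toy covariates
y = rng.integers(0, 2, size=20)            # random initial labels
nbrs = knn_graph(X, k=3)
for _ in range(10):                        # a few Gibbs sweeps
    y = gibbs_sweep(y, nbrs, beta=2.0, n_classes=2, rng=rng)
```

With β > 0 the sweeps favor locally homogeneous labelings; at β = 0 every full conditional is uniform, which gives a quick sanity check on `log_pseudo_likelihood` (it should equal −n log C for n points and C classes).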
Pages: 263-273 (11 pages)
Related Papers
50 records
  • [1] Choice of neighbor order in nearest-neighbor classification
    Hall, Peter
    Park, Byeong U.
    Samworth, Richard J.
    [J]. ANNALS OF STATISTICS, 2008, 36 (05) : 2135 - 2152
  • [2] Nearest-neighbor classification with categorical variables
    Buttrey, SE
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 1998, 28 (02) : 157 - 169
  • [3] Prototype optimization for nearest-neighbor classification
    Huang, YS
    Chiang, CC
    Shieh, JW
    Grimson, E
    [J]. PATTERN RECOGNITION, 2002, 35 (06) : 1237 - 1245
  • [4] Nearest-neighbor classification for facies delineation
    Tartakovsky, Daniel M.
    Wohlberg, Brendt
    Guadagnini, Alberto
    [J]. WATER RESOURCES RESEARCH, 2007, 43 (07)
  • [5] In defense of Nearest-Neighbor based image classification
    Boiman, Oren
    Shechtman, Eli
    Irani, Michal
    [J]. 2008 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOLS 1-12, 2008, : 1992 - +
  • [6] Locally adaptive metric nearest-neighbor classification
    Domeniconi, C
    Peng, J
    Gunopulos, D
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2002, 24 (09) : 1281 - 1285
  • [7] Finding Relevant Points for Nearest-Neighbor Classification
    Eppstein, David
    [J]. 2022 SYMPOSIUM ON SIMPLICITY IN ALGORITHMS, SOSA, 2022, : 68 - 78
  • [8] Integrating background knowledge into nearest-neighbor text classification
    Zelikovitz, S
    Hirsh, H
    [J]. ADVANCES IN CASE-BASED REASONING, 2002, 2416 : 1 - 5
  • [9] Computing nearest-neighbor pattern-classification perceptrons
    Murphy, O
    Brooks, B
    Kite, T
    [J]. INFORMATION SCIENCES, 1995, 83 (3-4) : 133 - 142
  • [10] A new nearest-neighbor rule in the pattern classification problem
    Hattori, K
    Takahashi, M
    [J]. PATTERN RECOGNITION, 1999, 32 (03) : 425 - 432