Nearest Centroid: A Bridge between Statistics and Machine Learning

Cited by: 3
Author
Thulasidas, Manoj [1]
Affiliation
[1] Singapore Management Univ, Sch Informat Syst, Singapore, Singapore
Keywords
statistical thinking; applied statistics; machine learning; nearest centroid; k-means clustering; k nearest neighbor
DOI
10.1109/TALE48869.2020.9368396
Chinese Library Classification (CLC) number
G40 [Education]
Discipline classification codes
040101; 120403
Abstract
To guide our students of machine learning in their statistical thinking, we need conceptually simple and mathematically defensible algorithms. In this paper, we present the Nearest Centroid (NC) algorithm as a pedagogical tool, combining the key concepts behind two foundational algorithms: K-Means clustering and K Nearest Neighbors (kNN). In NC, we compute the centroid (as defined in the K-Means algorithm) of the observations belonging to each class in our training data set, and we use its distance from a new observation (similar to kNN) for class prediction. Using this obvious extension, we illustrate how the concepts of probability and statistics are applied in machine learning algorithms. Furthermore, we describe how the practical aspects of validation and performance measurement are carried out. The algorithm and the work presented here can easily be converted into labs and reading assignments to cement the students' understanding of applied statistics and its connection to machine learning algorithms, as described toward the end of this paper.
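The prediction rule summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's own implementation: the abstract does not name a distance metric, so Euclidean distance is assumed here, and the function names `fit_centroids` and `predict` and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def fit_centroids(X, y):
    """Return {label: centroid}; each centroid is the per-feature mean of one class."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def predict(centroids, x):
    """Return the label whose centroid is closest to x.

    Euclidean distance is an assumption; the abstract does not specify the metric.
    """
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


# Toy training data, made up purely for illustration.
X_train = [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [8.0, 8.0], [9.0, 8.0], [8.0, 9.0]]
y_train = ["a", "a", "a", "b", "b", "b"]

centroids = fit_centroids(X_train, y_train)
print(predict(centroids, [2.0, 2.0]))
print(predict(centroids, [7.0, 7.0]))
```

Unlike kNN, this sketch stores only one point per class after training, so prediction cost grows with the number of classes rather than with the size of the training set.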
Pages: 9-16
Page count: 8