Evaluation of k-nearest neighbour classifier performance for heterogeneous data sets

Cited by: 108
Authors
Ali, Najat [1]
Neagu, Daniel [1]
Trundle, Paul [1]
Affiliation
[1] Univ Bradford, Fac Engn & Informat, Bradford BD7 1DP, W Yorkshire, England
Keywords
k-nearest neighbour; Heterogeneous data set; Combination similarity measures; SIMILARITY MEASURE;
DOI
10.1007/s42452-019-1356-9
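Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and Earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code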
07 ; 0710 ; 09 ;
Abstract
Distance-based algorithms are widely used for data classification problems. The k-nearest neighbour (k-NN) classifier is one of the most popular distance-based algorithms. It classifies a test sample by measuring the distances between that sample and the training samples and using the nearest ones to determine the output class. The traditional k-NN classifier works naturally with numerical data. The main objective of this paper is to investigate the performance of k-NN on heterogeneous datasets, where data are described by a mixture of numerical and categorical features. For the sake of simplicity, this work considers only one type of categorical data, namely binary data. Several similarity measures are defined by combining well-known distances for numerical and binary data, and are used to investigate k-NN performance in classifying such heterogeneous datasets. The experiments used six heterogeneous datasets from different domains and two categories of measures. Experimental results showed that the proposed measures performed better on heterogeneous data than Euclidean distance, and that the challenges raised by the nature of heterogeneous data call for personalised similarity measures adapted to the data characteristics.
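The idea of combining a numerical distance with a binary one inside k-NN can be illustrated with a minimal Python sketch. The abstract does not say which distances the paper combines or how they are weighted, so everything here is an assumption made for illustration: Euclidean distance for the numerical part, Hamming distance for the binary part, a simple weighted sum controlled by a hypothetical `weight` parameter, and the function names `combined_distance` and `knn_predict`. Numerical features are assumed to be scaled to [0, 1] so that the two parts are comparable.

```python
from collections import Counter

import numpy as np


def combined_distance(a_num, a_bin, b_num, b_bin, weight=0.5):
    """Hypothetical mixed distance: weighted sum of a numerical and a binary part."""
    # Euclidean distance on the numerical features (assumed pre-scaled to [0, 1]).
    d_num = np.linalg.norm(np.asarray(a_num, dtype=float) - np.asarray(b_num, dtype=float))
    # Hamming distance on the binary features: fraction of mismatching positions.
    d_bin = np.mean(np.asarray(a_bin) != np.asarray(b_bin))
    return weight * d_num + (1.0 - weight) * d_bin


def knn_predict(train_num, train_bin, train_labels, q_num, q_bin, k=3, weight=0.5):
    """Classify one query sample by majority vote among its k nearest neighbours."""
    dists = [
        combined_distance(n, b, q_num, q_bin, weight)
        for n, b in zip(train_num, train_bin)
    ]
    # Indices of the k smallest distances (stable sort keeps ties in input order).
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration, not taken from the paper).
train_num = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
train_bin = [[0, 0], [0, 1], [1, 1], [1, 0]]
train_labels = ["a", "a", "b", "b"]

pred_low = knn_predict(train_num, train_bin, train_labels, [0.15, 0.15], [0, 0], k=3)
pred_high = knn_predict(train_num, train_bin, train_labels, [0.85, 0.85], [1, 1], k=3)
```

Setting `weight=1.0` reduces the sketch to plain Euclidean k-NN on the numerical features alone, which gives a simple baseline to compare a combined measure against.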
Pages: 15
Related papers
50 in total
  • [1] Evaluation of k-nearest neighbour classifier performance for heterogeneous data sets
    Najat Ali
    Daniel Neagu
    Paul Trundle
    [J]. SN Applied Sciences, 2019, 1
  • [2] An empirical analysis of the probabilistic K-nearest neighbour classifier
    Manocha, S.
    Girolami, M. A.
    [J]. PATTERN RECOGNITION LETTERS, 2007, 28 (13) : 1818 - 1824
  • [3] Weighted k-nearest leader classifier for large data sets
    Babu, V. Suresh
    Viswanath, P.
    [J]. PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PROCEEDINGS, 2007, 4815 : 17 - 24
  • [4] An evaluation of k-nearest neighbour imputation using Likert data
    Jönsson, P
    Wohlin, C
    [J]. 10TH INTERNATIONAL SYMPOSIUM ON SOFTWARE METRICS, PROCEEDINGS, 2004, : 108 - 118
  • [5] Handwritten Digit Recognition Using K-Nearest Neighbour Classifier
    Babu, U. Ravi
    Venkateswarlu, Y.
    Chintha, Aneel Kumar
    [J]. 2014 WORLD CONGRESS ON COMPUTING AND COMMUNICATION TECHNOLOGIES (WCCCT 2014), 2014, : 60 - +
  • [6] Feature extraction for the k-nearest neighbour classifier with genetic programming
    Bot, MCJ
    [J]. GENETIC PROGRAMMING, PROCEEDINGS, 2001, 2038 : 256 - 267
  • [7] Improving performance of the k-nearest neighbor classifier by tolerant rough sets
    Bao, YG
    Du, XY
    Ishii, N
    [J]. PROCEEDINGS OF THE THIRD INTERNATIONAL SYMPOSIUM ON COOPERATIVE DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, 2000, : 167 - 171
  • [8] An influence-based k-nearest neighbour classifier for classification of data with different densities
    Motallebi, Hassan
    Fakhteh, Amir-Hossein
    [J]. International Journal of Business Intelligence and Data Mining, 2024, 25 (02) : 147 - 167
  • [9] Evaluation of k-Nearest Neighbor classifier performance for direct marketing
    Govindarajan, M.
    Chandrasekaran, R. M.
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2010, 37 (01) : 253 - 258
  • [10] A fast exact parallel implementation of the k-nearest neighbour pattern classifier
    Lucas, SM
    [J]. IEEE WORLD CONGRESS ON COMPUTATIONAL INTELLIGENCE, 1998, : 1867 - 1872