Evaluation of k-nearest neighbour classifier performance for heterogeneous data sets

Times cited: 108
Authors
Ali, Najat [1]
Neagu, Daniel [1]
Trundle, Paul [1]
Affiliation
[1] Univ Bradford, Fac Engn & Informat, Bradford BD7 1DP, W Yorkshire, England
Keywords
k-nearest neighbour; Heterogeneous data set; Combination similarity measures; SIMILARITY MEASURE;
DOI
10.1007/s42452-019-1356-9
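Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]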
Discipline classification code
07; 0710; 09
Abstract
Distance-based algorithms are widely used for data classification problems, and k-nearest neighbour classification (k-NN) is one of the most popular of them. A k-NN classifier measures the distances between a test sample and the training samples and uses the k closest training samples to determine the final classification output. The traditional k-NN classifier works naturally with numerical data. The main objective of this paper is to investigate the performance of k-NN on heterogeneous data sets, in which samples are described by a mixture of numerical and categorical features. For simplicity, this work considers only one type of categorical data, namely binary data. Several similarity measures are defined by combining well-known distances for numerical data with well-known distances for binary data, and k-NN performance is investigated when these measures are used to classify such heterogeneous data sets. The experiments used six heterogeneous data sets from different domains and two categories of measures. The experimental results showed that the proposed measures performed better on heterogeneous data than Euclidean distance, and that the challenges raised by the nature of heterogeneous data call for personalised similarity measures adapted to the data characteristics.
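As a rough illustration of the idea described in the abstract, the Python sketch below combines a numerical distance with a binary distance and feeds the result to a plain k-NN majority vote. The specific pairing (Euclidean distance on the numerical features plus the simple-matching, i.e. normalised Hamming, distance on the binary features), the `weight` parameter, and all function names are hypothetical choices made for illustration; the abstract does not state which combinations or weightings the paper actually uses. Numerical features are assumed to have been scaled to a comparable range beforehand.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two numerical feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming(a, b):
    """Fraction of binary features that disagree (simple-matching distance)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def combined_distance(a_num, a_bin, b_num, b_bin, weight=0.5):
    """Hypothetical combined measure: weighted sum of a numerical and a binary distance."""
    return weight * euclidean(a_num, b_num) + (1 - weight) * hamming(a_bin, b_bin)


def knn_predict(train, query, k=3, weight=0.5):
    """Classify `query` = (numerical tuple, binary tuple) by majority vote of the k nearest samples.

    `train` is a list of (numerical tuple, binary tuple, label) triples.
    """
    q_num, q_bin = query
    # Sort all training samples by their combined distance to the query.
    ranked = sorted(
        (combined_distance(q_num, q_bin, num, bits, weight), label)
        for num, bits, label in train
    )
    # Count labels among the k closest samples and return the most frequent one.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy heterogeneous data: two scaled numerical features and three binary features.
    train = [
        ((0.10, 0.20), (1, 0, 1), "A"),
        ((0.15, 0.25), (1, 0, 0), "A"),
        ((0.90, 0.80), (0, 1, 0), "B"),
        ((0.85, 0.95), (0, 1, 1), "B"),
    ]
    print(knn_predict(train, ((0.12, 0.22), (1, 0, 1)), k=3))  # prints "A"
```

Replacing `hamming` with another binary measure (for example a Jaccard distance), or changing `weight`, yields other members of the same family of combined measures; an odd `k` avoids most two-class voting ties.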
Pages: 15