Compressed kNN: K-Nearest Neighbors with Data Compression

Cited by: 34
Authors
Salvador-Meneses, Jaime [1 ]
Ruiz-Chavez, Zoila [1 ]
Garcia-Rodriguez, Jose [2 ]
Affiliations
[1] Univ Cent Ecuador, Fac Ingn Ciencias Fis & Matemat, Quito 170129, Ecuador
[2] Univ Alicante, Comp Technol Dept, E-03080 Alicante, Spain
Keywords
classification; KNN; compression; categorical data; feature pre-processing;
DOI
10.3390/e21030234
Chinese Library Classification
O4 [Physics];
Discipline Code
0702 ;
Abstract
The kNN (k-nearest neighbors) classification algorithm is one of the most widely used non-parametric classification methods; however, it is limited by memory consumption proportional to the size of the dataset, which makes it impractical for large volumes of data. Variations of the method have been proposed: condensed kNN divides the training dataset into clusters to be classified, while other variants reduce the input dataset before applying the algorithm. This paper presents a variation of the kNN algorithm, of the structure-less NN type, designed to work with categorical data. Categorical data, by their nature, can be compressed to reduce memory requirements at classification time. The method adds a preliminary compression phase and then runs the algorithm directly on the compressed data. This allows the whole dataset to be kept in memory, yielding a considerable reduction in the amount of memory required. Experiments on well-known datasets show a reduction in the volume of information stored in memory while maintaining classification accuracy. They also show only a slight increase in processing time, because the information is decompressed in real time (on-the-fly) while the algorithm runs.
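The idea the abstract describes — bit-pack categorical features so the whole training set fits in memory, then decompress each row on-the-fly while computing kNN distances — can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation; the packing scheme, the Hamming distance, and all function names here are assumptions.

```python
# Hedged sketch (not the paper's code): bit-pack categorical codes into
# one uint64 per row, then run kNN with on-the-fly decompression.
import numpy as np

def pack_categorical(X, bits_per_feature):
    """Pack each row of small-integer categorical codes into a single uint64."""
    packed = np.zeros(len(X), dtype=np.uint64)
    shift = 0
    for j, bits in enumerate(bits_per_feature):
        packed |= X[:, j].astype(np.uint64) << np.uint64(shift)
        shift += bits
    return packed

def unpack_row(code, bits_per_feature):
    """Decompress one packed row back into its categorical codes (on-the-fly)."""
    code = int(code)
    out = []
    for bits in bits_per_feature:
        out.append(code & ((1 << bits) - 1))
        code >>= bits
    return out

def knn_predict(packed_train, y_train, x_query, bits_per_feature, k=3):
    """Majority vote among the k nearest rows under Hamming distance,
    decompressing each stored row only while its distance is computed."""
    dists = []
    for code in packed_train:
        row = unpack_row(code, bits_per_feature)
        dists.append(sum(a != b for a, b in zip(row, x_query)))
    idx = np.argsort(dists)[:k]
    vals, counts = np.unique(y_train[idx], return_counts=True)
    return vals[np.argmax(counts)]
```

With, say, 16 features of 4 bits each, a row compresses from 16 bytes (uint8 codes) to 8 bytes, so the full training set stays resident in memory — the trade-off the abstract describes: lower memory at the cost of the per-row unpacking done while the algorithm runs.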
Pages: 20
Related Papers
(50 total)
  • [1] Compressed k-Nearest Neighbors Ensembles for Evolving Data Streams
    Bahri, Maroua
    Bifet, Albert
    Maniu, Silviu
    de Mello, Rodrigo F.
    Tziortziotis, Nikolaos
    [J]. ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325 : 961 - 968
  • [2] NS-kNN: a modified k-nearest neighbors approach for imputing metabolomics data
    Lee, Justin Y.
    Styczynski, Mark P.
    [J]. METABOLOMICS, 2018, 14 (12)
  • [4] K-Nearest Neighbors Hashing
    He, Xiangyu
    Wang, Peisong
    Cheng, Jian
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 2834 - 2843
  • [5] Modernizing k-nearest neighbors
    Elizabeth Yancey, Robin
    Xin, Bochao
    Matloff, Norm
    [J]. STAT, 2021, 10 (01):
  • [6] Locating Renewable Energy Generators Using K-Nearest Neighbors (KNN) Algorithm
    Asadi, Meysam
    Pourhossein, Kazem
    [J]. 2019 IRANIAN CONFERENCE ON RENEWABLE ENERGY & DISTRIBUTED GENERATION (ICREDG), 2019,
  • [7] Consistency of the k-nearest neighbors rule for functional data
    Younso, Ahmad
    [J]. COMPTES RENDUS MATHEMATIQUE, 2023, 361 (01) : 237 - 242
  • [8] kNN-IS: An Iterative Spark-based design of the k-Nearest Neighbors classifier for big data
    Maillo, Jesus
    Ramirez, Sergio
    Triguero, Isaac
    Herrera, Francisco
    [J]. KNOWLEDGE-BASED SYSTEMS, 2017, 117 : 3 - 15
  • [9] A new k-nearest neighbors classifier for functional data
    Zhu, Tianming
    Zhang, Jin-ting
    [J]. STATISTICS AND ITS INTERFACE, 2022, 15 (02) : 247 - 260
  • [10] k-nearest neighbors prediction and classification for spatial data
    Mohamed-Salem Ahmed
    Mamadou N’diaye
    Mohammed Kadi Attouch
    Sophie Dabo-Niange
    [J]. Journal of Spatial Econometrics, 2023, 4 (1):