A Fast Exact k-Nearest Neighbors Algorithm for High Dimensional Search Using k-Means Clustering and Triangle Inequality

Citations: 0
Authors
Wang, Xueyi [1 ]
Affiliation
[1] NW Nazarene Univ, Dept Math & Comp Sci, Nampa, ID 83642 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds the nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses k-means clustering and the triangle inequality to accelerate the search for nearest neighbors in a high dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-trees, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds the nearest training objects starting from the cluster nearest to the query object and uses the triangle inequality to reduce the number of distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 10^6 records and 10^4 dimensions, kMkNN shows a 2- to 80-fold reduction in distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significantly better than a kd-tree based k-NN algorithm for all datasets and performs better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high dimensional spaces.
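The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `build_index` and `knn_search`, the plain Lloyd's k-means loop, the ordering of cluster members by decreasing distance to their center, and the free parameter `n_clusters` are all assumptions made for illustration. The pruning rule is the standard triangle-inequality lower bound: for a point x in a cluster with center c, ||q - x|| >= ||q - c|| - ||x - c||.

```python
import heapq
import numpy as np

def build_index(X, n_clusters, n_iter=20, seed=0):
    """Buildup stage (illustrative): cluster X with plain Lloyd's k-means and
    record each point's distance to its assigned center."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(n_clusters):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    # Final assignment against the final centers, so stored distances are exact.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    dist_to_center = d[np.arange(len(X)), labels]
    clusters = []
    for j in range(n_clusters):
        idx = np.flatnonzero(labels == j)
        # Assumption: members are sorted by decreasing distance to the center,
        # so the lower bound below only grows while scanning a cluster.
        order = idx[np.argsort(-dist_to_center[idx])]
        clusters.append((order, dist_to_center[order]))
    return centers, clusters

def knn_search(X, centers, clusters, q, k):
    """Searching stage (illustrative): visit clusters from nearest to farthest
    center and skip points that the triangle inequality rules out."""
    q_to_center = np.linalg.norm(centers - q, axis=1)
    n_dist = len(centers)  # distance computations performed so far
    heap = []              # max-heap of the current k best, stored as (-distance, index)
    for j in np.argsort(q_to_center):
        members, member_dist = clusters[j]
        for i, dc in zip(members, member_dist):
            if len(heap) == k and q_to_center[j] - dc >= -heap[0][0]:
                # Lower bound on ||q - x|| already reaches the current k-th best
                # distance; remaining members have even larger bounds.
                break
            dist = float(np.linalg.norm(X[i] - q))
            n_dist += 1
            if len(heap) < k:
                heapq.heappush(heap, (-dist, int(i)))
            elif dist < -heap[0][0]:
                heapq.heapreplace(heap, (-dist, int(i)))
    neighbors = sorted((-neg, i) for neg, i in heap)
    return neighbors, n_dist
```

A possible usage is `centers, clusters = build_index(X, n_clusters=22)` followed by `neighbors, n_dist = knn_search(X, centers, clusters, q, k=5)`, where `neighbors` holds `(distance, index)` pairs in increasing distance and `n_dist` counts the distance computations performed. Because the bound never exceeds the true distance, the returned distances match those of an exhaustive scan; how many computations are saved depends on the data and on the chosen number of clusters.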
Pages: 1293 - 1299
Number of pages: 7
Related papers
50 items in total
  • [31] Unsupervised image clustering algorithm based on contrastive learning and K-nearest neighbors
    Zhang, Xiuling
    Wang, Shuo
    Wu, Ziyun
    Tan, Xiaofei
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2022, 13 (09) : 2415 - 2423
  • [32] Parallel Search of k-Nearest Neighbors with Synchronous Operations
    Sismanis, Nikos
    Pitsianis, Nikos
    Sun, Xiaobai
    [J]. 2012 IEEE CONFERENCE ON HIGH PERFORMANCE EXTREME COMPUTING (HPEC), 2012,
  • [33] Application of improved k-means k-nearest neighbor algorithm in the movie recommendation system
    Cai, Chang
    Wang, Li
    [J]. 2020 13TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID 2020), 2020, : 314 - 317
  • [34] Unsupervised image clustering algorithm based on contrastive learning and K-nearest neighbors
    Xiuling Zhang
    Shuo Wang
    Ziyun Wu
    Xiaofei Tan
    [J]. International Journal of Machine Learning and Cybernetics, 2022, 13 : 2415 - 2423
  • [35] Density Peaks Clustering Algorithm Based on Representative Points and K-nearest Neighbors
    Zhang Q.-H.
    Zhou J.-P.
    Dai Y.-Y.
    Wang G.-Y.
    [J]. Ruan Jian Xue Bao/Journal of Software, 2023, 34 (12) : 5629 - 5648
  • [36] An improved density peaks clustering algorithm using similarity assignment strategy with K-nearest neighbors
    Hu, Wei
    Feng, Ji
    Yang, Degang
    [J]. CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2024, 27 (09) : 12689 - 12706
  • [37] A novel data clustering algorithm using heuristic rules based on k-nearest neighbors chain
    Lu, Jianyun
    Zhu, Qingsheng
    Wu, Quanwang
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2018, 72 : 213 - 227
  • [38] Efficient k-nearest neighbors search in graph space
    Abu-Aisheh, Zeina
    Raveaux, Romain
    Ramel, Jean-Yves
    [J]. PATTERN RECOGNITION LETTERS, 2020, 134 (134) : 77 - 86
  • [39] K-means Nearest Point Search Algorithm and Heuristic Search for Transportation
    Hlaing, Wai Mar
    Sein, Myint Myint
    [J]. 2018 IEEE 7TH GLOBAL CONFERENCE ON CONSUMER ELECTRONICS (GCCE 2018), 2018, : 779 - 780
  • [40] Performance study of K-nearest neighbor classifier and K-means clustering for predicting the diagnostic accuracy
    Mittal K.
    Aggarwal G.
    Mahajan P.
    [J]. International Journal of Information Technology, 2019, 11 (3) : 535 - 540