Parallelizing the Bounded K-Nearest Neighbors Algorithm for Distributed Computing Systems

Authors
Japa, Arialdis [1]
Shi, Yong [1]
Affiliations
[1] Kennesaw State Univ, Coll Comp & Software Engn, Marietta, GA 30144 USA
Keywords
KNN; Bounded KNN; Vector Space Model; Apache Spark
DOI
10.1109/ccwc47524.2020.9031198
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The growing need for data collection and analysis in recent years has given rise to the field of Big Data. Organizations place high importance on this area because it could lead to an improved quality of life for consumers and to substantial profits. The K-Nearest Neighbors (KNN) algorithm has been used to extract meaningful information from datasets. However, its performance suffers on large datasets because of a bottleneck. We previously proposed a Bounded KNN algorithm, which alleviates this bottleneck and improves performance without sacrificing prediction accuracy. Although it is more efficient than the traditional KNN algorithm, Bounded KNN is still not ideal for handling the massive amounts of data that are prevalent today. In this paper, we present a parallelized algorithm that further improves performance by distributing the workload across a cluster of machines. Our experimental results show that distributing the workload incurs some overhead, but as datasets grow larger, the benefits of parallelization eventually outweigh this cost.
Pages: 38-45
Page count: 8
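
The abstract describes distributing a KNN workload across a cluster of machines. As a rough illustration of that general idea, the sketch below shows a generic partition-and-merge KNN classifier in plain Python. It is not the authors' Bounded KNN (the abstract does not describe its bounding step) and it does not use Apache Spark; threads stand in for cluster machines, and the names `local_top_k` and `parallel_knn_predict` are hypothetical. Each partition computes its own k nearest candidates, and the candidates are then merged into a global top k before a majority vote.

```python
import heapq
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def local_top_k(partition, query, k):
    """Return the k nearest (distance, label) pairs within one partition."""
    scored = ((math.dist(point, query), label) for point, label in partition)
    return heapq.nsmallest(k, scored)


def parallel_knn_predict(partitions, query, k):
    """Classify `query` by merging per-partition candidates (hypothetical helper).

    Assumption: each worker thread stands in for one machine in a cluster.
    """
    with ThreadPoolExecutor() as pool:
        # Each worker returns at most k candidates from its own partition.
        candidates = pool.map(lambda part: local_top_k(part, query, k), partitions)
        # Merge the per-partition candidates into the global k nearest.
        merged = heapq.nsmallest(k, (pair for part in candidates for pair in part))
    # Majority vote over the labels of the global k nearest neighbors.
    votes = Counter(label for _, label in merged)
    return votes.most_common(1)[0][0]


# Toy data: two partitions of ((x, y), label) pairs, purely illustrative.
partitions = [
    [((0.0, 0.0), "a"), ((0.1, 0.2), "a")],
    [((5.0, 5.0), "b"), ((5.1, 4.9), "b"), ((0.2, 0.1), "a")],
]
prediction = parallel_knn_predict(partitions, (0.0, 0.1), k=3)
```

Because each partition returns at most k candidates, the merge step handles only k times the number of partitions, regardless of dataset size. The cost of starting workers and collecting their results is one plausible source of the overhead the abstract mentions for small datasets.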