Efficient Computation of k-Nearest Neighbour Graphs for Large High-Dimensional Data Sets on GPU Clusters

Cited by: 10
Authors
Dashti, Ali [1]
Komarov, Ivan [1]
D'Souza, Roshan M. [1]
Affiliations
[1] Univ Wisconsin, Complex Syst Simulat Lab, Dept Mech Engn, Milwaukee, WI 53201 USA
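Source
PLOS ONE | 2013 / Vol. 8 / Issue 09
Funding
National Science Foundation (USA);
Keywords
CONSTRUCTION;
DOI
10.1371/journal.pone.0074113
Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract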
This paper presents an implementation of brute-force exact k-Nearest Neighbor Graph (k-NNG) construction for ultra-large, high-dimensional data clouds. The proposed method uses Graphics Processing Units (GPUs) and scales across multiple levels of parallelism (between nodes of a cluster, between different GPUs on a single node, and within a GPU). The method is applicable to homogeneous computing clusters with a varying number of nodes and GPUs per node. We achieve a 6-fold speedup in data processing compared with an optimized method running on a cluster of CPUs, and bring the previously infeasible k-NNG generation for a dataset of twenty million images with 15k dimensionality into the realm of practical possibility.
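As an illustration of what "brute-force exact k-NNG construction" means, here is a minimal single-process sketch in pure Python. It is not the authors' GPU implementation: the function name `knn_graph_brute_force`, the `tile_size` parameter, the Euclidean metric, and the tie-breaking rule are all assumptions made for this example. The two outer loops walk over tiles of query and reference points, which only loosely mirrors how such work could be split across nodes and GPUs.

```python
import heapq
import math


def knn_graph_brute_force(points, k, tile_size=2):
    """Exact k-NN graph by exhaustive pairwise distance comparison.

    Hypothetical helper for illustration only. `points` is a sequence of
    equal-length coordinate tuples; the result is a list where entry i holds
    the indices of the k nearest neighbours of point i, nearest first.
    Euclidean distance is assumed; ties are broken by the smaller index.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")

    # One bounded heap per query point. Entries are (-distance, -index), so
    # heap[0] is always the current worst (farthest) candidate.
    heaps = [[] for _ in range(n)]

    # Tile the n x n distance matrix; each (query tile, reference tile) pair
    # is an independent unit of work in this sketch.
    for q_start in range(0, n, tile_size):
        q_stop = min(q_start + tile_size, n)
        for r_start in range(0, n, tile_size):
            r_stop = min(r_start + tile_size, n)
            for i in range(q_start, q_stop):
                for j in range(r_start, r_stop):
                    if i == j:
                        continue  # a point is not its own neighbour
                    item = (-math.dist(points[i], points[j]), -j)
                    if len(heaps[i]) < k:
                        heapq.heappush(heaps[i], item)
                    elif item > heaps[i][0]:
                        # Closer than the current worst candidate: swap it in.
                        heapq.heapreplace(heaps[i], item)

    # Sort each heap so the nearest neighbour comes first.
    return [[-neg_j for _, neg_j in sorted(h, reverse=True)] for h in heaps]


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
graph = knn_graph_brute_force(points, k=2)
```

Because every pair of points is compared, the result is exact and does not depend on the tile size; the cost grows quadratically with the number of points, which is why the abstract emphasizes parallel hardware.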
Pages: 12