Efficient Computation of k-Nearest Neighbour Graphs for Large High-Dimensional Data Sets on GPU Clusters

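Cited by: 10
Authors
Dashti, Ali [1]
Komarov, Ivan [1]
D'Souza, Roshan M. [1]
Affiliation
[1] University of Wisconsin, Complex Systems Simulation Laboratory, Department of Mechanical Engineering, Milwaukee, WI 53201, USA
Source
PLOS ONE | 2013 / Vol. 8 / No. 9
Funding
National Science Foundation (USA);
Keywords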
CONSTRUCTION;
DOI
10.1371/journal.pone.0074113
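Chinese Library Classification (CLC) codes
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract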
This paper presents an implementation of brute-force, exact k-Nearest Neighbor Graph (k-NNG) construction for ultra-large, high-dimensional data clouds. The proposed method runs on Graphics Processing Units (GPUs) and scales across multiple levels of parallelism: between the nodes of a cluster, between the GPUs on a single node, and within each GPU. It applies to homogeneous computing clusters with a varying number of nodes and GPUs per node. We achieve a 6-fold speedup in data processing compared with an optimized method running on a CPU cluster, and bring k-NNG generation for a dataset of twenty million images with 15k dimensions, previously infeasible, within practical reach.
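Brute-force exact k-NNG construction can be pictured as a grid of independent tiles: each (query block, reference block) pair of the data set produces candidate distances that are merged into a running top-k list for every point. The Python sketch below is a minimal, sequential illustration under stated assumptions, not the authors' GPU implementation: it assumes squared Euclidean distance, square tiles of a hypothetical `block_size`, and ties broken by the smaller point index, and the function names are hypothetical. In the multi-level scheme described in the abstract, the tiles would instead be distributed across cluster nodes, GPUs, and GPU threads.

```python
import heapq
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_euclidean(a: Point, b: Point) -> float:
    """Squared Euclidean distance (assumed metric); monotone in the true
    distance, so the neighbour ordering is the same."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_graph_blocked(points: List[Point], k: int, block_size: int) -> List[List[int]]:
    """Hypothetical sketch: exact brute-force k-NNG computed tile by tile.

    The data set is cut into blocks of `block_size` points. Every
    (query block, reference block) pair is an independent unit of work;
    on a cluster these tiles could be spread over nodes, GPUs and GPU
    threads, but here they are simply processed in a sequential loop.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    if block_size < 1:
        raise ValueError("block_size must be positive")
    # One bounded heap per point holding (-distance, -index) for the best
    # k candidates seen so far; heap[0] is the current worst neighbour.
    heaps: List[List[Tuple[float, int]]] = [[] for _ in range(n)]
    starts = range(0, n, block_size)
    for q0 in starts:                      # query tile
        for r0 in starts:                  # reference tile
            for i in range(q0, min(q0 + block_size, n)):
                heap = heaps[i]
                for j in range(r0, min(r0 + block_size, n)):
                    if i == j:
                        continue           # a point is not its own neighbour
                    cand = (-squared_euclidean(points[i], points[j]), -j)
                    if len(heap) < k:
                        heapq.heappush(heap, cand)
                    elif cand > heap[0]:   # closer (or tie with smaller index)
                        heapq.heapreplace(heap, cand)
    # Emit each neighbour list from nearest to farthest.
    return [[-neg_j for _, neg_j in sorted(h, reverse=True)] for h in heaps]
```

For example, `knn_graph_blocked([(0.0,), (1.0,), (3.0,), (6.0,)], k=2, block_size=2)` returns `[[1, 2], [0, 2], [1, 0], [2, 1]]`; point 2 is equidistant from points 0 and 3, and the tie goes to the smaller index. Because each tile reads only its two blocks and updates per-point lists under a strict ordering, the result does not depend on `block_size` or on the order in which tiles are processed, which is what makes the work easy to split across workers.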
Pages: 12
Related papers
(50 in total)
  • [41] A Privacy-Preserving and Efficient k-Nearest Neighbor Query and Classification Scheme Based on k-Dimensional Tree for Outsourced Data
    Du, Jiangyi
    Bian, Fuling
    IEEE ACCESS, 2020, 8 (08) : 69333 - 69345
  • [42] k Nearest Neighbor Similarity Join Algorithm on High-Dimensional Data Using Novel Partitioning Strategy
    Ma, Youzhong
    Hua, Qiaozhi
    Wen, Zheng
    Zhang, Ruiling
    Zhang, Yongxin
    Li, Haipeng
    SECURITY AND COMMUNICATION NETWORKS, 2022, 2022
  • [43] Supporting K nearest neighbors query on high-dimensional data in P2P systems
    Li, M.
    Lee, W.-C.
    Sivasubramaniam, A.
    Zhao, J.
    Front. Comput. Sci. China, 2008, 3 : 234 - 247
  • [45] Data Quality Measures and Efficient Evaluation Algorithms for Large-Scale High-Dimensional Data
    Cho, Hyeongmin
    Lee, Sangkyun
    APPLIED SCIENCES-BASEL, 2021, 11 (02) : 1 - 17
  • [46] Combining the outputs of various k-nearest neighbor anomaly detectors to form a robust ensemble model for high-dimensional geochemical anomaly detection
    Chen, Yongliang
    Zhao, Qingying
    Lu, Laijun
    JOURNAL OF GEOCHEMICAL EXPLORATION, 2021, 231
  • [47] Curvilinear component analysis: an efficient method for the unfolding and the representation of high-dimensional nonlinear data sets
    Jausions-Picaud, C.
    Herault, J.
    Guerin-Dugue, A.
    Oliva, A.
    PERCEPTION, 1998, 27 : 151 - 151
  • [48] EmbedX: A Versatile, Efficient and Scalable Platform to Embed Both Graphs and High-Dimensional Sparse Data
    Zou, Yuanhang
    Ding, Zhihao
    Shi, Jieming
    Guo, Shuting
    Su, Chunchen
    Zhang, Yafei
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 16 (12) : 3543 - 3556
  • [49] Visualizing large-scale high-dimensional data via hierarchical embedding of KNN graphs
    Zhu, Haiyang
    Zhu, Minfeng
    Feng, Yingchaojie
    Cai, Deng
    Hu, Yuanzhe
    Wu, Shilong
    Wu, Xiangyang
    Chen, Wei
    VISUAL INFORMATICS, 2021, 5 (02) : 51 - 59