ARKGraph: All-Range Approximate K-Nearest-Neighbor Graph

Cited by: 2
Authors
Zuo, Chaoji [1 ]
Deng, Dong [1 ]
Affiliations
[1] Rutgers State Univ, New Brunswick, NJ 08901 USA
Source
Proceedings of the VLDB Endowment | 2023 / Vol. 16 / No. 10
Funding
U.S. National Science Foundation
Keywords
PRODUCT QUANTIZATION; SMALL WORLD; SEARCH; LOCALITY;
DOI
10.14778/3603581.3603601
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Given a collection of vectors, the approximate K-nearest-neighbor graph (KGraph for short) connects every vector to its approximate K-nearest-neighbors (KNN for short). KGraph plays an important role in high dimensional data visualization, semantic search, manifold learning, and machine learning. The vectors are typically vector representations of real-world objects (e.g., images and documents), which often come with a few structured attributes, such as timestamps and locations. In this paper, we study the all-range approximate K-nearest-neighbor graph (ARKGraph) problem. Specifically, given a collection of vectors, each associated with a numerical search key (e.g., a timestamp), we aim to build an index that takes a search key range as the query and returns the KGraph of vectors whose search keys are within the query range. ARKGraph can facilitate interactive high dimensional data visualization, data mining, etc. A key challenge of this problem is the huge index size. This is because, given n vectors, a brute-force index stores a KGraph for every search key range, which results in O(Kn^3) index size as there are O(n^2) search key ranges and each KGraph takes O(Kn) space. We observe that the KNN of a vector in nearby ranges are often the same, which can be grouped together to save space. Based on this observation, we propose a series of novel techniques that reduce the index size significantly to just O(Kn log n) in the average case. Furthermore, we develop an efficient indexing algorithm that constructs the optimized ARKGraph index directly without exhaustively calculating the distance between every pair of vectors. To process a query, for each vector in the query range, we only need O(log log n + K log K) to restore its KNN in the query range from the optimized ARKGraph index. We conducted extensive experiments on real-world datasets. Experimental results show that our optimized ARKGraph index achieved a small index size, low query latency, and good scalability. Specifically, our approach was 1000x faster than the baseline method that builds a KGraph for all the vectors in the query range on-the-fly.
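To make the problem statement concrete, the following is a minimal Python sketch of the on-the-fly baseline the abstract compares against: filter vectors by search key range, then connect each one to its K nearest in-range neighbors. It is an illustration under stated assumptions, not the paper's implementation: it computes exact rather than approximate neighbors, uses Euclidean distance, and the names `range_kgraph` and `num_key_ranges` are hypothetical. The second helper only counts contiguous key ranges over n distinct keys, which is the O(n^2) factor in the brute-force index size.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def range_kgraph(
    items: List[Tuple[float, Vector]], lo: float, hi: float, k: int
) -> Dict[int, List[int]]:
    """Exact KNN graph over the items whose search key lies in [lo, hi].

    `items` is a list of (search_key, vector) pairs. The result maps each
    in-range item index to the indices of its k nearest in-range neighbors,
    closest first. Exact search stands in for the approximate search used
    in practice (assumption made for brevity).
    """
    in_range = [i for i, (key, _) in enumerate(items) if lo <= key <= hi]
    graph: Dict[int, List[int]] = {}
    for i in in_range:
        others = [j for j in in_range if j != i]
        # Sort by Euclidean distance; ties are broken by index for determinism.
        others.sort(key=lambda j: (math.dist(items[i][1], items[j][1]), j))
        graph[i] = others[:k]
    return graph


def num_key_ranges(n: int) -> int:
    """Number of contiguous ranges over n distinct sorted keys: n(n+1)/2."""
    return n * (n + 1) // 2


# Hypothetical toy data: (timestamp, 2-d vector).
items = [
    (1, (0.0, 0.0)),
    (2, (1.0, 0.0)),
    (3, (0.0, 2.0)),
    (4, (5.0, 5.0)),
    (5, (1.0, 1.0)),
]
graph_2_4 = range_kgraph(items, lo=2, hi=4, k=1)
graph_all = range_kgraph(items, lo=1, hi=5, k=2)
```

Each query in this sketch costs a quadratic number of distance computations in the size of the range, and precomputing the answer for all `num_key_ranges(n)` ranges is what leads to the O(Kn^3) brute-force index described in the abstract.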
Pages: 2645-2658
Number of pages: 14
Related papers
50 in total
  • [41] Image classification based on quantum K-Nearest-Neighbor algorithm
    Yijie Dang
    Nan Jiang
    Hao Hu
    Zhuoxiao Ji
    Wenyin Zhang
    Quantum Information Processing, 2018, 17
  • [42] DEVELOPMENT OF A NOVEL WEIGHTING SCHEME FOR THE K-NEAREST-NEIGHBOR ALGORITHM
    FORBES, RA
    TEWS, EC
    FREISER, BS
    WISE, MB
    PERONE, SP
    JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1986, 26 (03): : 93 - 98
  • [43] Fast Algorithm for Approximate k-Nearest Neighbor Graph Construction
    Wang, Dilin
    Shi, Lei
    Cao, Jianwen
    2013 IEEE 13TH INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW), 2013, : 349 - 356
  • [44] An agglomerative clustering algorithm using a dynamic k-nearest-neighbor list
    Lai, Jim Z. C.
    Huang, Tsung-Jen
    INFORMATION SCIENCES, 2011, 181 (09) : 1722 - 1734
  • [45] Distributed processing of moving K-nearest-neighbor query on moving objects
    Wu, Wei
    Guo, Wenyuan
    Tan, Kian-Lee
    2007 IEEE 23RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2007, : 1091 - +
  • [46] An Improved Algorithm for k-Nearest-Neighbor Finding and Surface Normals Estimation
    赵灿
    孟祥林
    Tsinghua Science and Technology, 2009, 14 (S1) : 77 - 81
  • [47] Optimal construction of k-nearest-neighbor graphs for identifying noisy clusters
    Maier, Markus
    Hein, Matthias
    von Luxburg, Ulrike
    THEORETICAL COMPUTER SCIENCE, 2009, 410 (19) : 1749 - 1764
  • [48] Divergence Estimation for Multidimensional Densities Via k-Nearest-Neighbor Distances
    Wang, Qing
    Kulkarni, Sanjeev R.
    Verdu, Sergio
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2009, 55 (05) : 2392 - 2405
  • [49] K-Nearest-Neighbor Local Sampling Based Conditional Independence Testing
    Li, Shuai
    Zhang, Yingjie
    Zhu, Hongtu
    Wang, Christina Dan
    Shu, Hai
    Chen, Ziqi
    Sun, Zhuoran
    Yang, Yanfeng
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [50] GRkNN: Group reverse k-nearest-neighbor query in spatial databases
    Song X.-Y.
    Yu C.-C.
    Sun H.-L.
    Xu J.-K.
    Jisuanji Xuebao/Chinese Journal of Computers, 2010, 33 (12): : 2229 - 2238