Efficient locality-sensitive hashing over high-dimensional streaming data

Cited by: 0
Authors
Hao Wang
Chengcheng Yang
Xiangliang Zhang
Xin Gao
Affiliations
[1] King Abdullah University of Science and Technology, Computational Bioscience Research Center, CEMSE Division
[2] King Abdullah University of Science and Technology, Machine Intelligence and kNowledge Engineering Laboratory, CEMSE Division
[3] Shenzhen University
Source
Neural Computing & Applications, 2023, 35 (05)
Keywords
Approximate nearest neighbor search; Locality-sensitive hashing; LSM-tree; Streaming data
DOI
Not available
Abstract
Approximate nearest neighbor (ANN) search in high-dimensional spaces is fundamental in many applications. Locality-sensitive hashing (LSH) is a well-known methodology for solving the ANN problem. Existing LSH-based ANN solutions typically employ a large number of individual indexes optimized for search efficiency. Updating such indexes might be impractical when processing high-dimensional streaming data. In this paper, we present a novel disk-based LSH index that offers efficient support for both searches and updates. The contributions of our work are threefold. First, we use the write-friendly LSM-trees to store the LSH projections to facilitate efficient updates. Second, we develop a novel estimation scheme to estimate the number of required LSH functions, with which the disk storage and access costs are effectively reduced. Third, we exploit both the collision number and the projection distance to improve the efficiency of candidate selection, improving the search performance with theoretical guarantees on the result quality. Experiments on four real-world datasets show that our proposal outperforms the state-of-the-art schemes.
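To make the search-side idea concrete, here is a minimal in-memory sketch of random-projection LSH that ranks candidates by both collision count and projection distance. It assumes an E2LSH-style hash family h_i(o) = floor((a_i · o + b_i) / w) with Gaussian a_i and uniform b_i, plus a hypothetical ranking rule (more collisions first, ties broken by a smaller summed projection distance). The class `ProjectionLSH`, its parameters `m`, `w`, `n_candidates`, and the ranking rule are illustrative assumptions, not the authors' method: the paper stores projections in on-disk LSM-trees and derives the number of hash functions from its own estimation scheme, whereas this sketch keeps everything in dictionaries and scans all stored projections.

```python
import math
import random


class ProjectionLSH:
    """Hypothetical in-memory sketch of random-projection (E2LSH-style) LSH.

    Hash function i is h_i(o) = floor((a_i . o + b_i) / w), where a_i has
    standard Gaussian entries and b_i is uniform in [0, w). The raw projections
    a_i . o are stored so that projection distances can be used to rank candidates.
    """

    def __init__(self, dim, m, w, seed=0):
        rng = random.Random(seed)
        self.w = w
        self.a = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m)]
        self.b = [rng.uniform(0.0, w) for _ in range(m)]
        self.proj = {}    # point id -> list of m raw projections
        self.points = {}  # point id -> original vector, used for exact verification

    def _project(self, v):
        # One dot product per hash function: a_i . v
        return [sum(x * y for x, y in zip(a_i, v)) for a_i in self.a]

    def _bucket(self, i, p):
        # Quantize the projection p of hash function i into a bucket id
        return math.floor((p + self.b[i]) / self.w)

    def insert(self, pid, v):
        # A streaming insert only computes and stores m projections;
        # no existing structure is rebuilt.
        self.points[pid] = list(v)
        self.proj[pid] = self._project(v)

    def query(self, q, k, n_candidates):
        qp = self._project(q)
        qb = [self._bucket(i, p) for i, p in enumerate(qp)]
        scored = []
        # Full scan for clarity; a real index would only visit points whose
        # projections lie near the query's projections.
        for pid, op in self.proj.items():
            collisions = sum(
                1 for i, p in enumerate(op) if self._bucket(i, p) == qb[i]
            )
            proj_dist = sum(abs(p - x) for p, x in zip(op, qp))
            # Assumed ranking: more collisions first, then smaller projection distance
            scored.append((-collisions, proj_dist, pid))
        scored.sort()
        candidates = [pid for _, _, pid in scored[:n_candidates]]
        # Verify the shortlisted candidates with exact Euclidean distance
        exact = sorted((math.dist(self.points[pid], q), pid) for pid in candidates)
        return [pid for _, pid in exact[:k]]
```

Raising `n_candidates` trades more exact distance computations for higher recall. The abstract states that the paper bounds this trade-off with theoretical guarantees on result quality, which this sketch does not attempt.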
Pages: 3753-3766
Page count: 14
Related papers
  • [1] Efficient locality-sensitive hashing over high-dimensional streaming data
    Wang, Hao
    Yang, Chengcheng
    Zhang, Xiangliang
    Gao, Xin
    [J]. NEURAL COMPUTING & APPLICATIONS, 2023, 35 (05): 3753-3766
  • [2] Efficient Locality-Sensitive Hashing Over High-Dimensional Data Streams
    Yang, Chengcheng
    Deng, Dong
    Shang, Shuo
    Shao, Ling
    [J]. 2020 IEEE 36TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2020), 2020: 1994-1997
  • [3] Reverse Query-Aware Locality-Sensitive Hashing for High-Dimensional Furthest Neighbor Search
    Huang, Qiang
    Feng, Jianlin
    Fang, Qiong
    [J]. 2017 IEEE 33RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2017), 2017: 167-170
  • [4] Double locality sensitive hashing Bloom filter for high-dimensional streaming anomaly detection
    Zeng, Zhixia
    Xiao, Ruliang
    Lin, Xinhong
    Luo, Tianjian
    Lin, Jiayin
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (03)
  • [5] In Defense of Locality-Sensitive Hashing
    Ding, Kun
    Huo, Chunlei
    Fan, Bin
    Xiang, Shiming
    Pan, Chunhong
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (01): 87-103
  • [6] Kernelized Locality-Sensitive Hashing
    Kulis, Brian
    Grauman, Kristen
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (06): 1092-1104
  • [7] Correlated Locality-Sensitive Hashing
    Pagh, Rasmus
    [J]. ALGORITHMS - ESA 2015, 2015, 9294
  • [8] Efficient Data Stream Clustering with Sliding Windows based on Locality-Sensitive Hashing
    Youn, Jonghem
    Shim, Junho
    Lee, Sang-Goo
    [J]. IEEE ACCESS, 2018, 6: 63757-63776
  • [9] Streaming Similarity Search over one Billion Tweets using Parallel Locality-Sensitive Hashing
    Sundaram, Narayanan
    Turmukhametova, Aizana
    Satish, Nadathur
    Mostak, Todd
    Indyk, Piotr
    Madden, Samuel
    Dubey, Pradeep
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, 6 (14): 1930-1941
  • [10] EFFICIENT SPEAKER SEARCH OVER LARGE POPULATIONS USING KERNELIZED LOCALITY-SENSITIVE HASHING
    Jeon, Woojay
    Cheng, Yan-Ming
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012: 4261-4264