SES-LSH: Shuffle-Efficient Locality Sensitive Hashing for Distributed Similarity Search

Times cited: 12
Authors
Li, Dongsheng [1]
Zhang, Wanxin [1]
Shen, Siqi [1]
Zhang, Yiming [1]
Affiliation
[1] Natl Univ Def Technol, Coll Comp, Natl Lab Parallel & Distributed Proc, Changsha, Hunan, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Locality Sensitive Hashing; shuffle; location-aware querying; Similarity Search;
DOI
10.1109/ICWS.2017.99
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Locality Sensitive Hashing (LSH) is a widely used similarity search technique for many web services, such as content-based image and video retrieval. Owing to its popularity, much research effort has been devoted to improving the search quality of LSH as well as its indexing and query performance. However, most existing LSH variants can run only on a single node, which limits their applicability to large-scale data. In this paper, we present SES-LSH, a Shuffle-Efficient Similarity Search scheme based on LSH that executes efficiently in distributed environments and can serve massive amounts of data. SES-LSH uses a shuffle-efficient indexing scheme to reduce data shuffling during hash table construction, and a location-aware querying scheme to improve query performance. We have implemented a prototype of SES-LSH on Spark and applied several optimizations to the fine-grained hash table operations of distributed LSH. Extensive experiments on large-scale real-world datasets show that SES-LSH is substantially more efficient than existing methods.
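To make the abstract's terms concrete, the following is a minimal single-node sketch of classic p-stable (E2LSH-style) hashing for Euclidean distance. It is not the SES-LSH scheme: the class name `PStableLSH`, its parameters `k` (hash functions per table), `L` (number of tables) and `w` (bucket width), and the choice of p-stable hashing are all assumptions for illustration. The hypothetical `emit` method shows the records a naive distributed map stage would shuffle when building the hash tables: one record per point per table.

```python
import math
import random
from collections import defaultdict


class PStableLSH:
    """Illustrative E2LSH-style index; NOT the SES-LSH indexing scheme.

    Assumed parameters: k hash functions concatenated per table,
    L tables, bucket width w.
    """

    def __init__(self, dim, k=4, L=8, w=4.0, seed=0):
        rng = random.Random(seed)
        self.w = w
        # For each table, k pairs (a, b) with a ~ N(0, 1)^dim and b ~ U[0, w).
        self.funcs = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
             for _ in range(k)]
            for _ in range(L)
        ]
        self.tables = [defaultdict(list) for _ in range(L)]
        self.points = {}

    def _key(self, t, v):
        # Bucket key in table t: floor((a . v + b) / w) for each of its k functions.
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in self.funcs[t]
        )

    def emit(self, pid, v):
        # Records a naive distributed map stage would send through the shuffle:
        # one ((table_id, bucket_key), point_id) pair per table.
        return [((t, self._key(t, v)), pid) for t in range(len(self.tables))]

    def insert(self, pid, v):
        self.points[pid] = v
        for (t, key), p in self.emit(pid, v):
            self.tables[t][key].append(p)

    def query(self, q, top=1):
        # Union of candidates colliding with q in any table, re-ranked by exact distance.
        cands = set()
        for t in range(len(self.tables)):
            cands.update(self.tables[t].get(self._key(t, q), []))
        return sorted(cands, key=lambda p: math.dist(q, self.points[p]))[:top]


# Usage: index 200 random 8-dimensional points.
rng = random.Random(42)
data = {i: [rng.uniform(0, 10) for _ in range(8)] for i in range(200)}
index = PStableLSH(dim=8)
for pid, v in data.items():
    index.insert(pid, v)

shuffle_records = sum(len(index.emit(pid, v)) for pid, v in data.items())
print(shuffle_records)        # 200 points x 8 tables = 1600
print(index.query(data[17]))  # → [17] (a stored point collides with itself in every table)
```

In this naive layout the shuffled volume is n × L records, so it grows linearly with the number of hash tables; this is the kind of cost the paper's shuffle-efficient indexing scheme targets.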
Pages: 822-827
Number of pages: 6
Related papers (50 in total)
  • [21] Streaming Similarity Search over one Billion Tweets using Parallel Locality-Sensitive Hashing
    Sundaram, Narayanan
    Turmukhametova, Aizana
    Satish, Nadathur
    Mostak, Todd
    Indyk, Piotr
    Madden, Samuel
    Dubey, Pradeep
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, 6 (14): 1930-1941
  • [22] Locality Sensitive Hashing for Efficient Similar Polygon Retrieval
    Kaplan, Haim
    Tenenbaum, Jay
    [J]. 38TH INTERNATIONAL SYMPOSIUM ON THEORETICAL ASPECTS OF COMPUTER SCIENCE (STACS 2021), 2021, 187
  • [23] An Efficient Recommender System Using Locality Sensitive Hashing
    Zhang, Kunpeng
    Fan, Shaokun
    Wang, Harry Jiannan
    [J]. PROCEEDINGS OF THE 51ST ANNUAL HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES (HICSS), 2018: 780-789
  • [24] DB-LSH: Locality-Sensitive Hashing with Query-based Dynamic Bucketing
    Tian, Yao
    Zhao, Xi
    Zhou, Xiaofang
    [J]. 2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022: 2250-2262
  • [25] Evolving Computationally Efficient Hashing for Similarity Search
    Iclanzan, David
    Szilagyi, Sandor Miklos
    Szilagyi, Laszlo
    [J]. NEURAL INFORMATION PROCESSING (ICONIP 2018), PT II, 2018, 11302: 552-563
  • [26] Efficient Speaker Search over Large Populations Using Kernelized Locality-Sensitive Hashing
    Jeon, Woojay
    Cheng, Yan-Ming
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012: 4261-4264
  • [27] Locality-Sensitive Hashing of Soft Biometrics for Efficient Face Image Database Search and Retrieval
    Alshahrani, Ameerah Abdullah
    Jaha, Emad Sami
    [J]. ELECTRONICS, 2023, 12 (06)
  • [28] Dynamic Partition Forest: An Efficient and Distributed Indexing Scheme for Similarity Search based on Hashing
    Lu, Yangdi
    Bo, Yang
    He, Wenbo
    Nabatchian, Amir
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018: 1059-1064
  • [29] Kernelized Locality-Sensitive Hashing for Scalable Image Search
    Kulis, Brian
    Grauman, Kristen
    [J]. 2009 IEEE 12TH INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2009: 2130-2137
  • [30] Batch-Orthogonal Locality-Sensitive Hashing for Angular Similarity
    Ji, Jianqiu
    Yan, Shuicheng
    Li, Jianmin
    Gao, Guangyu
    Tian, Qi
    Zhang, Bo
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2014, 36 (10): 1963-1974