Large-Scale Distributed Locality-Sensitive Hashing for General Metric Data

Cited by: 5
Authors
Silva, Eliezer [1]
Teixeira, Thiago [2]
Teodoro, George [2]
Valle, Eduardo [1]
Affiliations
[1] Univ Estadual Campinas, RECOD Lab, DCA, FEEC, Campinas, Brazil
[2] Univ Brasilia, Dept Comp Sci, Brasilia, DF, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil
DOI
10.1007/978-3-319-11988-5_8
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Locality-Sensitive Hashing (LSH) is extremely competitive for similarity search, but it assumes uniform access cost to the data and is available for only a handful of dissimilarities for which locality-sensitive families are known. In this work we propose Parallel Voronoi LSH, an approach that addresses both limitations: it makes LSH efficient on distributed-memory architectures, and it works for very general dissimilarities (in particular, for all metric dissimilarities). Each hash table of Voronoi LSH selects a sample of the dataset to serve as the seeds of a Voronoi diagram, and the resulting Voronoi cells are then used to hash the data. Because Voronoi diagrams depend only on the distance, the technique is very general. Implementing LSH in distributed-memory systems is challenging because LSH lacks referential locality in its access to the data: if care is not taken, excessive message passing ruins the index performance. Another important contribution of this work is therefore the parallel design needed to make the index scalable, which we evaluate on a dataset of one billion multimedia features.
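The hashing step described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation and does not cover the distributed design. The class name `VoronoiLSHSketch`, the helper `nearest_seed`, the parameters `num_tables` and `num_seeds`, uniform random seed selection, and the Hamming-distance example are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def nearest_seed(point, seeds, dist):
    """Return the index of the seed closest to `point`, i.e. its Voronoi cell."""
    return min(range(len(seeds)), key=lambda i: dist(point, seeds[i]))


class VoronoiLSHSketch:
    """Hypothetical illustration: each table hashes points by nearest seed."""

    def __init__(self, data, dist, num_tables=4, num_seeds=2, rng_seed=0):
        rng = random.Random(rng_seed)
        self.data = data
        self.dist = dist
        self.tables = []
        for _ in range(num_tables):
            # Assumption: seeds are a uniform random sample of the dataset.
            seeds = rng.sample(data, num_seeds)
            buckets = defaultdict(list)
            for idx, point in enumerate(data):
                buckets[nearest_seed(point, seeds, dist)].append(idx)
            self.tables.append((seeds, buckets))

    def query(self, q, k=1):
        """Collect candidates from the query's cell in every table, then rank."""
        candidates = set()
        for seeds, buckets in self.tables:
            cell = nearest_seed(q, seeds, self.dist)
            candidates.update(buckets.get(cell, []))
        ranked = sorted(candidates, key=lambda i: self.dist(q, self.data[i]))
        return ranked[:k]


def hamming(a, b):
    """Hamming distance, a metric on equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


words = ["karolin", "kathrin", "kerstin", "carolin", "martina", "mariana"]
index = VoronoiLSHSketch(words, hamming, num_tables=4, num_seeds=2)
nearest = index.query("karolin", k=1)  # indices into `words`
```

Only the distance function touches the data, so any metric can be substituted for `hamming` without changing the index code, which is the generality the abstract refers to. A query that is itself a dataset element always lands in its own cell in every table, so it is guaranteed to appear among the candidates.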
Pages: 82-93
Page count: 12
Related papers
50 in total
  • [1] Efficient large-scale sequence comparison by locality-sensitive hashing
    Buhler, J
    [J]. BIOINFORMATICS, 2001, 17 (05) : 419 - 428
  • [2] Large-Scale Distributed Learning via Private On-Device Locality-Sensitive Hashing
    Rabbani, Tahseen
    Bornstein, Marco
    Huang, Furong
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [3] A novel locality-sensitive hashing algorithm for similarity searches on large-scale hyperspectral data
    Zhou, Yuan
    Liu, Chun
    Li, Nan
    Li, Minzhen
    [J]. REMOTE SENSING LETTERS, 2016, 7 (10) : 965 - 974
  • [4] Large-Scale Physiological Waveform Retrieval via Locality-Sensitive Hashing
    Kim, Yongwook Bryce
    O'Reilly, Una-May
    [J]. 2015 37TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2015 : 5829 - 5833
  • [5] Locality-sensitive hashing for region-based large-scale image indexing
    Gallas, Abir
    Barhoumi, Walid
    Kacem, Neila
    Zagrouba, Ezzeddine
    [J]. IET IMAGE PROCESSING, 2015, 9 (09) : 804 - 810
  • [6] Revisiting Kernelized Locality-Sensitive Hashing for Improved Large-Scale Image Retrieval
    Jiang, Ke
    Que, Qichao
    Kulis, Brian
    [J]. 2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2015 : 4933 - 4941
  • [7] Non-Metric Locality-Sensitive Hashing
    Mu, Yadong
    Yan, Shuicheng
    [J]. PROCEEDINGS OF THE TWENTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-10), 2010 : 539 - 544
  • [8] Matching User Accounts across Large-scale Social Networks based on Locality-sensitive Hashing
    Li, Yongjun
    Li, Xiangyu
    Yang, Jiaqi
    Gao, Congjie
    [J]. 2020 IEEE INTL SYMP ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, INTL CONF ON BIG DATA & CLOUD COMPUTING, INTL SYMP SOCIAL COMPUTING & NETWORKING, INTL CONF ON SUSTAINABLE COMPUTING & COMMUNICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2020), 2020 : 802 - 809
  • [9] A method using locality-sensitive hashing for large-scale content-based image retrieval
    Wang Weihong
    Wang Song
    [J]. CCDC 2009: 21ST CHINESE CONTROL AND DECISION CONFERENCE, VOLS 1-6, PROCEEDINGS, 2009 : 1816 - 1820
  • [10] Using Locality-Sensitive Hashing for SVM Classification of Large Data Sets
    Gonzalez-Lima, Maria D.
    Ludena, Carenne C.
    [J]. MATHEMATICS, 2022, 10 (11)