A Distributed Near-Optimal LSH-based Framework for Privacy-Preserving Record Linkage

Cited by: 10
Authors
Karapiperis, Dimitrios [1]
Verykios, Vassilios S. [1]
Affiliations
[1] Hellen Open Univ, Sch Sci & Technol, Patras, Greece
Keywords
Locality-Sensitive Hashing; Bloom filter; Map/Reduce
DOI
10.2298/CSIS140215040K
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this paper, we present a framework which relies on the Map/Reduce paradigm in order to distribute computations uniformly among underutilized commodity hardware resources, without imposing an extra overhead on the existing infrastructure. The volume of the distance computations required for record comparison is largely reduced by utilizing the so-called Locality-Sensitive Hashing technique, which is optimally tuned in order to avoid highly redundant computations. Experimental results illustrate the effectiveness of our distributed framework in finding the matched record pairs in voluminous data sets.
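As a rough illustration of how Locality-Sensitive Hashing can cut down the number of distance computations, the following Python sketch applies generic Hamming-style LSH blocking to bit vectors (for example, Bloom filter encodings of records). It is not the authors' framework: the function names, the parameters `num_tables` and `bits_per_key`, and the sample data are all assumptions made for illustration, and the tuning procedure the abstract refers to is not reproduced. The bucket-building step loosely mirrors a map phase (emit key, record id) followed by a reduce phase (group ids by key).

```python
import random
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def lsh_candidate_pairs(vectors, num_tables, bits_per_key, seed=0):
    """Return record-id pairs that share a bucket in at least one hash table.

    vectors: dict mapping a record id to a tuple of 0/1 bits (hypothetical input).
    Each table samples `bits_per_key` bit positions; the sampled bits form the key.
    """
    rng = random.Random(seed)
    n_bits = len(next(iter(vectors.values())))
    pairs = set()
    for _ in range(num_tables):
        positions = rng.sample(range(n_bits), bits_per_key)
        buckets = defaultdict(list)
        # "Map" step: emit (key, record id) for every record.
        for rec_id, vec in vectors.items():
            key = tuple(vec[p] for p in positions)
            buckets[key].append(rec_id)
        # "Reduce" step: records grouped under the same key become candidates.
        for ids in buckets.values():
            pairs.update(combinations(sorted(ids), 2))
    return pairs


def match(vectors, threshold, num_tables=4, bits_per_key=3, seed=0):
    """Compute Hamming distance only for candidate pairs, keep those within threshold."""
    candidates = lsh_candidate_pairs(vectors, num_tables, bits_per_key, seed)
    return {(a, b) for a, b in candidates
            if hamming(vectors[a], vectors[b]) <= threshold}


if __name__ == "__main__":
    # Made-up sample data: "a" and "b" are identical, "c" is the bitwise complement.
    sample = {
        "a": (1, 0, 1, 1, 0, 0, 1, 0),
        "b": (1, 0, 1, 1, 0, 0, 1, 0),
        "c": (0, 1, 0, 0, 1, 1, 0, 1),
    }
    print(match(sample, threshold=1))
```

Using more tables raises the chance that truly similar records collide at least once, while longer keys make buckets smaller and reduce the number of candidate pairs; choosing these two values is the kind of trade-off that tuning would address.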
Pages: 745-763
Page count: 19