Cryptographically Secure Private Record Linkage Using Locality-Sensitive Hashing

Authors
Wei, Ruidi [1 ]
Kerschbaum, Florian [1 ]
Affiliations
[1] Univ Waterloo, Waterloo, ON, Canada
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2023 / Vol. 17 / Issue 02
Funding
Natural Sciences and Engineering Research Council of Canada;
DOI
10.14778/3626292.3626293
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Private record linkage (PRL) is the problem of identifying pairs of records that approximately match across datasets in a secure, privacy-preserving manner. Two-party PRL allows each party to obtain a record from the other party only if that record matches one of its own. The privacy goal is that no information about the datasets other than the matching records should be released. A fundamental challenge is to avoid leaking information while also avoiding a comparison of all pairs of records. In plaintext record linkage this is done using a blocking strategy, e.g., locality-sensitive hashing. One recent approach proposed by He et al. (ACM CCS 2017) uses locality-sensitive hashing and then releases a provably differentially private representation of the hash bins. However, differential privacy still leaks some information, albeit provably bounded, and does not protect against attacks such as property inference attacks. Another recent approach by Khurram and Kerschbaum (IEEE ICDE 2020) uses locality-preserving hashing and provides cryptographic security, i.e., it releases no information except the output. However, locality-preserving hash functions are much harder to construct than locality-sensitive hash functions, and hence the accuracy of this approach is limited, particularly on larger datasets. In this paper, we address the open problem of providing cryptographic security for PRL while using locality-sensitive hash functions. Using recent results in oblivious algorithms, we design a new cryptographically secure PRL protocol with locality-sensitive hash functions. Our prototype implementation can match 40,000 records in the British National Library/Toronto Public Library and the North Carolina Voter Registry datasets with 99.3% and 99.9% accuracy, respectively, in less than an hour, which is more than an order of magnitude faster than Khurram and Kerschbaum's work at a higher accuracy.
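As a rough illustration of the plaintext blocking strategy the abstract refers to, the sketch below uses MinHash signatures with banding so that only records sharing a hash bin become candidate pairs. It is not the paper's protocol and provides no security: the bins are computed in the clear. The function names, the character 2-gram tokenization, and the parameters `num_hashes` and `bands` are assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=2):
    """Return the set of character k-grams of a normalized string."""
    s = text.lower().strip()
    return {s[i:i + k] for i in range(max(1, len(s) - k + 1))}


def seeded_hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed (illustrative)."""
    digest = hashlib.sha256(f"{seed}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(text, num_hashes=32):
    """MinHash signature: the minimum hash of the shingle set, per seed."""
    grams = shingles(text)
    return [min(seeded_hash(seed, g) for g in grams) for seed in range(num_hashes)]


def candidate_pairs(left, right, num_hashes=32, bands=16):
    """Return (left_index, right_index) pairs that share at least one LSH bin."""
    rows = num_hashes // bands
    # Each bin holds the indices of left records and of right records.
    bins = defaultdict(lambda: (set(), set()))
    for side, records in enumerate((left, right)):
        for idx, rec in enumerate(records):
            sig = minhash_signature(rec, num_hashes)
            for b in range(bands):
                key = (b, tuple(sig[b * rows:(b + 1) * rows]))
                bins[key][side].add(idx)
    pairs = set()
    for left_ids, right_ids in bins.values():
        pairs.update((i, j) for i in left_ids for j in right_ids)
    return pairs


left = ["john smith", "alice wong"]
right = ["john smith", "zzzz qqqq"]
pairs = candidate_pairs(left, right)
```

Only the candidate pairs would then be compared in full, which avoids the all-pairs comparison. In a plaintext setting like this one, the bin contents themselves reveal information about the data, which is the leakage the abstract's cryptographic approach is meant to avoid.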
Pages: 79-91
Number of pages: 13