Locality Sensitive Hashing with Temporal and Spatial Constraints for Efficient Population Record Linkage

Cited by: 1
Authors
Nanayakkara, Charini [1, 2]
Christen, Peter [1, 2]
Affiliations
[1] Australian Natl Univ, Canberra, ACT, Australia
[2] Univ Edinburgh, Scottish Ctr Adm Data Res SCADR, Edinburgh, Midlothian, Scotland
Funding
UK Economic and Social Research Council (ESRC)
Keywords
Scalability; personal data; spatial constraint; temporal constraint
DOI
10.1145/3511808.3557631
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Record linkage is the process of identifying which records within or across databases refer to the same entity. Min-hash based Locality Sensitive Hashing (LSH) is commonly used in record linkage as a blocking technique to reduce the number of records to be compared. However, when applied to large databases, min-hash LSH can yield highly skewed block size distributions and many redundant record pair comparisons, only a few of which correspond to true matches (records that refer to the same entity). Furthermore, min-hash LSH is highly sensitive to its parameters and requires trial and error to determine the optimal trade-off between blocking quality and the efficiency of the record pair comparison step. In this paper, we present a novel method to improve the scalability and robustness of min-hash LSH for linking large population databases by exploiting temporal and spatial information available in personal data, and by filtering record pairs based on block sizes and min-hash similarity. Our evaluation on three real-world data sets shows that our method can improve the efficiency of record pair comparison by 75% to 99%, whereas the final average linkage precision can be improved by 28% at the cost of a reduction in the average recall by 4%.
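The general idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the field names (name, year, region), the banding parameters, the thresholds, the use of region equality as a stand-in for a spatial constraint, and the sample records are all assumptions made for illustration only.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def qgrams(value, q=2):
    """Split a string into a set of character q-grams."""
    value = value.lower()
    if len(value) < q:
        return {value}
    return {value[i:i + q] for i in range(len(value) - q + 1)}


def minhash_signature(tokens, num_hashes):
    """One minimum per seeded hash function over the token set."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{t}".encode(), digest_size=8).digest(),
                "big",
            )
            for t in tokens
        ))
    return tuple(signature)


def lsh_blocks(records, bands, rows):
    """Place each record into one block per band of its signature."""
    blocks = defaultdict(list)
    signatures = {}
    for rec_id, rec in records.items():
        sig = minhash_signature(qgrams(rec["name"]), bands * rows)
        signatures[rec_id] = sig
        for b in range(bands):
            blocks[(b, sig[b * rows:(b + 1) * rows])].append(rec_id)
    return blocks, signatures


def candidate_pairs(records, bands=4, rows=2, max_block_size=50,
                    min_sim=0.25, max_year_gap=2):
    """Generate candidate pairs, then apply the assumed filters."""
    blocks, signatures = lsh_blocks(records, bands, rows)
    pairs = set()
    for ids in blocks.values():
        if len(ids) > max_block_size:  # block size filter
            continue
        for a, b in combinations(sorted(ids), 2):
            rec_a, rec_b = records[a], records[b]
            # Temporal constraint (assumed): years must be close.
            if abs(rec_a["year"] - rec_b["year"]) > max_year_gap:
                continue
            # Spatial constraint (assumed): same region label.
            if rec_a["region"] != rec_b["region"]:
                continue
            # Min-hash similarity: fraction of matching signature slots.
            sig_a, sig_b = signatures[a], signatures[b]
            sim = sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
            if sim >= min_sim:
                pairs.add((a, b))
    return pairs


# Hypothetical records, invented for this example.
records = {
    "r1": {"name": "mary macdonald", "year": 1861, "region": "district_a"},
    "r2": {"name": "mary macdonald", "year": 1862, "region": "district_a"},
    "r3": {"name": "mary macdonald", "year": 1900, "region": "district_a"},
    "r4": {"name": "mary macdonald", "year": 1861, "region": "district_b"},
}
print(candidate_pairs(records))  # → {('r1', 'r2')}
```

All four sample records share a name, so plain min-hash LSH would place them in the same blocks and produce six candidate pairs. The assumed temporal and spatial filters leave only the one pair that is close in both year and region. Lowering max_block_size below the block size discards the block entirely.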
Pages: 4354-4358
Number of pages: 5