Locality Sensitive Hashing with Temporal and Spatial Constraints for Efficient Population Record Linkage

Cited by: 1
Authors
Nanayakkara, Charini [1, 2]
Christen, Peter [1, 2]
Affiliations
[1] Australian Natl Univ, Canberra, ACT, Australia
[2] Univ Edinburgh, Scottish Ctr Adm Data Res SCADR, Edinburgh, Midlothian, Scotland
Funding
UK Economic and Social Research Council (ESRC)
Keywords
Scalability; personal data; spatial constraint; temporal constraint;
DOI
10.1145/3511808.3557631
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Record linkage is the process of identifying which records within or across databases refer to the same entity. Min-hash based Locality Sensitive Hashing (LSH) is commonly used in record linkage as a blocking technique to reduce the number of records to be compared. However, when applied to large databases, min-hash LSH can yield highly skewed block size distributions and many redundant record pair comparisons, of which only a few correspond to true matches (records that refer to the same entity). Furthermore, min-hash LSH is highly parameter-sensitive and requires trial and error to determine the optimal trade-off between blocking quality and efficiency of the record pair comparison step. In this paper, we present a novel method to improve the scalability and robustness of min-hash LSH for linking large population databases by exploiting temporal and spatial information available in personal data, and by filtering record pairs based on block sizes and min-hash similarity. Our evaluation on three real-world data sets shows that our method can improve the efficiency of record pair comparison by 75% to 99%, whereas the final average linkage precision can be improved by 28% at the cost of a reduction in the average recall by 4%.
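The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: min-hash LSH blocking followed by filters on block size, estimated min-hash similarity, and a temporal constraint. It is not the authors' algorithm. The function names, the record layout `(name, year)`, and the parameters `bands`, `rows`, `max_block_size`, `min_sim`, and `max_year_gap` are all hypothetical; a spatial constraint could presumably be added as one more per-pair check in the same place.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def qgrams(text, q=2):
    """Split a string into its set of character q-grams (assumes len(text) >= q)."""
    text = text.lower()
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def minhash_signature(tokens, num_hashes):
    """One min-hash value per seeded hash function, using a deterministic hash."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{tok}".encode(), digest_size=8).digest(),
                "big")
            for tok in tokens))
    return tuple(signature)


def candidate_pairs(records, bands=4, rows=2, max_block_size=50,
                    min_sim=0.5, max_year_gap=2):
    """Return record pairs that survive blocking and all three filters.

    records maps a record id to a (name, year) tuple; this layout is an
    assumption made for illustration only.
    """
    signatures = {}
    blocks = defaultdict(list)
    for rec_id, (name, _year) in records.items():
        sig = minhash_signature(qgrams(name), bands * rows)
        signatures[rec_id] = sig
        # Each band of the signature acts as one blocking key.
        for b in range(bands):
            blocks[(b, sig[b * rows:(b + 1) * rows])].append(rec_id)

    pairs = set()
    for ids in blocks.values():
        # Block-size filter: skip oversized blocks that would dominate the cost.
        if len(ids) < 2 or len(ids) > max_block_size:
            continue
        for a, b in combinations(sorted(ids), 2):
            # Min-hash similarity filter: fraction of agreeing signature positions.
            agree = sum(x == y for x, y in zip(signatures[a], signatures[b]))
            if agree / (bands * rows) < min_sim:
                continue
            # Temporal filter: the two events must be close enough in time.
            if abs(records[a][1] - records[b][1]) > max_year_gap:
                continue
            pairs.add((a, b))
    return pairs


records = {
    "r1": ("peter christen", 1880),
    "r2": ("peter christen", 1881),
    "r3": ("peter christen", 1950),
    "r4": ("zygmunt wolf", 1880),
}
pairs = candidate_pairs(records)
```

In this toy data, `r1` and `r2` share every band and lie one year apart, so they remain a candidate pair, while `r3` is dropped by the temporal filter despite having an identical name. Lowering `max_block_size` below the size of a block discards every pair generated by that block, which is the kind of trade-off between efficiency and recall the abstract describes.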
Pages: 4354-4358
Number of pages: 5