Locality Sensitive Hashing with Temporal and Spatial Constraints for Efficient Population Record Linkage

Cited by: 1
Authors
Nanayakkara, Charini [1, 2]
Christen, Peter [1, 2]
Affiliations
[1] Australian National University, Canberra, ACT, Australia
[2] University of Edinburgh, Scottish Centre for Administrative Data Research (SCADR), Edinburgh, Midlothian, Scotland
Funding
UK Economic and Social Research Council (ESRC)
Keywords
Scalability; personal data; spatial constraint; temporal constraint
DOI
10.1145/3511808.3557631
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Record linkage is the process of identifying which records within or across databases refer to the same entity. Min-hash based Locality Sensitive Hashing (LSH) is commonly used in record linkage as a blocking technique to reduce the number of records to be compared. However, when applied to large databases, min-hash LSH can yield highly skewed block size distributions and many redundant record pair comparisons, only a few of which correspond to true matches (records that refer to the same entity). Furthermore, min-hash LSH is highly parameter sensitive and requires trial and error to determine the optimal trade-off between blocking quality and efficiency of the record pair comparison step. In this paper, we present a novel method to improve the scalability and robustness of min-hash LSH for linking large population databases by exploiting temporal and spatial information available in personal data, and by filtering record pairs based on block sizes and min-hash similarity. Our evaluation on three real-world data sets shows that our method can improve the efficiency of record pair comparison by 75% to 99%, whereas the final average linkage precision can be improved by 28% at the cost of a reduction in the average recall by 4%.
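The abstract gives no algorithmic detail, so the following is only a minimal, generic sketch of min-hash LSH blocking with a block-size filter and a temporal constraint. It is not the authors' method. All names and parameters (`qgrams`, `minhash_signature`, `lsh_candidate_pairs`, `max_block_size`, `max_year_gap`), the record layout (a name plus an event year), and the choice of MD5-seeded hash functions are assumptions made for illustration. Spatial constraints and filtering by min-hash similarity are not shown.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def qgrams(text, q=2):
    """Return the set of character q-grams of a lower-cased string."""
    text = text.lower()
    grams = {text[i:i + q] for i in range(len(text) - q + 1)}
    return grams or {text}  # fall back to the whole string if it is shorter than q


def minhash_signature(tokens, num_hashes):
    """Compute one min-hash value per seeded hash function (MD5 is an assumption)."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(hashlib.md5(f"{seed}:{t}".encode()).digest()[:8], "big")
            for t in tokens
        ))
    return signature


def lsh_candidate_pairs(records, num_bands=4, rows_per_band=2,
                        max_block_size=50, max_year_gap=1):
    """Return candidate record pairs.

    records: dict mapping a record id to a (name, year) tuple (hypothetical layout).
    """
    # Banding: records that agree on every row of a band share a block.
    blocks = defaultdict(list)
    for rec_id, (name, _year) in records.items():
        sig = minhash_signature(qgrams(name), num_bands * rows_per_band)
        for band in range(num_bands):
            start = band * rows_per_band
            key = (band, tuple(sig[start:start + rows_per_band]))
            blocks[key].append(rec_id)

    pairs = set()
    for members in blocks.values():
        # Assumed block-size filter: skip singleton and oversized blocks.
        if len(members) < 2 or len(members) > max_block_size:
            continue
        for a, b in combinations(sorted(members), 2):
            # Assumed temporal constraint: event years must be close.
            if abs(records[a][1] - records[b][1]) <= max_year_gap:
                pairs.add((a, b))
    return pairs


if __name__ == "__main__":
    demo = {
        "a": ("peter smith", 1880),
        "b": ("peter smith", 1881),
        "c": ("peter smith", 1950),
    }
    print(lsh_candidate_pairs(demo))
```

In this sketch, more bands with fewer rows per band produce larger blocks and more candidate pairs, which illustrates the parameter sensitivity the abstract mentions. The two filters discard pairs before any detailed comparison takes place.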
Pages: 4354-4358
Number of pages: 5